Verified or Certified Robustness? Why don't we have both?

Verified Certified Robustness for Neural Networks - Part I

This post is the first in a series on the topic of Verified Certified Robustness for Neural Networks. These posts accompany and explain our recent paper A Formally Verified Robustness Certifier for Neural Networks that will appear at CAV 2025, and the broader research agenda that that paper initiates.

I’ll link to the other posts in this series here, as they are published.

This first post gives a high-level overview. It explains our contribution as a new way of achieving formally verified robustness guarantees for neural networks that (perhaps surprisingly) avoids the scalability bottleneck of performing symbolic reasoning over neural networks.

Problems

All research is motivated by (often only perceived) problems. Let’s begin, then, with an airing of grievances.

The Problem with Neural Networks

As a machine learning outsider, I think of neural networks as the most surprising software ever written.

I feel privileged to live during the rise of this technology and to be continually surprised by what it can do.

But surprise is a double-edged sword.

What makes machine learning so powerful also renders this same technology unpredictable. Yet being able to predict with accuracy how software will behave is crucial for minimising its potential harms.

Formal methods are a powerful tool to reason about software, and to help rule out unwanted behaviours before software is deployed. But, as I’ve said before, machine learning seems somewhat averse to many kinds of formal methods. In particular, it seems inherently difficult to write functional correctness specifications for machine learning models, like neural networks.

For this reason, much of the research on formal methods for machine learning focuses on reasoning about non-functional properties.

The Problem of Non-Robustness

A prominent non-functional property for machine learning models is that of robustness, motivated by the existence of so-called adversarial examples.

A machine learning model is not robust when it gives meaningfully different answers for very similar inputs. The classic example is the following from Goodfellow et al.

The classic ImageNet adversarial example

This figure shows three images. The leftmost panda image is correctly classified as a panda by an ImageNet classifier. A small amount of noise (the middle image) is added to this image, producing the rightmost panda image. Unfortunately, the classifier incorrectly classifies the rightmost image as a gibbon, despite it being visually almost indistinguishable from the original image. We call the rightmost panda image an adversarial example.

Adversarial examples are relatively common for models trained (only) to achieve high accuracy. When a model is robust for a particular input, we mean that it is free of adversarial examples that are very similar to that input but would be classified differently.

Aside: Robustness, Formally

Robustness is relatively compact to define formally. If you’re not interested in the formal details, you can skip this part. The point is to make clear that when we say that a model is robust, we are talking about a very well-defined property that is amenable to being stated formally and thereby also formally proved.

Let’s call the leftmost panda v and the rightmost panda u. Then, while v and u are very similar to each other, the classifier classifies them differently.

What would it mean for a model to be invulnerable to adversarial examples? Robustness provides a formal definition that captures this idea. Roughly speaking it says that if we take a model’s input v and perturb (i.e., modify) it a little bit, producing new input u, then the model gives the same answers for v and u.

This requires some way to measure when u is only a little bit different to v. Robustness does so via a distance measure, typically a p-norm such as Euclidean distance. We say that u is sufficiently close to v when the distance between them does not exceed some quantitative bound, typically written ε.

Writing ||v|| for the l2-norm of vector v (so that ||v-u|| is the Euclidean distance between the points v and u), we say that a classifier N is locally robust for input v when:

$$\forall\textbf{u}:||\textbf{v}-\textbf{u}||\le\varepsilon\implies\mathit{ArgMax}(N(\textbf{v}))=\mathit{ArgMax}(N(\textbf{u}))$$

This definition assumes a neural network classifier N that outputs vectors where the index of the output vector’s maximum component corresponds to the predicted class for the given input. ImageNet models classify images into 1000 different classes. Therefore, for the definition above, imagine the output vectors of classifier N contain 1000 different components. For the left image v, the index of the maximum component of N(v) corresponds to the “panda” class. For the output N(u) produced by the right image u, its maximum component is at the index corresponding to the “gibbon” class.

For an output vector N(v), we use ArgMax(N(v)) to give us this index of its maximum component. So the definition above says that perturbing v to produce vector u shouldn’t cause the model N to classify u differently to v (when that perturbation is bounded by ε).
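To make the notation concrete, here is a small Python sketch (my own illustration, not code from our paper) of what a violation of this definition looks like. It assumes `N` is a function mapping an input vector to a vector of class scores:

```python
import numpy as np

def is_adversarial_pair(N, v, u, eps):
    """Check whether u witnesses a violation of local robustness at v.

    N is assumed to map an input vector to a vector of class scores
    (one per class); v and u are numpy arrays; eps is the perturbation
    bound. Mirroring the definition above, u is a counterexample when it
    lies within eps of v (in l2 distance) yet is classified differently.
    """
    within_ball = np.linalg.norm(v - u) <= eps
    same_class = np.argmax(N(v)) == np.argmax(N(u))
    return within_ball and not same_class
```

Note that checking pairs like this can only ever falsify robustness (by finding an adversarial example). Because the definition quantifies over all perturbations within the ε-ball, establishing robustness requires the kinds of verification or certification methods discussed below.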

The Problem with Verified Robustness

Relatively speaking, there has been a lot of work done by formal methods people on how to formally verify robustness for neural networks. (Besides smart contract verification, I think of neural network verification as the “other” hot topic in formal methods research over the past decade.)

Neural networks are programs, after all, so we should expect at least some traditional program reasoning tools to be applicable to them.

However, they are certainly not structured programs. That means that traditional reasoning methods like program logics are not directly applicable.

Instead, much research has sought to develop ways to take ideas from automated verification and apply them to reason about neural networks. This includes developing special purpose SMT solvers, or new ways to apply abstract interpretation to neural networks.

To my way of thinking, what many of these lines of work have in common is the idea of performing some kind of symbolic reasoning over the neural network itself, to derive conclusions about it.

Doing so might seem quite reasonable at first glance. Formal methods are all about applying symbolic reasoning to software, and in this case the software we want to reason about consists of neural networks. So why not try to apply symbolic reasoning directly to them?

One answer is scalability. To put it mildly, useful neural networks tend to be big. But symbolic reasoning methods don’t tend to scale easily.

This can be seen by looking at a relatively recent method for verifying robustness of neural networks. When applied to (what some people call) the “Hello, World!” of machine learning, namely neural networks for solving the MNIST handwritten digit classification problem, this method reports a median runtime of 7 seconds to verify robustness for individual points v, when applied to a model with just nine 20-neuron hidden layers.

Unfortunately, performance bottlenecks are a challenge for approaches that rely on performing symbolic reasoning directly over neural networks to prove robustness for individual points v. This is because the complexity of the symbolic reasoning tends to scale with the size of the neural network being reasoned about.

The Problem with Certified Robustness

The discussion above leads naturally to the question of how we can check the robustness of (individual inputs or outputs to) a neural network without having to perform symbolic reasoning over the neural network itself.

Fortunately, very clever people within the machine learning community have developed very clever methods for doing precisely this.

The line of work on Globally-Robust Neural Networks pioneered by our CATCH MURI collaborator is a good example. I’ll refer below to this paper as “Leino et al.”.

Here, the authors show how one can train a neural network to try to maximise its robustness. To do so, they figure out an efficient way to compute a measure of a neural network’s robustness. From that measure, they develop an efficient procedure for checking whether a model’s output is robust. One then trains a neural network to maximise its robustness by incorporating into the training objective the goal of maximising the number of training data points that the checking procedure says are robust.

This checking procedure is a robustness certifier.
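To give a flavour of how such a certifier can avoid symbolic reasoning over the network, here is a hedged Python sketch of one common recipe, namely a margin check against precomputed Lipschitz bounds (I’m not claiming this is exactly Leino et al.’s procedure): if the gap between the top score and every other score is larger than what an ε-bounded perturbation could possibly change, given the bounds, then the output is certified robust.

```python
import numpy as np

def certify_output(scores, lipschitz_bounds, eps):
    """Margin-based robustness check for a single model output.

    scores: the network's output vector N(v) for one input v.
    lipschitz_bounds[i][j]: an (assumed sound) upper bound on the
        Lipschitz constant of the difference between logits i and j,
        so an eps-bounded perturbation can change that difference by
        at most eps * lipschitz_bounds[i][j].
    Returns True only when no eps-bounded perturbation can change
    which class has the maximum score.
    """
    top = int(np.argmax(scores))
    for j in range(len(scores)):
        if j == top:
            continue
        margin = scores[top] - scores[j]
        if margin <= eps * lipschitz_bounds[top][j]:
            return False  # a perturbation could plausibly flip top and j
    return True
```

Notice that the per-output check touches only the output vector and the precomputed bounds; the expensive step of computing the bounds happens once per model, not once per input. Soundness therefore hinges entirely on those bounds genuinely being upper bounds, which is exactly where the issues described next creep in.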

Unfortunately, while this method is efficient, it is also unsound.

In our work, we discovered multiple soundness issues in the design and implementation of Leino et al.’s robustness certifier. I’ll discuss those in more detail in subsequent posts. However, to summarise, we found three exploitable soundness issues, as well as design problems that made their approach impossible to prove sound.

The exploitable soundness issues included:

  1. An exploitable bug in Leino et al.’s power iteration implementation (a sketch of textbook power iteration appears after this list). An adversary who is able to choose a model’s initial weights can cause their certifier to incorrectly classify non-robust points as robust, even after robust model training.

  2. The use of floating-point arithmetic in their implementation leads to problems similar in nature to the issue above with their power iteration implementation.

Both issues could be exploited, for example, by an adversary who posts a model online that purports to be accurate while enjoying a certain level of robustness. Anyone who attempts to use that model in conjunction with Leino et al.’s certifier can be misled into believing the model really is as robust as it purports to be, when in fact its true robustness can be dramatically lower. We have shown that these issues are exploitable on real-world models, such as those for MNIST.

  3. We discovered a bug in Leino et al.’s certification routine which allows an adversary with white-box access to a model to produce inputs that Leino et al.’s certifier will incorrectly certify as robust. We have confirmed this issue is exploitable for the kinds of models we evaluate in our paper (MNIST, Fashion MNIST, and CIFAR-10).
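For context on the first issue: power iteration is the textbook method for estimating a weight matrix’s largest singular value (its spectral norm), which feeds into Lipschitz bounds like those sketched earlier. Below is a minimal, illustrative Python version (my own sketch, not Leino et al.’s code). A key caveat is that the estimate approaches the true spectral norm from below, and its quality depends on the starting vector and iteration count, so treating the raw estimate as an upper bound without further safeguards leaves room for the kind of adversarial weight choices described in issue 1.

```python
import numpy as np

def spectral_norm_estimate(W, iters=100, seed=0):
    """Estimate the largest singular value of W by power iteration.

    Runs power iteration on W^T W from a random starting vector. The
    returned value is always a lower bound on the true spectral norm
    (it equals ||W x|| for a unit vector x), so too few iterations or
    an unlucky starting vector can leave it a severe underestimate;
    it must not be used as a sound upper bound as-is.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(W.shape[1])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = W.T @ (W @ x)
        x /= np.linalg.norm(x)
    return float(np.linalg.norm(W @ x))
```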

We note that Leino et al.’s implementation (including both their training and certifier implementation) is a mere 1714 lines of Python.

It seems, therefore, that this kind of code is tricky to get right, and that it is all too easy to introduce subtle bugs with exploitable consequences. We certainly cast no aspersions on the quality of Leino et al.’s work, which, as I implied above, I hold in incredibly high regard.

Our Solution

This brings us, then, to our solution, and the title of this post: verified certified robustness.

Specifically, we designed, implemented and formally verified a robustness certifier for neural networks.

The verification gives us confidence that when our certifier says an output is robust, it really is robust. Our certifier is free of the kinds of soundness issues mentioned above.
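In the notation of the earlier robustness definition, this guarantee can be stated roughly as follows, where Certify is a stand-in name for our certifier’s top-level check (glossing over how models, inputs and ε are actually represented):

$$\mathit{Certify}(N,\textbf{v},\varepsilon)=\mathsf{true}\implies\forall\textbf{u}:||\textbf{v}-\textbf{u}||\le\varepsilon\implies\mathit{ArgMax}(N(\textbf{v}))=\mathit{ArgMax}(N(\textbf{u}))$$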

What is really exciting about our approach, however, is that the complexity of symbolic reasoning no longer scales with the size of the neural network. In our approach, the symbolic reasoning happens when we verify our certifier. Put another way, we are doing symbolic reasoning over our robustness certifier. No symbolic reasoning is ever done over any particular neural network.

Instead, the verification of our certifier requires symbolic reasoning (specifically, auto-active theorem proving) over formal specifications and lemmas that talk about a general formalisation of neural networks, what it means for them to be robust, and (therefore) why our certifier’s answers are sound.

Therefore, at least in principle, our approach has the potential to overcome the key scalability challenge that has haunted much neural network robustness verification research over the past decade.

I said above that other verification approaches require seconds to certify the robustness of individual model outputs, for some MNIST models. With our approach, model output certification time is independent of the size of the neural network, and requires at most milliseconds for each output.

What Next?

In future posts, I’ll explain the soundness issues we found in Leino et al.’s certifier, and how they can be exploited. I’m also intending to discuss how we carried out our verification, the limitations of our current implementation, and some key future research challenges.

Stay tuned, and thanks for reading.

Acknowledgements

Thanks to James Tobler, who is lead author of the paper that is the subject of this series, for insightful feedback on earlier drafts of this post.