This post is a quick introduction to a concept that I have been using in talks since 2015 to help explain the landscape of formal verification methods to verification outsiders. I don’t claim it as a unique invention; however, I’ve not seen it documented elsewhere.
Methods for verifying software, in particular, are myriad, and the landscape can be confusing to non-verification folks. We have model checking, abstract interpretation, symbolic execution, auto-active verification, static analyses (including type systems), and interactive proof.
Making sense of this zoo of techniques is certainly something I struggled with when I was first getting into formal methods.
The formal verification triangle is one way of trying to understand who’s who in the zoo.
It also captures a fundamental insight about formal verification methods that is widely appreciated among practitioners yet rarely stated explicitly.
Desirable Properties of Verification Methods
There are many desirable properties that you might wish for your favourite verification method. Here are just three:
Scalable: ideally, verification methods should be able to reason about large pieces of code. What “large” means is somewhat subjective, of course. I’ll try to explain it further by example below.
Automatic: ideally, verification methods should require minimal human input and guidance, since human labour is expensive in comparison to compute.
Precise: ideally, verification methods should be able to reason with precision about code. This includes proving interesting properties, such as functional correctness or non-trivial security properties.
A Fundamental Trade-Off
There is a fundamental tension or trade-off between these three desirable properties. Indeed, the tension between them is evident in various writings of formal methods luminaries, including those of Hoare, Cousot & Cousot, Clarke, O’Hearn, and many besides.
In fact, they seem to form a classic trilemma: no verification method enjoys all three fully at once.
We can illustrate this trilemma graphically, inspired by Zooko’s Triangle (though the use of a triangular device to illustrate trilemmas predates Zooko).
The edges of the triangle below represent the three desirable properties. At each vertex of the triangle we can find traditional verification methods.
For example, model checking and symbolic execution are automatic, as are static analysis and abstract interpretation. However, the former two are much more precise, at the expense of scalability. Put another way, methods like abstract interpretation are specifically designed to scale further, and they do so by sacrificing precision. Methods like interactive and auto-active proof sacrifice automation, but gain scalability without sacrificing precision.
Concrete Examples
Some examples might help to explain further.
Scalable and Precise
In general, proving functional correctness of non-trivial code requires human effort. This is almost a tautology, since a human is required to write the precise specification against which to prove correctness. But a human is also (very likely) needed to come up with the appropriate invariants and intermediate lemmas that justify why the software in question is correct.
This is why flagship functional correctness verification projects like seL4 and CompCert used interactive proof.
Auto-active tools like Dafny and SPARK’s GNATprove try to minimise the amount of manual work the human has to do. However, the human must still write the formal specifications and lemmas, even if they can appeal to automated provers like SMT solvers to discharge low-level proof details.
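To give a flavour of what the human actually writes in this auto-active style, here is a minimal sketch using C with ACSL annotations, as checked by Frama-C’s WP plugin (a tool in the same family as Dafny and SPARK, though not mentioned above). The function, its bound on n, and the invariants are illustrative assumptions of mine rather than anything taken from a real project.

```c
/* A human writes the contract (requires/ensures) and the loop invariants;
   an SMT solver then discharges the resulting verification conditions. */
/*@ requires 0 <= n <= 10000;
    ensures \result == n * (n + 1) / 2;
*/
int sum_up_to(int n) {
  int total = 0;
  int i = 1;
  /*@ loop invariant 1 <= i <= n + 1;
      loop invariant total == (i - 1) * i / 2;
      loop assigns i, total;
      loop variant n - i;
  */
  while (i <= n) {
    total += i;
    i++;
  }
  return total;
}
```

Running something like frama-c -wp over this file should generate the proof obligations and hand them to an SMT solver; the human’s contribution is the functional specification and the loop invariants, not the low-level proof steps.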
Automatic and Precise
Symbolic execution tools like KLEE and BINSEC can be used to carry out precise reasoning, including about complex assertions. Yet that precision and automation come at the expense of scalability. One cannot apply BINSEC’s symbolic executor, for instance, to prove the correctness of seL4 (even if one could specify its correctness as assertions for BINSEC to prove). Symbolic execution is known to suffer from scalability limitations, due in particular to the path explosion problem.
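As a toy illustration of this style of assertion-based reasoning, here is a minimal sketch of a harness in the form KLEE expects, as I understand it. The function under test (my_abs) and the harness itself are illustrative assumptions, not taken from any real project.

```c
#include <assert.h>
#include <limits.h>
#include "klee/klee.h"

/* Illustrative function under test: a branchy absolute value. */
int my_abs(int x) {
  if (x < 0) return -x;
  return x;
}

int main(void) {
  int x;
  /* Mark x as symbolic: KLEE explores every feasible path through my_abs
     and checks the assertion on each of them. */
  klee_make_symbolic(&x, sizeof(x), "x");
  klee_assume(x != INT_MIN);   /* side-step the -INT_MIN overflow corner case */
  assert(my_abs(x) >= 0);      /* a precise property, checked on all paths */
  return 0;
}
```

Compiled to LLVM bitcode (e.g. with clang -emit-llvm -g -c) and run under klee, this explores both branches of my_abs. Every additional branch in the code under test multiplies the number of paths to explore, which is exactly the path explosion that limits scalability.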
Software model checkers face similar scalability challenges, albeit from state explosion rather than path explosion, while still facilitating relatively precise reasoning. The CBMC bounded model checker for C is a good example of the scalability-precision trade-off inherent in automated methods: it supports only bounded state-space exploration (an artificial scalability cap) in order to retain precision.
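The bounded nature of that exploration is easiest to see on a small loop. The sketch below uses what I understand to be CBMC’s conventions (an undefined nondet_int function and __CPROVER_assume); the program itself is an illustrative assumption.

```c
#include <assert.h>

/* By CBMC convention, an undefined function whose name begins with nondet_
   returns an arbitrary (nondeterministic) value. */
int nondet_int(void);

int main(void) {
  int n = nondet_int();
  __CPROVER_assume(0 <= n && n <= 100);   /* constrain the arbitrary input */

  int total = 0;
  for (int i = 0; i < n; i++) {
    total += i;
  }

  /* A precise, automatically checked property, but only explored up to the
     loop-unwinding bound supplied on the command line. */
  assert(total == n * (n - 1) / 2);
  return 0;
}
```

Running something like cbmc sum.c --unwind 101 --unwinding-assertions checks the assertion for every n up to 100; the unwinding bound is precisely the artificial scalability cap described above.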
Automatic and Scalable
One way to understand abstract interpretation is as a method for reasoning about code automatically and at scale: it works by building abstractions of program behaviours, and in doing so it necessarily sacrifices precision.
Static type systems go even further, being more coarse-grained still than abstract interpretation (which in turn means their analyses are also more efficient).
Yet neither can, in general, prove precise properties like functional correctness (even though abstract interpretation has been widely applied to many programs and problems).
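To make the precision loss concrete, here is a small sketch of what a classic interval abstract interpretation would compute on an illustrative program. The program and the commentary are mine, not the output of any particular tool.

```c
#include <assert.h>

int example(int flag) {
  int x;
  if (flag) x = 0;
  else      x = 10;

  /* An interval analysis joins the two branches and abstracts x to the
     range [0, 10]: a sound but lossy over-approximation. */

  int a[11];
  a[x] = 1;        /* provable automatically: 0 <= x <= 10, so the write is in bounds */

  assert(x != 5);  /* true of every real execution (x is either 0 or 10), but 5 lies
                      inside [0, 10], so the interval analysis cannot prove it and
                      would report a false alarm instead */
  return x;
}

int main(void) {
  return example(1);
}
```

More expressive abstract domains can recover facts like this one, but at greater analysis cost, which is exactly the trade-off the triangle is getting at.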
Conclusion
I would love to know whether the trilemma that this post identifies has been noted as such anywhere in the literature. Please reach out if you know more.
I’d wager there are other (perhaps even more illuminating) trade-offs that could help further make sense of the space of verification methods.
I should also point out that this trilemma is merely a conjecture. I would love to hear from you, especially if you disagree.