A Formal Security Verification Challenge

Secure Wordle

If you’ve read my recent posts, you’ll know I’ve been thinking a fair bit about Wordle. You might also have picked up that I recently wrote the core of a tiny Wordle implementation and formally verified it to be correct and secure, using Security Concurrent Separation Logic (SecCSL) and its verifier, both largely developed by Gidon Ernst. Correctness means that when the player submits a guess, Wordle computes the right answer to return to them. Security means roughly that it leaks no more information than it should, which I’ll make precise below.

While carrying out my verification I was trying to be as stringent as possible, imagining what one would want from a perfect Wordle implementation. As it turns out, what it means for a perfect Wordle implementation to be secure is, in comparison to typical information flow security policies, somewhat subtle. It was certainly one of the most interesting security specifications I’ve written since we verified seL4 to be information-flow secure.

This post is an open challenge for others who work on tools that aim to prove or enforce secure information flow: have a go at producing a secure Wordle implementation that is as close to perfect as possible. I am hoping that doing so will shed light on the complementary strengths and weaknesses of different verification and information flow enforcement approaches.

Wordle Correctness

Wordle’s correctness is pretty straightforward to specify rigorously. Here is a natural language specification for the correctness of the core of my verified Wordle implementation.

Correctness condition: The program is given two strings word and guess of equal length. The program must output a string out of the same length where each character oᵢ of out is either 0 (grey), 1 (yellow), or 2 (green), according to the rules of Wordle:

  • green indicates that the character gᵢ of guess at position i exactly matches the character wᵢ of word at that same position;
  • yellow indicates that the two characters do not match but gᵢ does appear in word in some other position j for which gⱼ is not equal to wⱼ;
  • grey indicates that neither of the above holds: gᵢ does not match wᵢ, nor does gᵢ appear in word at any position j for which gⱼ differs from wⱼ.
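To make these rules concrete, here is a minimal C sketch of the scoring computation. This is my own illustrative code, not the verified implementation; it assumes word and guess are (not necessarily NUL-terminated) character arrays of length n.

#include <stddef.h>

/* colour codes matching the correctness condition above */
enum { GREY = 0, YELLOW = 1, GREEN = 2 };

/* does c occur in word at some position j where word[j] != guess[j]? */
static int contains(const char *word, const char *guess, char c, size_t n)
{
    for (size_t j = 0; j < n; j++)
        if (word[j] != guess[j] && word[j] == c)
            return 1;
    return 0;
}

/* fill out[0..n-1] according to the rules above */
void wordle(const char *word, const char *guess, char *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (guess[i] == word[i])
            out[i] = GREEN;
        else if (contains(word, guess, guess[i], n))
            out[i] = YELLOW;
        else
            out[i] = GREY;
    }
}

Note that this naive version branches on the contents of word and guess, a detail that will matter when we turn to security below.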

Wordle’s Security

Assuming I understand Wordle properly and didn’t make a mistake writing it down, any correct Wordle implementation should satisfy the above correctness specification. What should we demand from a perfect Wordle implementation in terms of security?

I argue we should demand two things. The first is that the output string out is the only information that the player gets to see. Of course, a player interacting directly with the program gets to observe more than just the output string out. For example, they might time how long the program takes to produce its output. Or they might run a second program concurrently to observe the Wordle program’s memory-access pattern. What you consider observable to the player, beyond the output string out, is determined by your threat model. However, for my verification I considered both the timing behaviour and the memory-access pattern observable (because the logic I was using to perform the verification reasons to some degree about both of these). Therefore, we have the first security condition:

Security condition 1: The behaviour of the Wordle implementation observable to the player should not leak any more information than that contained in the output string out, according to the correctness specification for what out is supposed to contain.

Notice something interesting about this security condition: it mentions “according to the correctness specification …”. This means that Wordle’s security inherently depends on its correctness. Therefore we should expect that to prove it secure we might first need to prove it correct, because when we prove it secure we may need to rely on the fact that it is already correct. This is a common occurrence when reasoning about the security of non-trivial programs, and is why it can be helpful to carry out security proofs in a logic or tool that itself can carry out correctness reasoning, or one that can at least take advantage of such reasoning.

Is this security condition enough for a perfect Wordle implementation? I argue “no”. What about somebody who doesn’t get to see the output string out? What should a perfect Wordle implementation reveal to them? I would argue it should reveal nothing at all.

Modern operating systems are not very good at isolating timing and memory-access effects (e.g. as observable through the cache) of one program from another. Therefore we should expect that somebody who does not get to see the output string out might nonetheless be able to observe the Wordle program’s timing and memory access pattern.

But even if operating systems were better, one might argue that a perfect program is one whose observable effects are no greater than needed to implement its function. Wordle doesn’t need to leak information through its timing and memory-access pattern, so it shouldn’t. Either way, this leads to the second, stronger security condition:

Security condition 2: The behaviour of the Wordle implementation observable to anyone who does not get to see the output string out should not leak any information about the input strings word and guess, besides their lengths, which are assumed to be public knowledge. Neither should it leak information about the contents of the output string out.

This condition would be useful, for instance, if the Wordle implementation were being used as part of a multiplayer game in which players each race to guess the chosen word of the day. With this condition we could be sure that a player, Bob, gets to learn none of the information that Wordle reveals to player Alice when Alice makes her guesses.

Security condition 2 essentially requires that a perfect Wordle implementation does not leak information via side channels.
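To give a feel for what that demands of the code, here is a hedged C sketch of a branch-free, data-oblivious scoring loop. It is illustrative only: the names are mine, and genuinely constant-time code must also contend with compilers and hardware that can reintroduce secret-dependent behaviour.

#include <stddef.h>

enum { GREY = 0, YELLOW = 1, GREEN = 2 };

/* 1 if a == b and 0 otherwise, with no data-dependent branch:
   a ^ b fits in 8 bits, so (x - 1) >> 8 has its low bit set iff x == 0 */
static unsigned ct_eq(unsigned char a, unsigned char b)
{
    unsigned x = (unsigned)(a ^ b);
    return ((x - 1u) >> 8) & 1u;
}

/* every iteration executes the same instructions and touches the same
   memory locations, regardless of the contents of word and guess */
void wordle_ct(const char *word, const char *guess, char *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned green = ct_eq((unsigned char)word[i], (unsigned char)guess[i]);
        unsigned found = 0;
        for (size_t j = 0; j < n; j++) {
            unsigned mismatch = 1u - ct_eq((unsigned char)word[j], (unsigned char)guess[j]);
            found |= mismatch & ct_eq((unsigned char)word[j], (unsigned char)guess[i]);
        }
        /* arithmetic select: green ? GREEN : (found ? YELLOW : GREY) */
        out[i] = (char)(green * GREEN + (1u - green) * found * YELLOW);
    }
}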

Formally Specifying Wordle’s Correctness and Security

Below I’ll give a flavour of the specifications I used in my own verification. This might give you some ideas about how to get started on this challenge with your preferred tool, method or logic. To keep things neutral I’ll use a made-up specification language whose syntax improves on that implemented by our verification tool.

Correctness

Specifying correctness is relatively straightforward. The core correctness condition is captured by the correct_output predicate over lists. The lists word and guess are both lists of characters (strings), while out is a list of items whose values are either GREEN, YELLOW, or GREY. The ! notation is used to access a list at a specific location: xs ! k means the kth item of the list xs, indexed from 0. The notation x:xs refers to a list whose head is x and whose tail is xs; && is boolean and (conjunction); || is boolean or (disjunction); ==> is logical implication; ∀ and ∃ denote universal and existential quantification.

-- determine whether `c` appears in `word` in some location `j` for which
-- word[j] != guess[j]
contains(  word,  guess, c, 0) = false
contains((w:ws), (g:gs), c, n) =
  (w != g && w = c) || contains(ws,gs,c,n-1) where n > 0

-- output string 'out' is correct wrt 'word' and 'guess', both of length n
correct_output(word, guess, out, n) =
  (length(out) = n &&
     (∀i. 0 <= i < n ==>
       (out ! i) = (if (guess ! i) = (word ! i) then GREEN
                     else if contains(word, guess, guess ! i, n) then YELLOW
                     else GREY)))

I found it easier in my own verification to write the contains predicate as something executable. However, it can of course also be written using existential quantification instead.
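For instance, in the same made-up notation, a declarative equivalent might read:

contains(word, guess, c, n) =
  (∃j. 0 <= j < n && (word ! j) != (guess ! j) && (word ! j) = c)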

To specify the correctness of my Wordle implementation I annotated it with a precondition and a postcondition. As in all Hoare-style verification, the precondition encodes conditions assumed to hold in the state from which the program begins executing. The postcondition asserts what must be true when the program finishes.

The precondition[1] of my Wordle specification assumes that word and guess are strings of equal length n, for some n >= 0. The postcondition of my Wordle specification says that out points to a string of length n for which correct_output(word, guess, out, n) holds.
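In the made-up notation, the resulting contract might be written as follows (the requires/ensures keywords are illustrative, not our tool’s exact syntax):

-- correctness contract for the core Wordle function
requires: length(word) = n && length(guess) = n && n >= 0
ensures:  correct_output(word, guess, out, n)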

Specifying Security

Specifying security I found far more interesting than correctness.

I used a two-step process. I began by first specifying the weaker Security condition 1. I then strengthened that specification to include the stronger Security condition 2.

Specifying Security Condition 1

Specifying Security condition 1 essentially boils down to specifying what information should be observable to the player (whether that information is revealed to the player in the output string out or in some other behaviour of the program like its timing). Using typical information flow security parlance, let us call the information allowed to be observed by the player low, and all other information high.

For Wordle, the following information is allowed to be made available to the player. This is precisely the information encoded in the output string out (when it is correctly computed):

  • The lengths of the strings word and guess are assumed to be known to the player and are therefore both considered low;
  • The player is allowed to learn whether each character in the guess is equal to the corresponding character in the word, i.e. for each position i, the value of the equality test gᵢ = wᵢ is low;
  • For each character in the guess not equal to the corresponding character in the word, the player is allowed to learn whether that character appears elsewhere in word in some location for which the character in guess does not match the corresponding character in the word, i.e. for each position i, contains(word, guess, gᵢ) is low when gᵢ ≠ wᵢ, and is high otherwise.

The final bullet point is an interesting example of value-dependent classification, an increasingly common flavour of information flow specification wherein one uses the value of one datum to determine whether a second datum should be considered high or low.

In SecCSL I encoded the bulk of this specification as a predicate, release_policy, over word and guess, as follows. Note that this predicate does not cover the first bullet, which would be encoded by saying e.g. length(word) :: low. (I encoded that as a simple precondition in my specification, so it did not need to be included in this predicate.)

1 release_policy(word, guess) =
2   (∀i. 0 <= i < length(word) && i :: low ==>
3        ((guess ! i) = (word ! i)) :: low &&
4        contains(word, guess, guess ! i, length(word)) ::
5          ((word ! i) != (guess ! i) ? low : high))

The second bullet is encoded on line 3. The third on lines 4 and 5. In SecCSL, v :: l means that value v has sensitivity (aka classification) l (e.g. low or high). We also use the C-style conditional expression c ? a : b as shorthand for if c then a else b.

To specify the security of the Wordle implementation using this machinery I then enhanced the precondition of my program to assume that the length n of word and guess is low, and that release_policy(word, guess) holds initially. This precisely encodes what information we assume is allowed to be made visible to the player (i.e. be made public). The postcondition then asserts that the output string out must be low, in addition to satisfying correct_output.
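Putting this together, the strengthened contract might read as follows, where out :: low abbreviates that every element of out is classified low:

-- contract capturing Security condition 1
requires: length(word) = n && length(guess) = n && n :: low &&
          release_policy(word, guess)
ensures:  correct_output(word, guess, out, n) && out :: low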

Specifying Security Condition 2

What distinguishes Security condition 1 from Security condition 2? Essentially Security condition 1 treats all information contained in out as public (i.e. low), and so allows it to be revealed not only to the player but to anybody else. Security condition 2 strengthens this by requiring that this information should be made visible only to the player, and should not be treated as public.

To encode this requirement we have to forgo the luxury of considering just two kinds of data: high data, which should be kept secret, and low data, which is allowed to be made public. Instead we have data that should be kept secret from everyone, including the player (e.g. the raw string word); data allowed to be revealed only to the player (e.g. the raw string out, when correctly computed); and data allowed to be revealed to anyone (e.g. the lengths of the strings word, guess and out).

We can encode this in SecCSL by using a third security level in addition to low and high. Let us call this third security level plev (abbreviating “the player’s security level”). Then we can encode the stronger security condition simply by making use of this new level to strengthen the definition of release_policy above.

release_policy(word, guess, plev) =
  (∀i. 0 <= i < length(word) && i :: low ==>
       ((guess ! i) = (word ! i)) :: plev &&
       contains(word, guess, guess ! i, length(word)) ::
         ((word ! i) != (guess ! i) ? plev : high))

This is an even more interesting example of value-dependent classification than the previous version because now the value (word ! i) != (guess ! i) on which the classification (i.e. security level) of contains(word, guess, guess ! i, length(word)) depends is no longer low. Instead that value has classification plev.[2] This style of specification is sound because the lowest classification that contains(word, guess, guess ! i, length(word)) can have is plev, so the fact that its classification depends on a non-public value (also classified at plev) does not leak additional information.

I am aware of no other logics for proving information flow security that directly support this idiom (although I suspect there exist relational logics in which it might be encoded indirectly). I would love to be enlightened here!

The precondition and postcondition using this new predicate remain unchanged except that the sensitivity of the output string out is now asserted to be plev in the postcondition, not low.
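Concretely, in the same illustrative notation:

-- contract capturing Security condition 2
requires: length(word) = n && length(guess) = n && n :: low &&
          release_policy(word, guess, plev)
ensures:  correct_output(word, guess, out, n) && out :: plev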

Conclusion

Wordle turned out to be a lot more fun to verify than you might imagine, especially if you have a penchant for writing the strongest specifications you can think of. I would love to hear about others’ experiences in specifying and verifying the correctness and security (or whatever other properties you think are interesting) of this simple program. I hope you’ll get just as much fun and insight out of it as I did.

Notes

[1] The logic in which I carried out the verification, Security Concurrent Separation Logic (SecCSL), mixes ideas from Concurrent Separation Logic and prior logics for proving information flow security. My implementation was written in C, in which the strings word and guess are passed as pointers to (non-NUL terminated) arrays of characters. Therefore its specification must talk about how the pointers point to arrays of characters represented by abstract lists. This style of specification is common when using separation logic; however, to ease exposition let us just pretend that the pointer to the concrete string and the abstract list that represents it are one and the same.
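Schematically, such a precondition might be written with assertions relating each pointer to the abstract list it represents, e.g. (the array predicate and the * separating conjunction are standard separation logic idioms, though this exact syntax is mine, not SecCSL’s):

-- array(p, xs): pointer p points to an array whose contents
-- are the abstract list xs
requires: array(word, ws) * array(guess, gs) &&
          length(ws) = n && length(gs) = n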

[2] A fun fact: while carrying out my verification, I had to extend the semantics of SecCSL to write this specification. Under SecCSL’s original semantics, writing v :: l automatically implied that the expression l was low. But in the specification above we certainly do not want it to imply that the expression ((word ! i) != (guess ! i) ? plev : high) is low: doing so would then imply that (word ! i) != (guess ! i) is also low, which is precisely what we are trying to avoid. It turned out that this was a subtle limitation that had lurked in SecCSL’s original design for the past three years. Removing it required making a small but crucial change to the semantics of what v :: l means. Luckily, I was able to easily re-establish the soundness of the modified logic, thanks to its mechanised soundness proofs in Isabelle/HOL. Changing our program verifier, which automates reasoning in the logic, was also incredibly straightforward and revealed no impact on the barrage of regression tests and case studies already verified.

This was an interesting example of needing to make a change in a program logic, and needing two kinds of evidence to support that change: (1) knowing that the logic would continue to be sound, and (2) knowing that the change would not impact existing verifications. While (1) is logical evidence, (2) is inherently empirical; both were absolutely crucial. That both were machine-checked and largely automated was also crucial, allowing me to get precise and quick feedback on whether to proceed with the change.

The lesson: mechanise the soundness proofs of your logics; build a large corpus of regression tests and case studies mechanised in your logic.