A Parable on Software Perfection

Programmers love to set each other challenges, particularly ones that involve writing small programs whose jobs are seemingly simple but turn out to be quite difficult to write without making any mistakes. Such challenges have become a staple of job interviews at large and small tech companies alike. Here’s a parable about one such challenge.

Brogrammer Bob is interviewing at a company who writes high-assurance software. His interviewer, Careful Carrie, asks him “Do you believe you can write perfect software?”. Bob considers Carrie’s question and, after some thought, he replies.

“For all intents and purposes”, he says, “almost all software is imperfect. Certainly, we can expect no system that employs software to be perfect—even if the software itself were somehow bug free, the hardware on which it runs might fail, causing the system to fail. Striving for perfection, therefore, seems like a poor trade-off.”.

“Good”, says Carrie, because she agrees with Bob. “But you didn’t answer my question. Do you believe you can write perfect software?”

Bob knows Carrie has set him a trap, but he believes he can avoid it without having to concede that he could never write perfect software.

“It depends”, he says. “If the software and its requirements are simple enough, I believe I could do it.”

“How simple?” asks Carrie. “Say, for instance, you had to write the core logic of the game Wordle.”

Wordle?” replies Bob, nonplussed.

“You know”, says Carrie. “You are given two strings w and g of equal length. Your program must output only a string o of the same length where each character oi of o is either 0 (grey), 1 (yellow), or 2 (green), according to the rules of Wordle:

  • green indicates that the character gi of g at position i exactly matches the character wi of w at that same position;
  • yellow indicates that the two characters do not match but gi does appear in w in some other position j for which gj is not equal to wj;
  • grey indicates that gi appears nowhere in w.”

“Is that all?” replies Bob, who is relieved that Carrie has set him such a simple task, though somewhat bemused by Carrie’s precision.

“That is literally all”, says Carrie. “Somebody interacting with your program would supply the guess g and your program should give them nothing back except o. And anyone else who doesn’t get to see o should get nothing from your program. Do you want to give it a go?”

Bob nods confidently and heads to the whiteboard. Carrie, understanding something about performance anxiety, leaves the room to allow Bob to write his program. After some time she returns to find Bob’s solution on the board, as follows.

 1enum answer {
 2  GREY,
 6typedef enum answer ans;
 8// determine whether the first n characters of `word` contain the character c
 9// at some position i for which word[i] != guess[i]
10int ccontains(char *word, char *guess, int n, char c)
12  if (n > 0){
13    if (word[0] != guess[0] && c == word[0]){
14      return 1;
15    }
16    return ccontains(word+1,guess+1,c,n-1);
17  }
18  return 0;
21// precondition: word, guess, and out are all arrays of positive length n
22void do_guess(char *word, char *guess, ans *out, int n)
24  int i=0;
25  while (i < n)
26  {
27    char g = guess[i];
28    int ot = g == word[i] ?
29      GREEN :
30      (ccontains(word,guess,n,g) ? YELLOW : GREY);
31    out[i] = ot;
32    i++;
33  }

“How did you go?”, she asks Bob.

“Pretty well, I think”, Bob replies. After all, he not only wrote what he considers to be a correct solution, under the assumptions documented in the comments, but he also managed to show off in various places. For instance, he parameterised the length of the strings n so it can work for arbitrary length strings. He also got to use C’s conditional expression operator c ? a : b which evaluates to a when c is true (non-zero), and to b otherwise. Finally, he also got to show that he understands recursion.

“It’s not perfect”. Carrie’s words come as a surprise to Bob. He’s sure he thought of everything.

“Consider what would happen if we ran your code on large strings”, Carrie says.

“Oh, are you worried about the potential for stack overflow?”, says Bob. “I already took care of that by making sure that ccontains() is tail recursive. An optimising C compiler will compile it to a loop that uses a fixed amount of stack space, ruling out the possibility of stack overfl—”

“I’m not talking about that”, Carrie interjects. “Remember, I told you that your program should give nothing back to its user besides the output string, which in your program you called out.”

“But, it doesn’t give anything more than that”, Bob reasons.

“Wrong!”, says Carrie. “It gives away much more. Think about the yellow positions in the output. Imagine a guess that produces an output with a single yellow square. That is supposed to tell the user only that that character appears somewhere in the word, but your implementation of ccontains() gives away information about where the guessed character appears: if it returns quickly, the player knows the letter must be near the front of the word, for instance.”

“You’re serious?”, Bob asks not a little incredulously. “You think somebody playing this game is going to mount a timing-side channel attack on the implementation to help them guess the answer more quickly? This is Wordle; it’s hardly a foundational cryptographic cipher implementation that underpins the Internet’s security. And in any case, there’d be so much noise in the program’s execution timing, the player would need to perform a huge number of guesses to extract a signal from the timing channel. Surely, they’d be better off to just win the game normally.” Bob concludes, “This timing-leak is not a realistic threat.”

Carrie looks unimpressed. “Bob, you may be right. But you’re the one who said you could write a perfect implementation, remember. I’ve pointed out to you that your implementation is imperfect. What are you going to do about it?”

Bob, understanding that he is being set a challenge, concedes. “Fine,” he sighs, “can I have another go?”

“Of course”, Carrie replies reasonably. Bob sets to work modifying his ccontains() function to ensure that it will take the same amount of time to execute, no matter what output it computes.

He begins writing and, after a few minutes, he pauses having come up with the following.

1int ccontains(char *word, char *guess, int n, char c)
3  if (n > 0){
4    return ((word[0] != guess[0] && c == word[0]) ||
5            ccontains(word+1,guess+1,n-1,c));
6  }else{
7    return 0;
8  }

He sneaks a glance at Carrie, who is shaking her head mouthing something about “short-circuit evaluation”.

“No, hang on a sec”, he says, before judiciously erasing one ampersand and one pipe, and adding some brackets. “There!”

1int ccontains(char *word, char *guess, int n, char c)
3  if (n > 0){
4    return (((word[0] != guess[0]) & (c == word[0])) |
5            ccontains(word+1,guess+1,n-1,c));
6  }else{
7    return 0;
8  }

Carrie looks carefully at Bob’s solution. “I think you might have made it worse.”

“What?”, splutters Bob. “How can it be worse?”

Carrie turns to Bob. “Remember I said that somebody who doesn’t get to see the output string outshould get nothing from your program. Well, now you’ve guaranteed that when ccontains() runs, it will always take the maximum amount of time to execute. So somebody observing your program, from whom the output string out is hidden, can now easily tell whether ccontains() is being called. That tells them about green squares in the output, since ccontains() is only called for squares that are not green.”

Bob, who has by now fully accepted that the price of perfection against Carrie’s challenge requires him to rule out timing channels in his code, looks again at his implementation of do_guess(). Sure enough, Carrie is right. The conditional expression operator c : a ? b that he was so fond of only minutes before has come back to bite him.

“You’re right”, Bob says, keen to show Carrie he understands. “The problem is on lines 28–30: ccontains() will only be called when the guess character g is not equal to the word character word[i].”

“Exactly”, Carrie replies. “Do you know how to write a constant-time conditional expression?”

“Err, no, I’m afraid not”, Bob replies with an obvious note of resignation.

“There’s lots of ways”, Carrie says, walking to the whiteboard. Bob hands her the marker and she writes the following function.

1int ct_choose(int c, int a, int b)
3  return ((c * a) + ((1 - c) * b));

Carrie explains. “This implementation works for ints and assumes that its condition argument c is either 0 or 1. It returns a when c is true, i.e. 1, and b when c is false, i.e. 0. It behaves just like c ? a : b but always evaluates both a and b.”

“Neat!”, Bob replies as Carrie hands back the marker, before modifying his do_guess() function to use Carrie’s ct_choose().

 1void do_guess(char *word, char *guess, ans *out, int n)
 3  int i=0;
 4  while (i < n)
 5  {
 6    char g = guess[i];
 7    int ot = ct_choose(g == word[i],GREEN,
 8                       ct_choose(ccontains(word,guess,n,g),YELLOW,GREY));
 9    out[i] = ot;
10    i++;
11  }

Bob turns back to Carrie, his initial confidence in the perfection of his solution long gone. “Is it perfect?”, he asks.

“Honestly? I don’t know.” was Carrie’s response. “You see, there’s still plenty of opportunities for this code to go wrong. There’s nothing stopping the compiler from turning this seeming constant-time code into something that has a timing leak when compiled. Who knows whether the multiplications and additions used in ct_choose() are constant-time on whatever CPU this code might run on? Or whether we should worry about speculative, and out-of-order execution? And we haven’t even talked about documentation. Somebody using it might not notice your comments describing its preconditions. Or somebody who maintains it in future might not realise the subtlety of the constant-time implementation choices and refactor it, reintroducing the timing leaks we worked so hard to remove.”

“So, I guess it’s still not perfect, huh?” Bob replies.

“Probably not”, Carrie agrees, before continuing,

“Programmers who seek perfection are chasing an impossible dream, and anyone who promises they can write perfect code is lying to you, and to themselves.”

“What about formal methods?”, Bob asks. “I ask because I know your company writes high-assurance software. Couldn’t they help you to write prefect code?”

“Funny you should ask”, Carrie replies. “If one were sufficiently masochistic, I suppose one could implement a Wordle solution similar to yours and formally verify that it is both correct and doesn’t leak the kinds of information we’ve talked about. Although doing so would require extending the expressiveness of state-of-the-art formal methods beyond what was thought possible even by their original designers. But even if you did that, there’s no guaranteed you’d be closer to perfection than your code on the whiteboard.

“Despite what some might imply, formal methods do not produce perfect code—as if such a thing could exist at all. Software is no more a mathematical entity than it is a poem, or a song. Sure, rigorous mathematics can help you understand with some confidence how your software might behave, under various (often undocumented) assumptions. But, to paraphrase Einstein, as soon as formal methods refer to reality they cannot be certain; and any certain claims produced by formal methods cannot talk about reality, but only some platonic ideal. Give me carefully engineered code whose design and implementation is a wise balance of competing real-world trade-offs any day, over software engineered above all else to fit a strict mathematical ideal, necessarily divorced from reality. If you can formally verify the former, all the better—it’s not like we know of better quality assurance methods more likely to remove bug classes.”

“Right”, says Bob. “Don’t make formal methods the enemy of the good.”

Carrie turns to Bob and, with a satisfied smile, asks “When can you start?”