When I was a Computer Science undergraduate, at the turn of the millennium, *by far* the least engaging subjects
that I studied were those on Software Engineering.
Few know how to make Gantt charts exciting, and discussions of software development
process models have never been my cup of tea, even if I did develop a
real appreciation for empirical software engineering in the meantime.
I deeply dig the irony, therefore, that I chose to be a Software Engineering
teacher in 2016.
(More on that choice some other time.)

In this post I want to share a snippet of code I came across in some course material from an unnamed university, while I was researching ideas for 2016’s instalment of SWEN90006 - Software Testing and Reliability. It reinforced my view that software engineering teaching needs to be rooted in a deep appreciation for programming. Absent such understanding, it is all too easy to find yourself modelling worst practices to your students.

# What’s Wrong With This Code?

```
1 int fibonacci( int N ) {
2 int i, j, Sum;
3 int *C;
4
5 C = malloc( sizeof( double ) * ( N + 1 ) );
6 if( C == NULL )
7 FatalError( "Out of space!!!" );
8
9 C[ 0 ] = 1; C[1] = 1;
10 Sum = 2;
11
12 for( i = 2; i <= N; i++ ) {
13 C[i] = C[i-1] + C[i-2];
14 Sum += C[i];
15 }
16
17 free( C );
18 return Sum;
19 }
```

It’s been over 10 years since I did any serious C programming; these days I spend far more time writing proofs than I do programs. But even to my eye this code is obviously poor, and objectively exhibits a number of poor engineering choices (outlined below) with real negative effects (i.e. it’s buggy as hell). That it (purportedly) implements such a trivial function only makes things worse: trivial code ought to be nothing but. While I’m sure it was included with the best of intentions, its presence in a syllabus on how software should be engineered, of all things, is disappointing. By putting this snippet in your teaching materials, you’re proving the adage that ‘’those who can’t [do], teach.’'

So let’s see why. (For the curious, at the end of this post I’ve included my own model solution to show how this code could have been written to avoid the issues I outline below.)

## Specification

The accompanying material says that function’s spec is to:

compute the sum of the first

`N`

fibonacci numbers: 1, 1, 2, 3, 5 …

This brings us to the first problem with the code: it doesn’t meet the spec.

Even assuming a perfect world, where undefined behaviour was absent and C’s designers had heeded the
principles of safe
defaults and least surprise^{1}, the spec that this code implements is,
at best, to:

compute the sum of the first

`N+1`

fibonacci numbers or the result 2, whichever is larger.

There is a straightforward off-by-one error.
However, even ignoring that, `fibonacci(0)`

is still liable^{2} to return 2.

## Design

Spec aside, let’s assume that the second specification was the one intended.
This then brings us to the function’s design, i.e. *how* it does what the revised
spec says it ought to. Roughly speaking, the design that this code implements
is:

to compute the sum of the first

`N+1`

fibonacci numbers, dynamically allocate an array of sufficient length to hold them all, populate the array and, while doing so, sum its elements.

The use of dynamic memory allocation is totally unnecessary: computing the
`k`

th fibonacci number (for `k >= 2`

) requires only the preceding two fibonacci numbers, not
*all* of the preceding ones.^{3}

By unnecessarily using dynamic memory allocation, its design breaks the fundamental principle of simplicity (returning yet again to Saltzer and Schroeder, it violates the principle of economy of mechanism). This poor choice has real negative implications.

Firstly, its space complexity is now O(`N`

). But secondly, this one design
decision then forces an impossible choice onto the programmer: what to do when
`malloc()`

fails? There is no sensible way for `fibonacci()`

to indicate this
failure to its caller. The programmer in this case has resolved this impossible
situation by having `fibonacci()`

abort execution entirely when `malloc()`

fails.

That consequent choice, while borne of an impossible situation, is disastrous.
Any function that calls `fibonacci()`

now runs the risk of having the entire
program’s execution aborted. Thus `fibonacci()`

has lost all pretensions to
modularity: it can now be used only in the most restrictive of situations.

Returning to the specification above, it might
be (a little) more correct to say, then, that the effects of calling `fibonacci(N)`

are to:

compute the sum of the first

`N+1`

fibonacci numbers or the result 2, whichever is larger, or nondeterministically abort all execution.

## Implementation

As with the spec, let us assume that the design is legitimate, and turn our
attention to how this function’s design has been rendered in code, i.e. its
*implementation*, in this case in C.

As I said above, I’m no C programmer; however, the bug density here is awesome.

- line 1: choice of inappropriate argument type (namely signed integer) to represent a non-negative integer value;
- line 5: signed integer overflow on addition or multiplication will cause undefined behaviour;
- line 9: possibility of out-of-bounds array access;
- lines 13 and 14: signed integer overflow on addition, as above, plus potential array access out-of-bounds.

The choice of the signed type `int`

for the argument
`N`

(and then also the local variables) is a fatal first step that is then
compounded by the potential for overflow when computing the amount of memory
to allocate for the array `C`

, leading to `C`

being allocated with a size far
too short to accommodate the subsequent array accesses.

Other odd or poor implementation choices include:

- line 1: lack of common variable naming scheme (specifically, inconsistent use of upper vs. lower case);
- line 2: unused variable
`j`

declared; - line 5: use of
`double`

in the`sizeof`

expression, for an array of`int`

s. - line 6: the condition
`C == NULL`

: one typo would render that condition`C = NULL`

, which would be fatal for the memory-safety of this program.^{4}

## Testing and Maintenance

Rounding out the traditional software development lifecycle, the poor design and implementation choices above mean that this function will be both difficult to test and difficult to maintain. Let me never be a software tester or quality assurance engineer who has to pay the price for somebody else’s poor design and implementation choices.

# Model Solution

So how could this code have been better designed and written? Please chime in with your own solution in the comments. However, I would have written it like so:

```
1 unsigned int fibonacci(unsigned int n) {
2 unsigned int a = 0, b = 1;
3 unsigned int sum = 0;
4 while (n > 0) {
5 unsigned int t = b;
6 b += a;
7 a = t;
8 n -= 1;
9 sum += a;
10 }
11 return sum;
12 }
```

This code:

Correctly

^{5}implements the original specification (literally modulo the effects of fixed-width machine word arithmetic) to compute the sum of the first`n`

fibonacci numbers: 1, 1, 2, 3, 5 …Avoids the O(

`N`

) space complexity of the original; its space complexity O(1).Avoids dynamic memory allocation.

Won’t cause unwanted effects (like halting all execution), so is far more modular (a virtue of being a pure function).

Is free of undefined behaviour and other implementation bugs.

Is far easier to test and maintain than the original.

At about 60% of the size of the original, is also more concise while being no less readable.

# Lessons

What lessons should we take away as teachers?

Take the time to properly understand the programming language in which you cast your teaching examples.

Model best practices in your example material.

Consider using type- and memory-safe languages for your examples, to limit the number of effects you need to worry about, unless what you are teaching is relevant only to unsafe languages like C.

Consider having your software engineering materials vetted by an experienced software engineer.

Spend some time programming. I pity any software engineering teacher who has never contributed to an open source project or otherwise spent time working with (and learning from) great programmers.

I’d love to hear your thoughts.

## Notes

[1] I’m indebted here to @alexmurray
who pointed out the parallel between C’s choice that `int`

should denote
signed integer, making signed the default interpretation, and Saltzer and
Schroeder’s principles.

[2] I say ‘’liable’’ because I can’t tell from reading
the C standard whether `fibonacci(0)`

might exhibit undefined behaviour
(e.g. whether an array of length `sizeof(double)`

is guaranteed to accommodate
two `int`

s), thereby
absolving the compiler of having it
return any specific answer.

[3]
This follows trivially from the mathematical
definition of the fibonacci sequence, which may be defined as follows. Here
we’ll write `fib k`

to
denote the `k`

th fibonacci number indexed from zero (i.e. `fib 0`

is the
first fibonacci number.)

```
fib 0 = 1
fib 1 = 1
fib (n + 2) = fib n + fib (n + 1) -- for n >= 0
```

[4] Was the if-statement instead

```
if( C = NULL )
FatalError( "Out of space!!!" );
```

`NULL`

to `C`

but likely never take the then-branch
(i.e. never invoke `FatalError()`

) as the value that the statement `c = NULL`

evaluates to is the right-hand-side of the assignment, i.e. `NULL`

, which is
falsy.@alexmurray informs me that Clang will warn about such mistakes, while
gcc does not unless `-Wparentheses`

or stronger is given.

[5] How do I know this code is correct? It is
pretty easy to see that it is memory-safe and free of undefined behaviour, and
so forth.
But everyone knows that it is all too easy to make a simple off-by-one error,
for instance, as the original code does. I believe this code has no such errors
because I can *prove* it is correct.

While this blog post is not the place for a tutorial on Hoare logic, the way to prove that a loop is correct
is by defining a *loop invariant*. This is
a logical statement that:

- is true right before the loop,
- is true after each loop iteration, and
- tells us enough information to conclude that the loop is correct.

If we let `N`

denote the initial value of the
parameter `n`

that is passed to this function, then a suitable loop invariant
for this program is the following, which I’ll admit is a little cumbersome:

(

`N = n`

and`b = 1`

and`a = 0`

and`sum = 0`

)

or

(

`n < N`

and`b = fib (N - n)`

and`a = fib (N - n - 1)`

and`sum = fib 0 + ... + fib (N - n - 1)`

)

*(Edit: Credit goes to Paul Vergeev who noticed
that I’d originally written down the invariant incorrectly, writing
sum = fib 0 + ... + fib (N - n) instead of sum = fib 0 + ... + fib (N - n - 1) as it is fixed above. While I’d proved the invariant in Isabelle using AutoCorres, I’d transcribed it incorrectly into the post. Thanks, Paul!)*

This invariant is phrased in two parts as a case distinction.
The first part (the top line, i.e. everything before “or”)
specifies the initial conditions and is true before the loop is entered,
which means the entire invariant holds before we enter the loop. The second
part specifies what is true after each loop iteration: given that the
invariant holds
after the `k`

th iteration of the loop, this second part is always
guaranteed to hold after the
`k + 1`

th iteration, meaning that the invariant will hold after each loop
iteration. Note here that we are using the
`fib`

function, defined in note 3 above, as a *formal
specification* for the fibonacci sequence, which we’ll interpret as operating over the same
word type as the C `fibonacci()`

function that we’re proving correct.

Once the program reaches line 11 (either because the loop terminated
or because its body was never executed), we know that the invariant is guaranteed to be
true: it is true before we entered the loop, so if the loop body never executed
it will certainly be true, and it is always maintained by the loop body so by
induction it must be true after the loop finishes.
We also know that `n = 0`

once we hit line 11, because we only get here when
the loop guard is false.
From these two facts, we know
therefore that:

(

`N = 0`

and`b = 1`

and`a = 0`

and`sum = 0`

)

or

(

`N > 0`

and`b = fib N`

and`a = fib (N - 1)`

and`sum = fib 0 + ... + fib N - 1`

)

The top part tells us that the function is correct when called with argument
0: `fibonacci(0)`

will return 0, since in this case `sum = 0`

.
The bottom part tells us that the function is correct in all other cases:
`fibonacci(N)`

will return `fib 0 + ... + fib N - 1`

when `N > 0`

, since in this case
`sum = fib 0 + ... + fib N - 1`

.

You can stare at the code and the loop invariant and try to convince yourself whether my reasoning is correct. Alternatively, this kind of reasoning is automated or assisted by program verification tools like Dafny or AutoCorres.