## Software Verification

One major problem with pursuing a Ph.D. in computer science and particularly with verification and formal methods is that it is exceedingly difficult to describe to others what it is that I do. In this post I hope to (somewhat pedantically) explain what software verification is. At the very least this should serve as a framework for discussion about what I do.

## What is software?

We all use software, so we all know that Microsoft Word, AutoCAD, Halo and Firefox are all pieces of software. Software is much more than that, however. It’s vitally important to understand that software exists everywhere. This is because computers are everywhere. Microwaves have computers and thus have software. Cars have tens to hundreds of computers in them. Each of those computers runs software. Medical devices, satellites, airplanes, cameras and even some bicycles have computers and thus software.

I have said many times over the few years of teaching that I don’t care what kind of engineer you are, you will end up writing software sometime in your career. Likely writing software will be a major part of your career. A good example of this is Excel. Most people end up using Excel or some spreadsheet program sometime in their career. People start out by entering a bunch of data into a bunch of cells. Pretty soon people realize that they want to perform some repetitive operation. For example, if you’re managing a bank account you might split deposits and withdrawals into two different columns, but now to maintain the total you need to sum these up. Rather than pulling out your calculator, I hope you use Excel’s features to do this for you:

By typing ‘=-$A2+$B2’ into a cell in column C, you have programmed Excel to add cell A2 times negative one to cell B2 and place the result (in this case) in cell C2. This is definitely preferable to using a calculator to fill in this column because if I made a typo in the value I placed in cell A2, I can fix that error and cell C2 will be automatically updated. Additionally, instead of punching each row of values into your calculator, you can copy and paste that formula in cell C2 into C3 and it will apply the same operation, only on cells A3, B3 and C3. We have saved ourselves time and reduced the number of places we can make mistakes. If we get the formula in column C correct and our data in columns A and B are correct, we know that unless Excel is broken (which is always possible), the results in C are correct.

While this is a quite simple piece of software, this is still software. Essentially software is a way of instructing a computer to do something for us. This software eliminates repetition. We didn’t have to redo the calculation in each row. It performs mathematics automatically so that we don’t have to test our abilities with a slide rule. Most importantly it prevented us from making additional mistakes. We provided a general method to compute the total for each row and asked Excel to make use of that method.

### Traditional Software

Traditionally software is written in a more text-based language. There are literally thousands of different languages, but most people have at least heard of one or two of them. Fortran was (and still is to some extent) a popular language for scientific computing. C is a popular language for writing operating systems (like Windows, Linux or OS X). Java is a popular language for writing applications that have graphics or run inside web pages. Javascript (ECMAscript, technically) is a language that runs inside all web browsers and powers websites like GMail or Facebook.

A really simple program might look like this:

int running_sum(int n) { int i = 0; int sum = 0; while(i < n) { sum = sum + i; i = i + 1; } return sum; }

Complete understanding of this code is not required, but lets look at some of the basics. This defines a function called running_sum. While we may call it a function (something that has mathematical meaning) what we really mean is that this is a routine. We’ve called it running_sum because that’s what it does. The name is of no significance to a computer. A computer simply knows that it is a routine. We might as well call it gobledygook for all the computer cares.

The computer doesn’t care what the function is called because we tell the computer what the function actually does. It takes an integer n as input. We then make two more integers i and sum. Then as long as i is less than n we perform two operations (contained between the curly braces { and }) where we add to sum the value of i and then we add one to i. Once all that is complete we state that the result of the routine is sum.

What does this do? If we give it an input of 4, it will start with i = 0 and sum = 0. It will then check and see that i, which is equal to 0 is less than n, which is equal to 4 so we must perform those two operations. So we do so: sum=0 (sum=0+0) and i=1 (i=0+1). Then we check again if i is less than n. One is less than four, so we perform the two operations: sum=1 (sum=0+1) and i=2 (i=1+1). Once again we check if i is less than n. Once again we perform the two operations: sum=3 (sum=1+2) and i=3 (i=2+1). We check if i is less than n and it is because 3 is less than 4, so we perform the two operations: sum=6 (sum=3+3) and i=4 (i=3+1). We finally check if 4 is less than 4. It is not, so we no longer perform those two operations. The only operation left is to return sum, which returns the result of 6.

This result could be used by a program like Excel to fill in a data table. Alternatively it could be used to show a picture on the screen or two guide a rocket to Mars or to deploy an airbag in a car in the event of a crash. We use software, however it may be written, to make our lives easier or to solve problems that humans may not be able to solve.

## Software Verification

By software verification, we usually mean verifying the correctness of the software. This opens an interesting question. What does it mean to be correct? What should running_sum be?

One way of addressing this would be to implement the program twice. Or maybe to have two different people implement the program. Say that another person implements the program this way:

int running_sum(int n) { return (n*n)/2 - n/2; }

Is this function the same as the one above? Equivalence checking is one form of verification. The problem with it is how do we know either of the functions is correct? The what does it mean to be correct question still holds. Interestingly enough, these two programs are not the same. Equivalence checking would fail. Can you find an input where the two programs differ?

An alternative question is, is the function error free? Because an error is commonly defined as a program crash, we can ask if this program can crash. There are actually very few ways in which a program can crash. Typically these are either a segmentation fault or a divide by zero error. Since we haven’t talked about memory and won’t for a while, I’ll leave segmentation faults for later. Can our program divide by zero?

This answer is obviously no, since the first implementation has no division at all and the second only divides by two. In general this can be a difficult question. It is definitely not a question that is easily resolved by trying a few random inputs, unless there actually happens to be a division by a constant zero in the code. We can turn any constraint into a divide-by-zero check:

if(x > y && z > (x - 1) && 5*x == 35) { x = x/0; }

Note that && is a logical and operation. In this case, if x is greater than y and z is greater than x minus one and five times x is equal to 35. If this is true, we have a guaranteed divide by zero. So now we have to answer the question, is there any way to make that statement true. In this particular case it’s not terribly difficult. x has to equal seven. z can be anything greater than or equal to seven and y can be anything less than or equal to six. This may be much more difficult than just looking at this code makes it out to be. There may be other constraints on x or y. What if the only way we could reach this code was if x was an even number. Then there is no way to reach the divide by zero.

In general we can write a function called assert that claims a fact must be true or the program will crash. We can write assert like this (in theory anyway, modern systems prevent this):

void assert(bool fact) { if(fact) { 1/0; } }

Programmers commonly use this assert function, which is built in to most languages to make claims about their software. The most general way to look for problems is to ensure that no asserts crash. We can also use this as a means to verify programs. If programmers claim what the results of their functions should look like (mathematically) and we can check assertions, then we can verify software.

### Abstraction

The actual process for verifying software is quite involved. In software, most languages are what’s known as undecidable. What this means is that it is actually impossible to always find if there is no way to violate an assertion. What this means in practice is that usually the search to find if there is an input that will kill the program is infinitely long. What we do instead is we construct abstractions. We consider large portions of possible values as a single value and do verification using that single value. For example, we could use an abstraction to reason about our last example. What if we said that x, y, and z could be considered to be positive and zero or negative. If this is the case, we only have to check eight possible inputs:

x | y | z | true/false |
---|---|---|---|

≥0 | ≥0 | ≥0 | true |

≥0 | ≥0 | <0 | true |

≥0 | <0 | ≥0 | true |

≥0 | <0 | <0 | true |

<0 | ≥0 | ≥0 | false |

<0 | ≥0 | <0 | false |

<0 | <0 | ≥0 | false |

<0 | <0 | <0 | false |

We can compute this a couple of different ways, but let’s walk though the way I constructed this table. Consider the first row. x, y and z are all positive or equal to zero. Is a positive number (x) greater than another positive number (y)? Maybe. We don’t know, so we will conclude either true or false. Is a positive number (z) greater than another positive number (x) minus one? Well a positive number minus one is either a positive or a negative number, so once again we don’t know. Either true or false. And lastly is a positive number (x) times five equal to 35? Once again, we don’t know. It’s not impossible to construct true using maybe and maybe and maybe, so we must conclude true. By doing so we have constructed an overapproximation. We conclude that true is a conclusion more often than it may be in reality.

Let’s try one more example. Consider the fifth row in the table. If x is negative and y is positive, then x is definitely not greater than y, so we conclude false. We don’t have to consider any other cases because false and anything else is always false. So while we could not prove that our program is safe, we are able to prove under four conditions that it is safe (false is safe because if we execute if(false) 1/0, the 1/0 will not be executed).

If we consider the situation described above where we have code like this:

if(x is even && y is even && z is even) { if(x > y && z > (x - 1) && 5*x == 35) { x = x/0; } }

We now know that 5*x can never equal 35 because x can only be seven to make that true. Therefore we should conclude that the error is unreachable. So let’s construct a table like we did above:

x | y | z | true/false |
---|---|---|---|

≥0 | ≥0 | ≥0 | true |

≥0 | ≥0 | <0 | true |

≥0 | <0 | ≥0 | true |

≥0 | <0 | <0 | true |

<0 | ≥0 | ≥0 | false |

<0 | ≥0 | <0 | false |

<0 | <0 | ≥0 | false |

<0 | <0 | <0 | false |

It is exactly the same table. This is because x is even, y is even and z is even are all maybes and don’t help us to draw any conclusions. Therefore we would fail to prove this program safe. When we get to this point we can take two different approaches. We can pick a new abstraction or not. If we choose to pick a new abstraction, maybe it will allow us to prove the program. It also may not allow us to prove the program. Since there are an infinite number of abstractions to choose, we could go on picking them forever. Eventually we may choose the second option which is to give up and decide that we just don’t know the result. We may conclude after running a few examples that the program is probably safe and will ship the code into medical devices, airplanes and cars. Believe it or not, this is exactly what happens in reality.

In the next post, I will talk about a couple of ways we can pick a new abstraction, so we’re not doing so blindly. This may allow us to prove many more programs but still know that we will finish. A big problem with these kinds of problems is what does it mean if the verification doesn’t finish? Does it mean that the search is just taking a long time or does it mean that it really is unsafe?

>Is this function the same as the one above?

Of course, not. For example:

#include

int running_sum1(int n) {

int i = 0;

int sum = 0;

while(i < n) {

sum = sum + i;

i = i + 1;

}

return sum;

}

int running_sum2(int n) {

return (n*n)/2 – n/2;

}

int main( int agrc, char ** argv )

{

int p = INT_MAX;

std::cout << running_sum1( p ) << std::endl;

std::cout << running_sum2( p ) << std::endl;

return 0;

}

As a result:

1073741825

-1073741823

Boris SivkoNovember 28, 2011 at 3:19 pm

Negative numbers will also give different results.

For a large class of positive integers, the implementations are the same, however.

arlencoxNovember 28, 2011 at 3:32 pm

I want to say the next two functions are different:

int running_sum2(int n) {

return n/2*n – n/2;

}

int running_sum3(int n) {

return (n*n)/2 – n/2;

}

For example, if n=MAX_INT/3.

When we verify real functions we should check a lot of complecated aspects.

Boris SivkoNovember 28, 2011 at 3:42 pm

What I’m going to get to in my next post (I hope) is that it really all depends on context. We don’t actually care if the implementations are different, so long as they are the same on inputs that can actually occur.

Your running_sum2 and running_sum3 are definitely different. There are numerous differing inputs. n=3 and n=5 both produce different outputs.

SMT (Satisfiabililty Modulo Theories) solvers are a great way to check combinatorial problems (like this one). I use Z3 (made by Microsoft) almost everyday for purposes like this. Here is your question (with some additional bounds) written for Z3:

(declare-const n (_ BitVec 32))

(define-fun impl1 ((n (_ BitVec 32))) (_ BitVec 32)

(let ((no2 (bvsdiv n (_ bv2 32))))

(bvsub (bvmul no2 n) no2))

)

(define-fun impl2 ((n (_ BitVec 32))) (_ BitVec 32)

(bvsub (bvsdiv (bvmul n n) (_ bv2 32)) (bvsdiv n (_ bv2 32)))

)

(assert (bvsgt n (_ bv0 32)))

(assert (bvslt n (_ bv40000 32)))

(assert (not (= (impl1 n) (impl2 n))))

(check-sat)

(get-model)

arlencoxNovember 28, 2011 at 4:56 pm

Sorry for my mistake.

I mean running_sum2 to avoid overflow at (n*n). Smth like:

int running_sum2(int n) {

return n/2*n – n/2 + (n % 2) * (n / 2);

}

and it has difference at sqrt(INT_MAX), 32 bit is 46341 (1073720970 and -1073762677).

Boris SivkoNovember 28, 2011 at 8:43 pm

Your point is well made. Overflow is very often a problem in software verification. This is why so many software verification tools are beginning to use bit-precise modeling of arithmetic. Many tools convert numbers into individual bits and then use SAT solvers to reason about them. Curiously this not only benefits precision (and decidability), but it also benefits performance in many cases.

This is one of the reasons that synthesis is incredibly difficult. It’s one thing to ask a compiler to simplify my loop into a simple expression. It’s an entirely different thing to ask a compiler to do so and preserve every behavior precisely.

arlencoxNovember 28, 2011 at 9:43 pm