Arlen Cox

Ramblings on my hobbies

Archive for the ‘Software Verification’ Category

Picking an Abstraction

with 3 comments

In my last post I discussed how we use abstractions such as the sign extraction to limit our search to a small, finite space.  So let’s talk about how to improve those abstractions so that we can have more precision and hopefully prove our property.

Let’s consider this problem from last time:

if(x is even && y is even && z is even) {
  if(x > y && z > (x - 1) && 5*x == 35) {
    x = x/0;
  }
}

We discussed last time that this program will not fail because of the combination of x is even and 5*x = 35.  We allowed ourselves to choose between two possible predicates: v \geq 0 and v < 0, where v is a variable.  By doing so, we constructed this table, which showed that even though this program won’t fail, we couldn’t prove it selecting from those two predicates.

x y z true/false
≥0 ≥0 ≥0 true
≥0 ≥0 <0 true
≥0 <0 ≥0 true
≥0 <0 <0 true
<0 ≥0 ≥0 false
<0 ≥0 <0 false
<0 <0 ≥0 false
<0 <0 <0 false

So what if we added more predicates?  If we added more choices it would make the search harder, but it might make it possible to prove this program safe (prove that the error is unreachable).  So let’s add another pair of predicate options: v \% 2 = 0 and v \% 2 = 1.  The % operator is modulus.  It gives us the remainder after division.  If the remainder is zero when dividing by two, it means that the number we divided by two is even.  If the remainder is one, that means that the number we divided by two is odd.

Since we now how more options on which predicates to choose, let’s specify how we can combine them.  For simplicity we will say that we can only combine them conjunctively (and, written \wedge).  So we could say that y \geq 0 \wedge y \% 2 = 0, but we could not write that either y \geq 0 or y \% 2 = 0.

Now that we have added predicates, what does this mean for our table?  Since we six choices for each variable: \geq 0, < 0, \geq 0 \wedge \% 2 = 0, < 0 \wedge \% 2 = 0, \geq 0 \wedge \% 2 = 1, and < 0 \wedge \% 2 = 1, and we have three variables, that means we have 6^3 or 216 rows in our table.  Additionally reading the table is no longer straightforward.  We cannot simply look for true/false on the right because we have to make a choice.  Let’s not do it this way.  It sounds hard and complicated.  Let’s look back at the problem to decide how to solve it.

One approach would be to syntactically look at the formula and see if anything matches nicely.  Since we have x is even, y is even and z is even, this does match nicely with x \% 2 = 0 \wedge y \% 2 = 0 \wedge z \% 2 = 0.  If we use this constraint, we see that we can no longer find any true values, so we have proved the program.  Hooray!

Searching Smarter

The problem is that this won’t work in more general cases.  With a large number of possible predicates and a large number of variables, the size of the problem will grow tremendously and quickly become practically impossible to solve.  So what we can do is use a method called predicate abstraction.

Predicate abstraction uses a loop.  We start by picking simple predicates, like the sign predicates.  We then attempt to verify the system, like we did above.  If the verification succeeds (concludes that the program is Safe) we know that the program is actually safe because the abstraction represented an overapproximation of the behavior of the system.  We allowed more behaviors than the program, therefore if it doesn’t go wrong in any of those behaviors, it won’t go wrong in the program.

If verification finds that the program is unsafe (such as a true result in the table above), we can generate a concrete counterexample.  We essentially pick random values that meet the constraints of a failing case.  We could choose row 1 in the table, so we need to pick a values for x, y and z that are greater than zero.  For example, we might pick x = 3, y = 4, z = 0.  We then run the actual program on those values to see if we find a bug.

Since x = 3 is not even, we execute the first if and thus cannot reach the error location.  This checking of the concrete counterexample in this case will reveal that this particular counterexample is impossible.  If it weren’t impossible, it would be a real bug and would indicate a completely failed proof.  In this case it is impossible and thus we must select new predicates that would exclude this counterexample.

We select new predicates by looking at why we could not reach the bug.  We could not in this case because x = 3 is not even.  Therefore we’ll add a new constraint that x must be even or x \% 2 = 0.  Now that we have done this, we can check the system again:

x y z true/false
x \geq 0 \wedge x \% 2 = 0 ≥0 ≥0 false
x \geq 0 \wedge x \% 2 = 0 ≥0 <0 false
x \geq 0 \wedge x \% 2 = 0 <0 ≥0 false
x \geq 0 \wedge x \% 2 = 0 <0 <0 false
<0 ≥0 ≥0 false
<0 ≥0 <0 false
<0 <0 ≥0 false
<0 <0 <0 false

Now we can conclude that the program is safe because the combination of x \% 2 = 0 \wedge 5x = 35 cannot be fulfilled.  Specifically an even number times an odd number is always an even number and 35 is an odd number.

This technique is far from perfect.  We have to have an established set of predicates to use before we begin and unless we know what predicates to pick, verification will fail often because it ran out of predicates.  I was able to select x is even as a predicate, which was quite important for proving this program safe.  That said, for many programs this is an adequate way of performing proofs.

In my next post I will discuss verification of more realistic programs involving loops and functions.  I will also introduce abstract interpretation, which is a different method for doing program proofs.  Eventually we’ll get into what it is that I do.  This technique presented here was first applied automatically to software in 2001.

Advertisements

Written by arlencox

December 4, 2011 at 8:02 am

Software Verification

with 6 comments

One major problem with pursuing a Ph.D. in computer science and particularly with verification and formal methods is that it is exceedingly difficult to describe to others what it is that I do.  In this post I hope to (somewhat pedantically) explain what software verification is.  At the very least this should serve as a framework for discussion about what I do.

What is software?

We all use software, so we all know that Microsoft Word, AutoCAD, Halo and Firefox are all pieces of software.  Software is much more than that, however.  It’s vitally important to understand that software exists everywhere.  This is because computers are everywhere.  Microwaves have computers and thus have software.  Cars have tens to hundreds of computers in them.  Each of those computers runs software.  Medical devices, satellites, airplanes, cameras and even some bicycles have computers and thus software.

I have said many times over the few years of teaching that I don’t care what kind of engineer you are, you will end up writing software sometime in your career.  Likely writing software will be a major part of your career.  A good example of this is Excel.  Most people end up using Excel or some spreadsheet program sometime in their career.  People start out by entering a bunch of data into a bunch of cells.  Pretty soon people realize that they want to perform some repetitive operation.  For example, if you’re managing a bank account you might split deposits and withdrawals into two different columns, but now to maintain the total you need to sum these up.  Rather than pulling out your calculator, I hope you use Excel’s features to do this for you:

Writing software in Excel

By typing ‘=-$A2+$B2’ into a cell in column C, you have programmed Excel to add cell A2 times negative one to cell B2 and place the result (in this case) in cell C2.  This is definitely preferable to using a calculator to fill in this column because if I made a typo in the value I placed in cell A2, I can fix that error and cell C2 will be automatically updated.  Additionally, instead of punching each row of values into your calculator, you can copy and paste that formula in cell C2 into C3 and it will apply the same operation, only on cells A3, B3 and C3.  We have saved ourselves time and reduced the number of places we can make mistakes.  If we get the formula in column C correct and our data in columns A and B are correct, we know that unless Excel is broken (which is always possible), the results in C are correct.

While this is a quite simple piece of software, this is still software.  Essentially software is a way of instructing a computer to do something for us.  This software eliminates repetition.  We didn’t have to redo the calculation in each row.  It performs mathematics automatically so that we don’t have to test our abilities with a slide rule.  Most importantly it prevented us from making additional mistakes.  We provided a general method to compute the total for each row and asked Excel to make use of that method.

Traditional Software

Traditionally software is written in a more text-based language.  There are literally thousands of different languages, but most people have at least heard of one or two of them.  Fortran was (and still is to some extent) a popular language for scientific computing.  C is a popular language for writing operating systems (like Windows, Linux or OS X).  Java is a popular language for writing applications that have graphics or run inside web pages.  Javascript (ECMAscript, technically) is a language that runs inside all web browsers and powers websites like GMail or Facebook.

A really simple program might look like this:

int running_sum(int n) {
  int i = 0;
  int sum = 0;
  while(i < n) {
    sum = sum + i;
    i = i + 1;
  }
  return sum;
}

Complete understanding of this code is not required, but lets look at some of the basics.  This defines a function called running_sum.  While we may call it a function (something that has mathematical meaning) what we really mean is that this is a routine.  We’ve called it running_sum because that’s what it does.  The name is of no significance to a computer.  A computer simply knows that it is a routine.  We might as well call it gobledygook for all the computer cares.

The computer doesn’t care what the function is called because we tell the computer what the function actually does.  It takes an integer n as input.  We then make two more integers i and sum.  Then as long as i is less than n we perform two operations (contained between the curly braces { and }) where we add to sum the value of i and then we add one to i.  Once all that is complete we state that the result of the routine is sum.

What does this do?  If we give it an input of 4, it will start with i = 0 and sum = 0.  It will then check and see that i, which is equal to 0 is less than n, which is equal to 4 so we must perform those two operations.  So we do so: sum=0 (sum=0+0) and i=1 (i=0+1).  Then we check again if i is less than n.  One is less than four, so we perform the two operations: sum=1 (sum=0+1) and i=2 (i=1+1).  Once again we check if i is less than n.  Once again we perform the two operations: sum=3 (sum=1+2) and i=3 (i=2+1).  We check if i is less than n and it is because 3 is less than 4, so we perform the two operations: sum=6 (sum=3+3) and i=4 (i=3+1).  We finally check if 4 is less than 4.  It is not, so we no longer perform those two operations.  The only operation left is to return sum, which returns the result of 6.

This result could be used by a program like Excel to fill in a data table.  Alternatively it could be used to show a picture on the screen or two guide a rocket to Mars or to deploy an airbag in a car in the event of a crash.  We use software, however it may be written, to make our lives easier or to solve problems that humans may not be able to solve.

Software Verification

By software verification, we usually mean verifying the correctness of the software.  This opens an interesting question.  What does it mean to be correct?  What should running_sum be?

One way of addressing this would be to implement the program twice.  Or maybe to have two different people implement the program.  Say that another person implements the program this way:

int running_sum(int n) {
  return (n*n)/2 - n/2;
}

Is this function the same as the one above? Equivalence checking is one form of verification.  The problem with it is how do we know either of the functions is correct?  The what does it mean to be correct question still holds.  Interestingly enough, these two programs are not the same.  Equivalence checking would fail.  Can you find an input where the two programs differ?

An alternative question is, is the function error free?  Because an error is commonly defined as a program crash, we can ask if this program can crash.  There are actually very few ways in which a program can crash.  Typically these are either a segmentation fault or a divide by zero error.  Since we haven’t talked about memory and won’t for a while, I’ll leave segmentation faults for later.  Can our program divide by zero?

This answer is obviously no, since the first implementation has no division at all and the second only divides by two.  In general this can be a difficult question.  It is definitely not a question that is easily resolved by trying a few random inputs, unless there actually happens to be a division by a constant zero in the code.  We can turn any constraint into a divide-by-zero check:

if(x > y && z > (x - 1) && 5*x == 35) {
  x = x/0;
}

Note that && is a logical and operation. In this case, if x is greater than y and z is greater than x minus one and five times x is equal to 35.  If this is true, we have a guaranteed divide by zero.  So now we have to answer the question, is there any way to make that statement true.  In this particular case it’s not terribly difficult.  x has to equal seven.  z can be anything greater than or equal to seven and y can be anything less than or equal to six.  This may be much more difficult than just looking at this code makes it out to be.  There may be other constraints on x or y.  What if the only way we could reach this code was if x was an even number.  Then there is no way to reach the divide by zero.

In general we can write a function called assert that claims a fact must be true or the program will crash.  We can write assert like this (in theory anyway, modern systems prevent this):

void assert(bool fact) {
  if(fact) {
    1/0;
  }
}

Programmers commonly use this assert function, which is built in to most languages to make claims about their software.  The most general way to look for problems is to ensure that no asserts crash.  We can also use this as a means to verify programs.  If programmers claim what the results of their functions should look like (mathematically) and we can check assertions, then we can verify software.

Abstraction

The actual process for verifying software is quite involved.  In software, most languages are what’s known as undecidable.  What this means is that it is actually impossible to always find if there is no way to violate an assertion.  What this means in practice is that usually the search to find if there is an input that will kill the program is infinitely long.  What we do instead is we construct abstractions.  We consider large portions of possible values as a single value and do verification using that single value.  For example, we could use an abstraction to reason about our last example.  What if we said that x, y, and z could be considered to be positive and zero or negative.  If this is the case, we only have to check eight possible inputs:

x y z true/false
≥0 ≥0 ≥0 true
≥0 ≥0 <0 true
≥0 <0 ≥0 true
≥0 <0 <0 true
<0 ≥0 ≥0 false
<0 ≥0 <0 false
<0 <0 ≥0 false
<0 <0 <0 false

We can compute this a couple of different ways, but let’s walk though the way I constructed this table.  Consider the first row.  x, y and z are all positive or equal to zero.  Is a positive number (x) greater than another positive number (y)?  Maybe.  We don’t know, so we will conclude either true or false.  Is a positive number (z) greater than another positive number (x) minus one?  Well a positive number minus one is either a positive or a negative number, so once again we don’t know.  Either true or false.  And lastly is a positive number (x) times five equal to 35?  Once again, we don’t know.  It’s not impossible to construct true using maybe and maybe and maybe, so we must conclude true.  By doing so we have constructed an overapproximation.  We conclude that true is a conclusion more often than it may be in reality.

Let’s try one more example.  Consider the fifth row in the table.  If x is negative and y is positive, then x is definitely not greater than y, so we conclude false.  We don’t have to consider any other cases because false and anything else is always false.  So while we could not prove that our program is safe, we are able to prove under four conditions that it is safe (false is safe because if we execute if(false) 1/0, the 1/0 will not be executed).

If we consider the situation described above where we have code like this:

if(x is even && y is even && z is even) {
  if(x > y && z > (x - 1) && 5*x == 35) {
    x = x/0;
  }
}

We now know that 5*x can never equal 35 because x can only be seven to make that true. Therefore we should conclude that the error is unreachable.  So let’s construct a table like we did above:

x y z true/false
≥0 ≥0 ≥0 true
≥0 ≥0 <0 true
≥0 <0 ≥0 true
≥0 <0 <0 true
<0 ≥0 ≥0 false
<0 ≥0 <0 false
<0 <0 ≥0 false
<0 <0 <0 false

It is exactly the same table.  This is because x is even, y is even and z is even are all maybes and don’t help us to draw any conclusions.  Therefore we would fail to prove this program safe.  When we get to this point we can take two different approaches.  We can pick a new abstraction or not.  If we choose to pick a new abstraction, maybe it will allow us to prove the program.  It also may not allow us to prove the program.  Since there are an infinite number of abstractions to choose, we could go on picking them forever.  Eventually we may choose the second option which is to give up and decide that we just don’t know the result.  We may conclude after running a few examples that the program is probably safe and will ship the code into medical devices, airplanes and cars.  Believe it or not, this is exactly what happens in reality.

In the next post, I will talk about a couple of ways we can pick a new abstraction, so we’re not doing so blindly.  This may allow us to prove many more programs but still know that we will finish.  A big problem with these kinds of problems is what does it mean if the verification doesn’t finish?  Does it mean that the search is just taking a long time or does it mean that it really is unsafe?

Written by arlencox

November 27, 2011 at 3:30 am