Arlen Cox

Ramblings on my hobbies

Picking an Abstraction

In my last post I discussed how we use abstractions such as the sign extraction to limit our search to a small, finite space.  So let’s talk about how to improve those abstractions so that we can have more precision and hopefully prove our property.

Let’s consider this problem from last time:

```if(x is even && y is even && z is even) {
if(x > y && z > (x - 1) && 5*x == 35) {
x = x/0;
}
}
```

We discussed last time that this program will not fail because of the combination of x is even and 5*x = 35.  We allowed ourselves to choose between two possible predicates: $v \geq 0$ and $v < 0$, where $v$ is a variable.  By doing so, we constructed this table, which showed that even though this program won’t fail, we couldn’t prove it selecting from those two predicates.

x y z true/false
≥0 ≥0 ≥0 true
≥0 ≥0 <0 true
≥0 <0 ≥0 true
≥0 <0 <0 true
<0 ≥0 ≥0 false
<0 ≥0 <0 false
<0 <0 ≥0 false
<0 <0 <0 false

So what if we added more predicates?  If we added more choices it would make the search harder, but it might make it possible to prove this program safe (prove that the error is unreachable).  So let’s add another pair of predicate options: $v \% 2 = 0$ and $v \% 2 = 1$.  The % operator is modulus.  It gives us the remainder after division.  If the remainder is zero when dividing by two, it means that the number we divided by two is even.  If the remainder is one, that means that the number we divided by two is odd.

Since we now how more options on which predicates to choose, let’s specify how we can combine them.  For simplicity we will say that we can only combine them conjunctively (and, written $\wedge$).  So we could say that $y \geq 0 \wedge y \% 2 = 0$, but we could not write that either $y \geq 0$ or $y \% 2 = 0$.

Now that we have added predicates, what does this mean for our table?  Since we six choices for each variable: $\geq 0$, $< 0$, $\geq 0 \wedge \% 2 = 0$, $< 0 \wedge \% 2 = 0$, $\geq 0 \wedge \% 2 = 1$, and $< 0 \wedge \% 2 = 1$, and we have three variables, that means we have $6^3$ or 216 rows in our table.  Additionally reading the table is no longer straightforward.  We cannot simply look for true/false on the right because we have to make a choice.  Let’s not do it this way.  It sounds hard and complicated.  Let’s look back at the problem to decide how to solve it.

One approach would be to syntactically look at the formula and see if anything matches nicely.  Since we have x is even, y is even and z is even, this does match nicely with $x \% 2 = 0 \wedge y \% 2 = 0 \wedge z \% 2 = 0$.  If we use this constraint, we see that we can no longer find any true values, so we have proved the program.  Hooray!

Searching Smarter

The problem is that this won’t work in more general cases.  With a large number of possible predicates and a large number of variables, the size of the problem will grow tremendously and quickly become practically impossible to solve.  So what we can do is use a method called predicate abstraction.

Predicate abstraction uses a loop.  We start by picking simple predicates, like the sign predicates.  We then attempt to verify the system, like we did above.  If the verification succeeds (concludes that the program is Safe) we know that the program is actually safe because the abstraction represented an overapproximation of the behavior of the system.  We allowed more behaviors than the program, therefore if it doesn’t go wrong in any of those behaviors, it won’t go wrong in the program.

If verification finds that the program is unsafe (such as a true result in the table above), we can generate a concrete counterexample.  We essentially pick random values that meet the constraints of a failing case.  We could choose row 1 in the table, so we need to pick a values for x, y and z that are greater than zero.  For example, we might pick x = 3, y = 4, z = 0.  We then run the actual program on those values to see if we find a bug.

Since x = 3 is not even, we execute the first if and thus cannot reach the error location.  This checking of the concrete counterexample in this case will reveal that this particular counterexample is impossible.  If it weren’t impossible, it would be a real bug and would indicate a completely failed proof.  In this case it is impossible and thus we must select new predicates that would exclude this counterexample.

We select new predicates by looking at why we could not reach the bug.  We could not in this case because x = 3 is not even.  Therefore we’ll add a new constraint that x must be even or $x \% 2 = 0$.  Now that we have done this, we can check the system again:

x y z true/false
$x \geq 0 \wedge x \% 2 = 0$ ≥0 ≥0 false
$x \geq 0 \wedge x \% 2 = 0$ ≥0 <0 false
$x \geq 0 \wedge x \% 2 = 0$ <0 ≥0 false
$x \geq 0 \wedge x \% 2 = 0$ <0 <0 false
<0 ≥0 ≥0 false
<0 ≥0 <0 false
<0 <0 ≥0 false
<0 <0 <0 false

Now we can conclude that the program is safe because the combination of $x \% 2 = 0 \wedge 5x = 35$ cannot be fulfilled.  Specifically an even number times an odd number is always an even number and 35 is an odd number.

This technique is far from perfect.  We have to have an established set of predicates to use before we begin and unless we know what predicates to pick, verification will fail often because it ran out of predicates.  I was able to select x is even as a predicate, which was quite important for proving this program safe.  That said, for many programs this is an adequate way of performing proofs.

In my next post I will discuss verification of more realistic programs involving loops and functions.  I will also introduce abstract interpretation, which is a different method for doing program proofs.  Eventually we’ll get into what it is that I do.  This technique presented here was first applied automatically to software in 2001.

Written by arlencox

December 4, 2011 at 8:02 am

3 Responses

1. […] the next post, I will talk about a couple of ways we can pick a new abstraction, so we’re not doing so […]

2. Let name choices by number:
1. x >= 0
2. x >= 0 ^ x % 2 = 0
3. x >= 0 ^ x % 2 = 1

This choices has depends. If (2) is true and (3) is true, (1) must be true. If (2) is false or (3) is false (1) must be false.
(1) = (2) and (3)
(1) is unnecessary.

As a result, quantity of choices is 4, not 6. 4^3 rows of table.

Or it ‘s a just example?

Predicate abstraction is well-known method. Could you recommend some publications of effect using of this method?

bsivko

December 5, 2011 at 8:16 pm

• Your combinatorics are correct. My math is based on the the original example running as well, but you are correct that it is redundant. Thanks for pointing this out.

The best examples of predicate abstraction in practice are the SLAM and BLAST tools. SLAM was developed at Microsoft by Tom Ball, Rupak Majumdar, Todd Millstein and Sriram Rajamani. Their 2001 PLDI paper just won the most influential paper award. The paper is Automatic predicate abstraction of C programs

Thomas Henzinger, Ranjit Jhala, Rupak Majumdar and Grégoire Sutre published a paper titled Lazy Abstraction that discusses the algorithm behind the BLAST tool, which was UC Berkeley’s SLAM equivalent.

Predicate abstraction appears in (slightly) more recent work, such as the Liquid Types work by Patrick Rondon, Ming Kawaguchi and Ranjit Jhala at UC San Diego. Liquid types recasts the verification problem as a dependent type checking problem (through the “famous” Curry-Howard isomorphism). It also copes with higher order languages as Liquid Types applies to OCaml.

arlencox

December 6, 2011 at 3:27 pm