## Statistical machine translation fails

I know that statistical machine translation (the SMT that does not mean Satisfiability Modulo Theories) is not about being correct all of the time, but sometimes it is really sensitive to odd things that should have nothing to do with the translation. For example, I am learning French and I wanted to offer help to someone (in French). So I thought the sentence:

Would you like me to help?

This is a completely natural sentence, but the grammatical structure is kind of complicated. Instead of “me” being a direct object to “like”, it is part of a compound object “me to help.” The great news is that Google Translate seemingly correctly translates this to French:

Voulez-vous que je vous aide?

Ok, it’s not conversational French per-se, but it is adequate. Literally translated, it reads

Do you want that I help you?

This seems perfectly adequate. However, what if you are typing this into your phone and you are particularly lazy. For example, if you leave off the capitalization and the punctuation, you get instead:

would you like me to help→ vous me voulez aider

This is a rather odd translation. Now “me” is the direct object of want, so this translates to

you want me…to help

That’s kind of annoying. How about the other cases adding the question mark or the capitalization, but not both:

would you like me to help? → voulez-vous que je aider?

This is avery odd translation. Maybe there is some context where this is correct grammar, but my reading of this is essentially

do you want that I to help?

The other case is also kind of wrong:

Would you like me to help → Voulez-vous que pour aider

This also an odd translation. I’m not familiar with the grammatical structure, so I can’t conclude that it is wrong, but I think that it probably is. The “que pour” structure doesn’t seem very common. I believe that this roughly translates to:

Do you want in order to help

The lack of the second party (me) in the sentence is a bit odd.

The take away from this is that Google Translate is very sensitive to punctuation and capitalization. It can completely change your sentence if you leave these out. Of course there are times when it give the wrong translation when you add them, so there is no universally correct answer. The only true answer is to study the language…

## Location tracking in OCaml parsers

I guess that I have been so assimilated into the functional programming world that I was having a very basic issue when developing parsers in OCaml. I had this issue with all of the parser generators (ocamlyacc, Menhir, Dypgen and Aurochs): I could never get a simple error location when there was a parse error. They all raise an exception called something like `Parse_error`

, but they give no location information with that exception.

I found this unbelievable. I mean Antlr gives location information and Parsec does too. What makes OCaml so obtuse?

As it turns out, OCaml is not obtuse; I am. The lexers generated by these tools are stateful. Therefore, when you get a parse error, you can simply query the lexer for the current location after exception has been raised. For example, using Dypgen:

let lexbuf = Dyp.from_channel (StrParse.pp ()) stdin in try (* do parse *) with Dyp.Syntax_error -> let (startpos,endpos) = Dyp.dyplex_lexbuf_position lexbuf in (* print error message - startpos and endpos are of standard Lexing.position type *)

This of course doesn’t provide error recovery where multiple errors could be provided and it doesn’t provide intelligent error messages, but it does give a little bit of information at least on the error. This is what many research tools provide, since developing good error handling is a time consuming task for code that will be quickly disposed of.

A very similar process can be followed both ocamlyacc and Menhir to good effect, though now that I have discovered the power of Dypgen, I doubt I will go back.

## Unofficial Laws

## Chad’s “Constant”

As an undergraduate I had a friend named Chad who invented his own named “constant” that he called Chad’s “Constant”. This particular constant is of use to any engineering or physics student where a numerical answer to a problem is desired. Chad’s “Constant” is defined as the correct answer to a problem minus my answer to the problem. Consider this example:

Problem is equal to some correct answer . However, I compute (possibly incorrectly) that:

To ensure that I get full credit for my assignment, I can always add Chad’s “Constant”, in order to get the right answer:

## Arlen’s Law of Performance

If I add this as the last step to any problem, I can never fail.

In a similar vein, I will define Arlen’s Law of Performance, which applies to software development. It is this:

**If there is any doubt that performance will suck, performance will suck.**

It’s not quite Amdahl’s Law as there is no mathematics backing this up. This is purely the observation that probabilistically speaking you are more likely to have bad performance than good. If heuristics are used to accelerate a process there is always a care where they fail and that case will likely occur often. SAT is an exception to this rule, but it has been carefully studied and a tremendous amount of effort has been put into making performance not suck most of the time.

## Picking an Abstraction

In my last post I discussed how we use abstractions such as the sign extraction to limit our search to a small, finite space. So let’s talk about how to improve those abstractions so that we can have more precision and hopefully prove our property.

Let’s consider this problem from last time:

if(x is even && y is even && z is even) { if(x > y && z > (x - 1) && 5*x == 35) { x = x/0; } }

We discussed last time that this program will not fail because of the combination of x is even and 5*x = 35. We allowed ourselves to choose between two possible predicates: and , where is a variable. By doing so, we constructed this table, which showed that even though this program won’t fail, we couldn’t prove it selecting from those two predicates.

x | y | z | true/false |
---|---|---|---|

≥0 | ≥0 | ≥0 | true |

≥0 | ≥0 | <0 | true |

≥0 | <0 | ≥0 | true |

≥0 | <0 | <0 | true |

<0 | ≥0 | ≥0 | false |

<0 | ≥0 | <0 | false |

<0 | <0 | ≥0 | false |

<0 | <0 | <0 | false |

So what if we added more predicates? If we added more choices it would make the search harder, but it might make it possible to prove this program safe (prove that the error is unreachable). So let’s add another pair of predicate options: and . The % operator is modulus. It gives us the remainder after division. If the remainder is zero when dividing by two, it means that the number we divided by two is even. If the remainder is one, that means that the number we divided by two is odd.

Since we now how more options on which predicates to choose, let’s specify how we can combine them. For simplicity we will say that we can only combine them conjunctively (and, written ). So we could say that , but we could not write that either or .

Now that we have added predicates, what does this mean for our table? Since we six choices for each variable: , , , , , and , and we have three variables, that means we have or 216 rows in our table. Additionally reading the table is no longer straightforward. We cannot simply look for true/false on the right because we have to make a choice. Let’s not do it this way. It sounds hard and complicated. Let’s look back at the problem to decide how to solve it.

One approach would be to syntactically look at the formula and see if anything matches nicely. Since we have x is even, y is even and z is even, this does match nicely with . If we use this constraint, we see that we can no longer find any true values, so we have proved the program. Hooray!

## Searching Smarter

The problem is that this won’t work in more general cases. With a large number of possible predicates and a large number of variables, the size of the problem will grow tremendously and quickly become practically impossible to solve. So what we can do is use a method called predicate abstraction.

Predicate abstraction uses a loop. We start by picking simple predicates, like the sign predicates. We then attempt to verify the system, like we did above. If the verification succeeds (concludes that the program is Safe) we know that the program is actually safe because the abstraction represented an overapproximation of the behavior of the system. We allowed more behaviors than the program, therefore if it doesn’t go wrong in any of those behaviors, it won’t go wrong in the program.

If verification finds that the program is unsafe (such as a true result in the table above), we can generate a concrete counterexample. We essentially pick random values that meet the constraints of a failing case. We could choose row 1 in the table, so we need to pick a values for x, y and z that are greater than zero. For example, we might pick x = 3, y = 4, z = 0. We then run the actual program on those values to see if we find a bug.

Since x = 3 is not even, we execute the first if and thus cannot reach the error location. This checking of the concrete counterexample in this case will reveal that this particular counterexample is impossible. If it weren’t impossible, it would be a real bug and would indicate a completely failed proof. In this case it is impossible and thus we must select new predicates that would exclude this counterexample.

We select new predicates by looking at why we could not reach the bug. We could not in this case because x = 3 is not even. Therefore we’ll add a new constraint that x must be even or . Now that we have done this, we can check the system again:

x | y | z | true/false |
---|---|---|---|

≥0 | ≥0 | false | |

≥0 | <0 | false | |

<0 | ≥0 | false | |

<0 | <0 | false | |

<0 | ≥0 | ≥0 | false |

<0 | ≥0 | <0 | false |

<0 | <0 | ≥0 | false |

<0 | <0 | <0 | false |

Now we can conclude that the program is safe because the combination of cannot be fulfilled. Specifically an even number times an odd number is always an even number and 35 is an odd number.

This technique is far from perfect. We have to have an established set of predicates to use before we begin and unless we know what predicates to pick, verification will fail often because it ran out of predicates. I was able to select x is even as a predicate, which was quite important for proving this program safe. That said, for many programs this is an adequate way of performing proofs.

In my next post I will discuss verification of more realistic programs involving loops and functions. I will also introduce abstract interpretation, which is a different method for doing program proofs. Eventually we’ll get into what it is that I do. This technique presented here was first applied automatically to software in 2001.

## Software Verification

One major problem with pursuing a Ph.D. in computer science and particularly with verification and formal methods is that it is exceedingly difficult to describe to others what it is that I do. In this post I hope to (somewhat pedantically) explain what software verification is. At the very least this should serve as a framework for discussion about what I do.

## What is software?

We all use software, so we all know that Microsoft Word, AutoCAD, Halo and Firefox are all pieces of software. Software is much more than that, however. It’s vitally important to understand that software exists everywhere. This is because computers are everywhere. Microwaves have computers and thus have software. Cars have tens to hundreds of computers in them. Each of those computers runs software. Medical devices, satellites, airplanes, cameras and even some bicycles have computers and thus software.

I have said many times over the few years of teaching that I don’t care what kind of engineer you are, you will end up writing software sometime in your career. Likely writing software will be a major part of your career. A good example of this is Excel. Most people end up using Excel or some spreadsheet program sometime in their career. People start out by entering a bunch of data into a bunch of cells. Pretty soon people realize that they want to perform some repetitive operation. For example, if you’re managing a bank account you might split deposits and withdrawals into two different columns, but now to maintain the total you need to sum these up. Rather than pulling out your calculator, I hope you use Excel’s features to do this for you:

By typing ‘=-$A2+$B2’ into a cell in column C, you have programmed Excel to add cell A2 times negative one to cell B2 and place the result (in this case) in cell C2. This is definitely preferable to using a calculator to fill in this column because if I made a typo in the value I placed in cell A2, I can fix that error and cell C2 will be automatically updated. Additionally, instead of punching each row of values into your calculator, you can copy and paste that formula in cell C2 into C3 and it will apply the same operation, only on cells A3, B3 and C3. We have saved ourselves time and reduced the number of places we can make mistakes. If we get the formula in column C correct and our data in columns A and B are correct, we know that unless Excel is broken (which is always possible), the results in C are correct.

While this is a quite simple piece of software, this is still software. Essentially software is a way of instructing a computer to do something for us. This software eliminates repetition. We didn’t have to redo the calculation in each row. It performs mathematics automatically so that we don’t have to test our abilities with a slide rule. Most importantly it prevented us from making additional mistakes. We provided a general method to compute the total for each row and asked Excel to make use of that method.

### Traditional Software

Traditionally software is written in a more text-based language. There are literally thousands of different languages, but most people have at least heard of one or two of them. Fortran was (and still is to some extent) a popular language for scientific computing. C is a popular language for writing operating systems (like Windows, Linux or OS X). Java is a popular language for writing applications that have graphics or run inside web pages. Javascript (ECMAscript, technically) is a language that runs inside all web browsers and powers websites like GMail or Facebook.

A really simple program might look like this:

int running_sum(int n) { int i = 0; int sum = 0; while(i < n) { sum = sum + i; i = i + 1; } return sum; }

Complete understanding of this code is not required, but lets look at some of the basics. This defines a function called running_sum. While we may call it a function (something that has mathematical meaning) what we really mean is that this is a routine. We’ve called it running_sum because that’s what it does. The name is of no significance to a computer. A computer simply knows that it is a routine. We might as well call it gobledygook for all the computer cares.

The computer doesn’t care what the function is called because we tell the computer what the function actually does. It takes an integer n as input. We then make two more integers i and sum. Then as long as i is less than n we perform two operations (contained between the curly braces { and }) where we add to sum the value of i and then we add one to i. Once all that is complete we state that the result of the routine is sum.

What does this do? If we give it an input of 4, it will start with i = 0 and sum = 0. It will then check and see that i, which is equal to 0 is less than n, which is equal to 4 so we must perform those two operations. So we do so: sum=0 (sum=0+0) and i=1 (i=0+1). Then we check again if i is less than n. One is less than four, so we perform the two operations: sum=1 (sum=0+1) and i=2 (i=1+1). Once again we check if i is less than n. Once again we perform the two operations: sum=3 (sum=1+2) and i=3 (i=2+1). We check if i is less than n and it is because 3 is less than 4, so we perform the two operations: sum=6 (sum=3+3) and i=4 (i=3+1). We finally check if 4 is less than 4. It is not, so we no longer perform those two operations. The only operation left is to return sum, which returns the result of 6.

This result could be used by a program like Excel to fill in a data table. Alternatively it could be used to show a picture on the screen or two guide a rocket to Mars or to deploy an airbag in a car in the event of a crash. We use software, however it may be written, to make our lives easier or to solve problems that humans may not be able to solve.

## Software Verification

By software verification, we usually mean verifying the correctness of the software. This opens an interesting question. What does it mean to be correct? What should running_sum be?

One way of addressing this would be to implement the program twice. Or maybe to have two different people implement the program. Say that another person implements the program this way:

int running_sum(int n) { return (n*n)/2 - n/2; }

Is this function the same as the one above? Equivalence checking is one form of verification. The problem with it is how do we know either of the functions is correct? The what does it mean to be correct question still holds. Interestingly enough, these two programs are not the same. Equivalence checking would fail. Can you find an input where the two programs differ?

An alternative question is, is the function error free? Because an error is commonly defined as a program crash, we can ask if this program can crash. There are actually very few ways in which a program can crash. Typically these are either a segmentation fault or a divide by zero error. Since we haven’t talked about memory and won’t for a while, I’ll leave segmentation faults for later. Can our program divide by zero?

This answer is obviously no, since the first implementation has no division at all and the second only divides by two. In general this can be a difficult question. It is definitely not a question that is easily resolved by trying a few random inputs, unless there actually happens to be a division by a constant zero in the code. We can turn any constraint into a divide-by-zero check:

if(x > y && z > (x - 1) && 5*x == 35) { x = x/0; }

Note that && is a logical and operation. In this case, if x is greater than y and z is greater than x minus one and five times x is equal to 35. If this is true, we have a guaranteed divide by zero. So now we have to answer the question, is there any way to make that statement true. In this particular case it’s not terribly difficult. x has to equal seven. z can be anything greater than or equal to seven and y can be anything less than or equal to six. This may be much more difficult than just looking at this code makes it out to be. There may be other constraints on x or y. What if the only way we could reach this code was if x was an even number. Then there is no way to reach the divide by zero.

In general we can write a function called assert that claims a fact must be true or the program will crash. We can write assert like this (in theory anyway, modern systems prevent this):

void assert(bool fact) { if(fact) { 1/0; } }

Programmers commonly use this assert function, which is built in to most languages to make claims about their software. The most general way to look for problems is to ensure that no asserts crash. We can also use this as a means to verify programs. If programmers claim what the results of their functions should look like (mathematically) and we can check assertions, then we can verify software.

### Abstraction

The actual process for verifying software is quite involved. In software, most languages are what’s known as undecidable. What this means is that it is actually impossible to always find if there is no way to violate an assertion. What this means in practice is that usually the search to find if there is an input that will kill the program is infinitely long. What we do instead is we construct abstractions. We consider large portions of possible values as a single value and do verification using that single value. For example, we could use an abstraction to reason about our last example. What if we said that x, y, and z could be considered to be positive and zero or negative. If this is the case, we only have to check eight possible inputs:

x | y | z | true/false |
---|---|---|---|

≥0 | ≥0 | ≥0 | true |

≥0 | ≥0 | <0 | true |

≥0 | <0 | ≥0 | true |

≥0 | <0 | <0 | true |

<0 | ≥0 | ≥0 | false |

<0 | ≥0 | <0 | false |

<0 | <0 | ≥0 | false |

<0 | <0 | <0 | false |

We can compute this a couple of different ways, but let’s walk though the way I constructed this table. Consider the first row. x, y and z are all positive or equal to zero. Is a positive number (x) greater than another positive number (y)? Maybe. We don’t know, so we will conclude either true or false. Is a positive number (z) greater than another positive number (x) minus one? Well a positive number minus one is either a positive or a negative number, so once again we don’t know. Either true or false. And lastly is a positive number (x) times five equal to 35? Once again, we don’t know. It’s not impossible to construct true using maybe and maybe and maybe, so we must conclude true. By doing so we have constructed an overapproximation. We conclude that true is a conclusion more often than it may be in reality.

Let’s try one more example. Consider the fifth row in the table. If x is negative and y is positive, then x is definitely not greater than y, so we conclude false. We don’t have to consider any other cases because false and anything else is always false. So while we could not prove that our program is safe, we are able to prove under four conditions that it is safe (false is safe because if we execute if(false) 1/0, the 1/0 will not be executed).

If we consider the situation described above where we have code like this:

if(x is even && y is even && z is even) { if(x > y && z > (x - 1) && 5*x == 35) { x = x/0; } }

We now know that 5*x can never equal 35 because x can only be seven to make that true. Therefore we should conclude that the error is unreachable. So let’s construct a table like we did above:

x | y | z | true/false |
---|---|---|---|

≥0 | ≥0 | ≥0 | true |

≥0 | ≥0 | <0 | true |

≥0 | <0 | ≥0 | true |

≥0 | <0 | <0 | true |

<0 | ≥0 | ≥0 | false |

<0 | ≥0 | <0 | false |

<0 | <0 | ≥0 | false |

<0 | <0 | <0 | false |

It is exactly the same table. This is because x is even, y is even and z is even are all maybes and don’t help us to draw any conclusions. Therefore we would fail to prove this program safe. When we get to this point we can take two different approaches. We can pick a new abstraction or not. If we choose to pick a new abstraction, maybe it will allow us to prove the program. It also may not allow us to prove the program. Since there are an infinite number of abstractions to choose, we could go on picking them forever. Eventually we may choose the second option which is to give up and decide that we just don’t know the result. We may conclude after running a few examples that the program is probably safe and will ship the code into medical devices, airplanes and cars. Believe it or not, this is exactly what happens in reality.

In the next post, I will talk about a couple of ways we can pick a new abstraction, so we’re not doing so blindly. This may allow us to prove many more programs but still know that we will finish. A big problem with these kinds of problems is what does it mean if the verification doesn’t finish? Does it mean that the search is just taking a long time or does it mean that it really is unsafe?

## Problems with Papers

Writing papers for peer-reviewed journals is not a trivial task. I’ve written three of them now and none of them have yet been accepted, though hopefully the third and most recent will. The difficulty with preparing these papers is twofold:

- Implementing the idea
- Evaluating the idea

## Color Spaces and the Web

Gack! I am working on creating a new website for myself at school. Since I do a lot of photography these days, I thought I would put up a page about my photography. Naturally I had my normal problems with HTML and CSS. Nothing is getting centered correctly. Alignment is odd and text never flows the way I want it to. Oh well. What I didn’t expect is to have problems with my photographs.

As I was proofreading my page, I realized that what I was saying about some of my photographs didn’t really correspond with the photographs. I was saying that I really liked the color in some of the pictures and when I looked at the picture, I didn’t like the color. In fact it looked nothing like I remembered. I opened up iPhoto and looked at the original. It looked great. So what went wrong?

It turns out color spaces matter. Check out this picture:

This is the exact same picture shown in two different web browsers. The one on the left is shown in Google Chrome (my new web browser of choice). The on on the right is shown in Apple Safari (my previous web browser of choice). Whether you like it or not, I intended for the picture to look like the one on the right. So why did this happen?

It turns out that the picture has an embedded color profile. This tells the computer displaying the picture what the color values actually mean. In this case, the picture uses the Adobe RGB space, which is larger space than Standard RGB (sRGB). When a browser that ignores color profiles renders the picture, it interprets the colors specified in Adobe RGB as sRGB colors. This results in the muted colors seen in the left picture.

There are several solutions to this problem:

- If you are shooting with a point and shoot camera, you probably won’t have this problem. Most cameras use the sRGB color space by default and photo editing software will respect this.
- If you are shooting with a DSLR, you could always shoot in JPEG mode. This will give you the same benefits as a point and shoot.
- If you’re like me and want the extra dynamic range offered by RAW you could try editing using programs other than iPhoto. I know Aperture can be set to export to sRGB regardless of what color space you edit it.
- If you you’re even more like me and you don’t want to buy software, you could use ColorSync, which comes with every Mac to change the color space of your picture. It’s very easy to set up an automator action to convert any JPEG image from Adobe RGB to sRGB. It’s a quite fast operation.

The real problem is for people viewing my pictures. I’m not particularly eager to go back and update all of my pictures to fix the color profile. So which browsers can actually view my pictures correctly?

Firefox 3 (Win/Mac/Linux) |
Works! (Supports ICC v2 Profiles) |

Firefox 4 (Win/Mac/Linux) |
Works! (Supports ICC v2 Profiles) |

Chrome 10 (Win/Mac/Linux) |
Does not work! |

Safari 5 (Win/Mac) |
Works! (Supports ICC v2 and v4 Profiles) |

Internet Explorer 7 (Win) |
Does not work! |

Internet Exporer 9 (Win) |
Works (sort of) (Supports ICC v2. v4 seems broken) |

I recommend that if you’re using chrome to view my pictures you try looking at them again in a supported browser. Use this website to test your browser for compatibility.