Friday, June 22, 2012

The hidden mistake

There are mistakes that drive you crazy when you try to understand what went wrong.

One of the most annoying and hard to catch was this, apparently harmless line:

tungsten-sandbox -m 5.5.24 --topology all-masters -n 2 -p 7300 -l 12300 -r 10300 –t $HOME/mm -d tsb-mm

The person reporting the error told me that the installation directory (indicated by "-t") was not taken into account.

I usually debug by example, so I copied the line and pasted it into one of my servers. Sure enough, the application did not take that option into account. The installation kept happening in the default directory.

I knew that I had done a good job at making the application configurable, but I checked the code nonetheless. The only place where the default directory is mentioned is when the related variable is initialized. Throughout the code, there are no literal values used for this purpose. And yet, the application was not recognizing the option.

I inspected the code while it was running, checking the value of the variable before and after the parsing of the command line. No clues. The option was simply ignored.

I tried an experiment: I edited the command line, deleted the "t", and wrote the full name of the option (--tungsten-base) instead of the abbreviated one. I tried again. No change.

Then I did what I often call IT Voodoo. I removed the option from the middle of the line, and I added it again at the beginning of the line.

tungsten-sandbox -m 5.5.24 -t $HOME/mm --topology all-masters -n 2 -p 7300 -l 12300 -r 10300 -d tsb-mm

And sure enough, the option was accepted!

W. T. F ?

I knew for sure that I did not introduce any logic in the application that would consider an option depending on its position. So I edited the line again, and put the option at the very end.

tungsten-sandbox -m 5.5.24 --topology all-masters -n 2 -p 7300 -l 12300 -r 10300 -d tsb-mm -t $HOME/mm 

And it was accepted again!

Then I edited the line again, putting the option back in the same point where it was initially reported to fail.

tungsten-sandbox -m 5.5.24 --topology all-masters -n 2 -p 7300 -l 12300 -r 10300 -t $HOME/mm -d tsb-mm 

And it succeeded again.

Actually, I didn't manage to make it fail anymore. I hate these cases of "It works for me." I want to find the reason why things fail.

So I took the initial failing line and the retyped, working one, and saved each of them to a file:

$ cat > /tmp/one
tungsten-sandbox -m 5.5.24 --topology all-masters -n 2 -p 7300 -l 12300 -r 10300 –t $HOME/mm -d tsb-mm

$ cat > /tmp/two
tungsten-sandbox -m 5.5.24 --topology all-masters -n 2 -p 7300 -l 12300 -r 10300 -t $HOME/mm -d tsb-mm 

$ vimdiff /tmp/one /tmp/two

Looking at the two strings side by side didn't tell me anything at first, except that the difference starts at the dash. Suddenly, it hit me: it is not the plain ASCII hyphen that introduces an option, it is a look-alike dash character! It is actually longer than a regular hyphen, but a cursory examination will miss it. Using the ":asc" command in vim shows that this is not ASCII code 45, but a non-ASCII, multi-byte character.
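The same inspection can be done outside the editor. What follows is a minimal sketch in Python, not part of tungsten-sandbox; the function name find_non_ascii is made up for illustration, and the sample line assumes the offending character is an en dash (U+2013), which is how it appears in the failing line quoted above.

```python
import unicodedata


def find_non_ascii(text):
    """Return (offset, character, code point, Unicode name) for every non-ASCII character."""
    hits = []
    for offset, ch in enumerate(text):
        if ord(ch) > 127:
            # unicodedata.name() falls back to the given default for unnamed characters
            name = unicodedata.name(ch, "UNKNOWN")
            hits.append((offset, ch, f"U+{ord(ch):04X}", name))
    return hits


if __name__ == "__main__":
    # Shortened copy of the failing line; "\u2013" is the assumed look-alike dash.
    failing = "tungsten-sandbox -m 5.5.24 -r 10300 \u2013t $HOME/mm -d tsb-mm"
    for hit in find_non_ascii(failing):
        print(hit)
```

A line typed by hand with a real hyphen returns an empty list, so the check can be run on any pasted command before executing it.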

So the mystery error was a copy-and-paste problem. I have seen this error before, but when I paste the code into my editor, the syntax highlighter catches it immediately. This time, it took longer to find the culprit, and in the end the story seemed worth sharing.
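Why would an option parser ignore such an argument without complaining? Here is a hedged sketch using Python's argparse as a stand-in. The parser that tungsten-sandbox really uses is not shown in this post, so the option definitions, the default directory, and the choice to tolerate leftover arguments are all assumptions made for illustration. The point is only that a token starting with a look-alike dash does not begin with "-", so it is never matched against "-t".

```python
import argparse


def parse(argv):
    """Parse argv and return (tungsten_base, leftover_arguments)."""
    parser = argparse.ArgumentParser(prog="sandbox-sketch")
    # Hypothetical option definitions, loosely modeled on the command line above
    parser.add_argument("-t", "--tungsten-base", default="/default/dir")
    parser.add_argument("-d", "--directory")
    # parse_known_args() collects unrecognized tokens instead of raising an error
    options, leftovers = parser.parse_known_args(argv)
    return options.tungsten_base, leftovers


if __name__ == "__main__":
    # Real hyphen: the option is recognized
    print(parse(["-t", "/home/me/mm", "-d", "tsb-mm"]))
    # Look-alike dash (U+2013): the default survives, the tokens end up as leftovers
    print(parse(["\u2013t", "/home/me/mm", "-d", "tsb-mm"]))
```

In this sketch, a program that never looks at the leftovers behaves just like the failing case: the default directory is used and nothing is reported.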


Anonymous said...

I get the same kind of error when I try to import data from MS Excel sheets using LOAD DATA INFILE.
There's always this weird character posing as a space, which makes everything tumble down.

Unknown said...

These kinds of errors crop up regularly at our agency. Unprintable whitespace or inconsistent line breaks are even more difficult to spot. I usually advise taking a look via hexdump -c or a real hex editor. Integrating automated checks in the CI server works best, though.

Anonymous said...

Inspired by this post I posted

Peter Laursen