My co-workers and I recently came across a piece of code which exposed some assumptions we had about the “correct” behavior of two functions; these assumptions turned out to be false. The code dealt with determining if the IP of a request coming into to a website matches a certain range of IP address. The range of IP address was defined in a config file and could look something like this:
220.127.116.11, 156.*.*.*, 127.0.0.*
Where the * acts as a wild card and each comma-separated IP format is matched against the incoming IP address. The code’s intended behavior is that specifying no IP address formats in the config would mean that no IP addresses would match the format. We noticed that this wasn’t holding true. We went to examine the code that was performing the matching (which none of us had written and sadly has NO test cases around it). The code boiled down to something like this:
Where ipFormatsFromConfig is the ip range of formats shown above separated by “,” or an empty string if no formats are given. On first glance we all were confused by why this didn’t work and then as we looked into it we realized we had made some incorrect assumptions.
First, we assumed that “”.Split(‘,’) would return a zero element array. This assumption is false. It returns a 1 element array containing the empty string.
After we learned this we encountered our second assumption. We assumed that a regular expression whose pattern was an empty string would not match anything. (ConvertFormatIntoRegex returned just that). This assumption is also false. After learning the true behavior, it makes sense why it behaves this way. (A DFA with only one state must accept every input. )
Our assumptions about the “correct” behavior of functions reflect similar mistakes made by all programmers; assuming that you don’t make these kinds of assumptions is a very dangerous way to program and reinforces the importance of unit testing.