The Method in Action

In the previous post I presented some ideas about the use of scientific method in problem resolution. I thought about a situation where it would be applied, and then I remember about an old problem I faced once when I was a support engineer. The claim was that SQL Server had a bug on real numbers. Please see below how would be the approach using the scientific methos.

Problem description: SQL Server 2000 has a bug when manipulating real numbers.

Define

The statment claims that there is a bug manipulating real numbers. Exploring the problem by asking questions to the customer, we find out that the claim is specific and reproduceable by the steps below.

1) Create a table.

Create Table [Table] (valor numeric (8,2))

2) Insert data.

Insert into [Table] values (11145.83)

Insert into [Table] values (-4166.67)

Insert into [Table] values (-6970.00)

3) Query the table

SELECT SUM (VALOR) FROM TABLE

the result is result 9.16.

4) Convert to real.

declare @ws_valor real

set @ws_valor = (select sum (valor) from table)

select @ws_valor

The result is 9.1599998.

This is the problem being claimed by the statement above. It is clearly defined then: when making a sum of values 11145.83, -4166.67 and -6970.00 and then exhibiting the result as a real number, there is a error introduced. The claim is that this is a bug in the product.

Formulate Hypothesis

The claim itself is one hypothesis: the error introducted is a bug in the product; let's consider this hypothesis A. Careful and critical thinking, plus previous education on the subject will allow us to establish other possible hypothesis:

  • It is not a bug in the product; something else is causing this error. (hypothesis B)
  • The error is actually unvoidable given the way digital computers work. (hypothesis C)

These two hypothesis are not mutually exclusive, but they are exclusive when compared to the claim's hypothesis. If A is true, then both B and C are false. If B or C are true then A is false.

Gather data

We need to define a data collection plan that either confirm or rejects the hypothesis above. Hypothesis A establishes that there is a bug in SQL Server, but if we can reproduce the problem also in other applications, then hypothesis A would be rejected. We know by education that real numbers are stored inside digital cumputers using floating point notation. Data about how this is accomplished may help to prove or disprove hypothesis B and C. Based on this the data collection plan is built as:

  1. Test other applications with similar operations. Check if the problem is also reproduceable.
  2. Try to write sample code in a computer language to reproduce the problem. Collect results.
  3. Research for information on how floating point numbers are stored in digital computers.
  4. Research about definition of data type REAL in SQL Server Books Online.

The data collection produces these data:

  1. The problem happens the same way in Excel if the three values are added up and the sum is exhibited with more than 13 decimal places.

  2. Sample code written in C and Visual Basic show the same behavior.

    1. Visual Basic

      - Create a new "Standard EXE" project.
      - Paste the code below on Form_Load event:

      Private Sub Form_Load()
      Dim Valor As Double
      Valor = 11145.83 - 4166.67 - 6970
      Debug.Print Valor
      End Sub

      - Run the program. Check on the "Immediate Window" the number printed: 9.15999999999985.

    2. C

      int main(int argc, char* argv[])
      {
      double fp;
      fp = 11145.83 - 4166.67 - 6970;
      printf("Testing floating point precision.\n");
      printf("Double, 2 digits after the point: %.2f \n",fp);
      printf("Double, 6 digits after the point: %.6f \n",fp);
      printf("Double, 10 digits after the point: %.10f \n",fp);
      printf("Double, 13 digits after the point: %.13f \n",fp);
      printf("Double, 14 digits after the point: %.14f \n",fp);
      printf("Double, 16 digits after the point: %.16f \n",fp);
      return 0;
      }

  3. Researching on "floating point" operations in digital computers we find the article "Floating Point" in Wikipedia.Org, and also a reference to paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic," by David Goldberg, published in the March, 1991 issue of Computing Surveys.

  4. SQL Server Books Online has a topic about REAL data type.

Analyze Data

  1. The problem happens also in Excel, meaning that if hypothesis A is confirmed then it would also apply to Excel.
  2. The problem also happens when running sample code in VB and C, again that if hypothesis A is confirmed then it would be confirmed for any program written in these languages.
  3. The information found in the article researched states that "a floating-point calculation is an arithmetic calculation done with floating-point numbers and often involves some approximation or rounding because the result of an operation may not be exactly representable." The article goes on and explains that "there are many cases where floating-point numbers do not model real numbers well, even in simple cases such as representing the decimal fraction 0.1, which cannot be exactly represented in any binary floating-point format. For this reason, financial software tends not to use a binary floating-point number representation." (Wikipedia, 2005). Also, Goldberg states that "given any fixed number of bits, most calculations with real numbers will produce quantities that cannot be exactly represented using that many bits. Therefore the result of a floating-point calculation must often be rounded in order to fit back into its finite representation. This rounding error is the characteristic feature of floating-point computation." (1991, Appendix D, paragraph 8). This information rules confirms hypothesis B in the sense that "something else is causing the error", but it also confirms hypothesis C, given the fact that the error is inherent to digital computers.
  4. SQL Server BOL states that "approximate number data types for use with floating point numeric data. Floating point data is approximate; not all values in the data type range can be precisely represented." This also leads to confirm hypothesis B and C.

Report Findings

By confirming hypothesis B and C, we automatically reject hypothesis A.

In SQL Server, the data types REAL and FLOAT are approximate types, therefore they will never be exact when handling fractions. This is the behavior that is expected from the product, and from all products that work with fractional numbers on digital computers. The same applies to any application. If it gets a data from SQL Server and stores it on a variable that is REAL, DOUBLE or FLOAT or any other that uses floating point, an approximate number will be stored if it is a fractional number. If the application was built in VB, for instance, and the variable is Double, this is the behavior which is expected. In this situation, another datatype should be used, like Currency, that doesn't work on floating point.

From a mathematical standpoint, "approximate value" means that an error was introduced. This is a feature of floating point notation, which is built-in in digital computers. There is no way to 'fix' this. On SQL Query Analyzer specifically, it is exhibiting the data without formatting, that is the reason why it shows 9.15999999999985 instead of 9.16.

The REAL (double) and FLOAT datatype are stored in floating point, and therefore not precise. This is by design. When exhibiting these data, the correct conversion must be performed by the application. This is not a SQL Server feature, but is buit-in on floating-point numbers in binary digital computers.

_________________________________________________

References

Floating Point (2005). Wikipedia.Org. Retrieved online from https://en.wikipedia.org/wiki/Floating_point.

Goldberg, David (1991). What Every Computer Scientist Should Know About Floating-Point Arithmetic. Computer Survey, issued March 1991, Appendix D, Paragraph 8. Retrieved online from https://docs.sun.com/source/806-3568/ncg_goldberg.html.