Foxpro Performance tip: field name lookup for tables

When FoxPro opens or uses a table or cursor, internally we have to keep track of the field names used in that table.

For example, this statement creates a table with 3 fields.

CREATE TABLE foo (lastname c(10), firstname c(10),address c(20))

When encountering a line of code:

            x = lastname + firstname

Fox needs to figure out if “lastname” is a field name in the currently opened table or a variable.

All names used internally in Foxpro are stored in the “name table”, with a single integer index into the table representing the name. In fact, they’re called NTIs, for Name Table Indexes.

A name could be a user variable name, a property/method name, or even native property/method names, like ActiveProject. Command and function names are not put in the name table.

When a table is used (opened or created), VFP creates an array of structures for each field. A member of the structure is the NTI for the field name. VFP searches the name table for the field names and an index is returned. If they don’t exist, then they’re added.

In the old days (prior to FoxPro 7.0), when a table was opened, an array was allocated of size (maximum name table index ever used). All the array elements were set to 0, then a loop through all the fields would set the nth array element to n+1 (the field # for the nth field). Thus if the NTIs for “lastname”, “firstname”, and “address” were 200, 210, and 220, then the 200th element of the array would be 1, the 210th element would be 2 and the 220th element would be 3. To be a little more efficient, the array was then resized to be the maximum NTI found.

This worked fine and the time to lookup a field name was constant: if you want to know if NTI 200 was in the table, just check to see if the 200th element was non-zero: that would indicate the field position in the table.

If the name table had 3000 names in it, then the lookup array would have 3000 elements, all of which were zero except for the # of fields (FCOUNT()) elements.

As new features were added to VFP for subsequent versions, the default name table size grew because the # of properties and baseclasses grew.

If you had a couple dozen tables open, with perhaps a few dozen fields per table, with fairly complex code running (which would add to the name table), and then opened another table, the growth of the name table meant the lookup array size grew quite a bit too.

To maximize performance, the number of disk accesses must be minimized. Disk access is orders of magnitude slower than memory access, so it’s good to use memory wisely.

I changed this lookup array algorithm for VFP7 (June 2001) and later so it doesn’t waste all this memory with the sparse lookup array. This is the classical memory space vs execution time trade-off.

Instead of creating a lookup array, the array of field names is searched linearly for the given NTI. Thus searching for a field name using tables with lots of fields is slower, but there is no longer a huge sparse lookup array and precious memory can be used for other tasks.

The search is just a linear search through the fields as can be seen by experimenting with the code below.

A table with 255 fields is created, and a variable “x” is assigned in a loop. With the table closed, or with a small number of fields, the loop occurs quite quickly. With many fields, accessing fields near the end takes a lot longer. Determining if an NTI is not a field means looping through the entire list of fields in the current workarea.

If the NTI is not a field (like “y” below), the linear search occurs through to the end of the field list to determine that it’s not a field, so performance is slow. An easy way to speed it up (and a way to make code easier to read, easier to maintain, and faster) is to prefix the variable name with “m.” which specifies that the name is *not* a field name, but a variable. This boosts performance, even if there is a table with many fields open.

Restated: if you have VFP code that accesses a variable with a syntax that *could* be a field reference, and there is a possibility that a cursor is open in the current workarea as well, preface that variable with “m.” for better performance. VFP then “knows” that it can’t be a field name.

A 2 gigahertz processor can execute a loop of 255 integer compares with an array of structure members very quickly, but if that loop can be eliminated with a minor code change, then why not?

*experiment with 2 or 255 fields

#define NFLDS 255 &&Create a table with 255 fields

DIMENSION aflds[NFLDS,5]

FOR i = 1 TO NFLDS

      aflds[i,1]="fld"+PADL(i,3,"0")

      aflds[i,2]="i"

      aflds[i,3]=4

      aflds[i,4]=0

ENDFOR

CREATE CURSOR test FROM ARRAY aflds

INSERT INTO test (fld001) VALUES (0)

use && Comment this line to experiment with the table closed or open

y=1

nsec=SECONDS()

FOR i = 1 to 1000000

      x=y

* x=m.y

* x=fld255

* x=test.fld255

ENDFOR

?SECONDS()-nsec

RETURN

The original bug report (must run it in VFP6 to reproduce)

*Steps to Repro:

CLEAR all

CLEAR

memst=VAL(SYS(1016))

?SYS(1011),SYS(1016)

FOR i = 1 to 1000

     

      mstr='e'+TRANSFORM(INT(SECONDS()*1000))

      CREATE CURSOR foo ((mstr) c(10))

      ?SYS(1011),SYS(1016), VAL(SYS(1016)) - memst

      use in foo

ENDFOR

RETURN

Observed:

the numbers in the handle column are relatively constant, but the size of the handles in the 2nd column keep increasing.

Expected:

not so much mem to be used.

45914