Index creation performance issues

Article
01/31/2006

In this post: Index creation performance question I asked: Is it faster to fill a table with values, then index it or create the index on an empty table, then fill it? Why?

After a record in a cursor is modified, any associated index must be updated. If there is already an index, then after each record added, the index needs to be checked.

If the index is created on the filled table, then a single, specialized optimized index creation routine can be invoked within VFP to create the index.

For example, I profiled the code for creating 10000 records. If the special routine is invoked to create the index on these records,

If the index is created beforehand, then the routine to add a node (let’s call it AddNode) to the index is called 10001 times (the extra 1 time is for the index itself: multiple indices can be stored in an index: see What is an index anyway?)

If the index is created after, then the special routine is called, which reduces the number of calls to AddNode to 312!

On my machines, the index after scenario is about 3-5 times faster than the index before! What results do you get?

Next question: what effect will there be if you use random data rather than a constant string? You may be surprised by the results!

(Another question: Why does this code not use SYS(2015) - Unique Procedure Name to get random values?)

SET EXCLUSIVE ON

SET SAFETY OFF

num=100000

?"Create Index before = ",TestIt(.t.,num)

?"Create Index after = ",TestIt(.f.,num)

USE

n=ADIR(aa,"test.cdx")

?"Index size = ",aa[1,2]

PROCEDURE TestIt(fIndexBefore as Boolean, num as Integer)

ns=SECONDS()

CREATE table test (name c(100))

IF fIndexBefore

INDEX on name TAG name

ENDIF

FOR i = 1 TO num

IF .t.

INSERT INTO test VALUES (MakeRandString(7))

ELSE

INSERT INTO test VALUES ("testing")

ENDIF

ENDFOR

IF !fIndexBefore

INDEX on name TAG name

ENDIF

RETURN SECONDS()-ns

PROCEDURE MakeRandString(nLen as Integer)

LOCAL i,cStr

cStr=""

FOR i = 1 TO nLen

cStr=cStr+CHR(RAND()*26+65)

ENDFOR

RETURN cStr

Index creation performance issues

Additional resources