Index creation performance issues

In this post: Index creation performance question  I asked: Is it faster to fill a table with values, then index it or create the index on an empty table, then fill it? Why?


After a record in a cursor is modified, any associated index must be updated.  If there is already an index, then after each record added, the index needs to be checked.


If the index is created on the filled table, then a single, specialized optimized index creation routine can be invoked within VFP to create the index.


For example, I profiled the code for creating 10000 records. If the special routine is invoked to create the index on these records,


If the index is created beforehand, then the routine to add a node (let’s call it AddNode) to the index is called 10001 times (the extra 1 time is for the index itself: multiple indices can be stored in an index: see What is an index anyway?)


If the index is created after, then the special routine is called, which reduces the number of calls to AddNode to 312!


On my machines, the index after scenario is about 3-5 times faster than the index before! What results do you get?


Next question: what effect will there be if you use random data rather than a constant string? You may be surprised by the results!


(Another question: Why does this code not use SYS(2015) – Unique Procedure Name to get random values?)







?”Create Index before = “,TestIt(.t.,num)

?”Create Index after  = “,TestIt(.f.,num)



?”Index size = “,aa[1,2]


PROCEDURE TestIt(fIndexBefore as Boolean, num as Integer)


          CREATE table test (name c(100))

          IF fIndexBefore

                   INDEX on name TAG name


          FOR i = 1 TO num

                   IF .t.

                             INSERT INTO test VALUES (MakeRandString(7))


                             INSERT INTO test VALUES (“testing”)



          IF !fIndexBefore

                   INDEX on name TAG name


          RETURN SECONDS()-ns


PROCEDURE MakeRandString(nLen as Integer)

          LOCAL i,cStr


          FOR i = 1 TO nLen



          RETURN cStr



Comments (3)

  1. Mike Potjer says:

    Interesting. Thanks for the explanation.

    As for the SYS(2015) question: each call to SYS(2015) produces a result that is distinct in the *right-most* character(s), the first character is always the same, and the generated values are all consecutive. In a quick test where SYS(2015) was called 10000 times, the first 7 characters of all 10000 values I got back were identical. I think that alone makes SYS(2015) a poor choice for a real-world test. However, there were a couple other differences. The SYS(2015) values can be indexed more efficiently than a more random set of values, making the timing results less applicable to a real-world scenario. I also noticed that the CDX size of the table stuffed with SYS(2015) values was considerably smaller than the CDX for a more random set of values of the same length.

  2. Jurisfox says:

    On machine – P4 2.4 with 512Mb Ram

    i got 2.957 and 1.996 – 1.48 times faster

  3. aaron says:

    Does this apply to views as well?  I can’t seem to see a performance difference.