How do large programs work?

When you write code in any computer language, there are common constructs that alter the flow of control, such as conditionals (IF THEN ELSE ENDIF), looping (FOR, FOR EACH), and selection (DO CASE). How do these work? Typically an expression is evaluated and, depending on the result, execution branches to a different part of the code. The branch is something like a GOTO or JUMP instruction that specifies the destination.

On my first machine, in 1971 (see Relaxen und watchen das blinkenlights. What lights?), the assembly language JUMP destination was essentially a hard-coded absolute address in memory. Most modern processors also allow branches relative to the current location: the branch distance is added to the current Instruction Pointer, so the branch does not depend on where the code is loaded, and the code is easily relocatable.
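To make the relative-branch idea concrete, here is a small VFP sketch that only does the address arithmetic. It is not real machine code, and the offset and distance values are made-up assumptions. The same stored distance produces a correct target wherever the code is loaded, which is why relative branches make code relocatable.

```foxpro
* Hypothetical illustration (not real machine code): a relative branch stores only
* a distance, so the same bytes work wherever the code is loaded.
LOCAL nOffset, nDisp, nBase
nOffset = 100     && assumed offset, within the code, of the instruction after the branch
nDisp = 40        && assumed signed distance stored in the branch instruction
FOR nBase = 0 TO 8192 STEP 4096       && load the same code at three base addresses
    * The stored nDisp never changes; only the computed target moves with the code
    ? "Loaded at", nBase, "-> branch target", nBase + nOffset + nDisp
ENDFOR
* An absolute branch would instead store nBase + 140 and need patching after each move
```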

Even the Intel architecture, from the 8086 through the Pentium family, allows relative jumps with several displacement sizes: 8, 16 and 32 bits (see the Pentium 4 Instruction Set Reference).

Historically, FoxBase and FoxPro limited a compiled program to less than 64K bytes, which means a branch offset could be expressed in 2 bytes (2^16 = 64K).
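The 64K limit follows directly from the width of the branch field. A minimal sketch, assuming branch offsets are stored as unsigned byte counts (an assumption; the actual FXP encoding isn't described here), computes how many positions a 2-byte and a 4-byte field can address: 65,536 (64K) and 4,294,967,296 (4G).

```foxpro
* Sketch: the number of distinct offsets a branch field of a given width can hold.
* Assumes an unsigned offset measured in bytes (an assumption, not the FXP spec).
LOCAL nBytes, nReach
FOR nBytes = 2 TO 4 STEP 2
    nReach = 256 ^ nBytes     && 2 bytes: 65,536 (64K); 4 bytes: 4,294,967,296 (4G)
    ? nBytes, "byte branch field:", TRANSFORM(nReach, "999,999,999,999"), "positions"
ENDFOR
```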

In the post Using Very large programs, I said that VFP9 allows programs larger than 64K. In fact, they can be several thousand times larger. How is this accomplished?


One of the required changes was to allow 4-byte branch offsets. In fact, VFP9 supports both 2-byte and 4-byte branches. Modules smaller than 64K use only 2-byte branches, which saves space: forcing all branches to 4 bytes would have increased the size of all user code. Larger modules use 4-byte branches, so their code grows by 2 extra bytes per branch.
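The space trade-off is easy to put in numbers. This sketch uses hypothetical sizes (a module with 1,000 branches and 60,000 bytes of other code), not measurements of real VFP9 output. With 2-byte branches the module stays under 64K; forcing 4-byte branches would add 2 bytes per branch, 2,000 bytes in all.

```foxpro
* Sketch of the size cost of 4-byte branches (hypothetical numbers, not measured).
LOCAL nBaseSize, nBranches, nSize2, nSize4
nBranches = 1000        && assumed number of branch instructions in the module
nBaseSize = 60000       && assumed size of everything except the branch offsets
nSize2 = nBaseSize + 2 * nBranches      && every branch offset stored in 2 bytes
nSize4 = nBaseSize + 4 * nBranches      && every branch offset stored in 4 bytes
? "2-byte branches:", nSize2, "4-byte branches:", nSize4, "extra:", nSize4 - nSize2
```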

The FXP format was also altered slightly to accommodate large modules. In fact, the format is identical to VFP8's unless the module is too large (64K or more).


When the VFP9 compiler compiles a module, it doesn't know in advance whether the compiled code will be too large, so it assumes the module is normal (<64K) and writes out branches in 2-byte format. If, during compilation, the resulting code size exceeds 64K, the compile is restarted in large (>64K) mode and branches are written out in 4-byte format. Because the first 64K worth of code is compiled twice, large modules take correspondingly longer to compile. However, this double compilation has no effect on execution performance. A large module affects execution performance mainly because there is more compiled code to read in order to execute the user code.
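The compile-then-restart strategy can be sketched as a simple loop. This is a hypothetical model, not the VFP9 compiler: it assumes each statement emits a fixed number of bytes plus one branch offset, and the statement count and sizes are invented for illustration.

```foxpro
* Hypothetical sketch of the optimistic compile strategy (NOT the VFP9 compiler).
* Assumes each statement emits nStmtBytes bytes plus one nBranchBytes-byte offset.
LOCAL nStmts, nStmtBytes, nBranchBytes, nSize, nPasses, i, lTooBig
nStmts = 300          && assumed number of statements in the module
nStmtBytes = 220      && assumed bytes per statement, excluding the branch offset
nBranchBytes = 2      && first assume a normal (<64K) module
nPasses = 0
DO WHILE .T.
    nPasses = nPasses + 1
    nSize = 0
    lTooBig = .F.
    FOR i = 1 TO nStmts
        nSize = nSize + nStmtBytes + nBranchBytes
        IF nBranchBytes = 2 AND nSize > 65535
            lTooBig = .T.     && 2-byte offsets can no longer reach: abandon this pass
            EXIT
        ENDIF
    ENDFOR
    IF NOT lTooBig
        EXIT                  && the module fit in the current mode: done
    ENDIF
    nBranchBytes = 4          && restart the whole compile in large-module mode
ENDDO
? "Passes:", nPasses, "Branch size:", nBranchBytes, "Code size:", nSize
```

With these assumed numbers, the 2-byte pass overflows at the 296th statement, so that work is discarded and a second pass produces a 67,200-byte module with 4-byte branches. The discarded pass costs only compile time; the finished code is the same as if the compiler had known the size up front.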

See Using the Databar feature with real data


You can see the effect of the double compilation by running the code below. (To get the graph in the grid, paste all the Data Bar code from Excel's new gradient Data Bar feature is cool: you can do it too! below this code.) The code generates successively larger programs, compiles each one 100 times in a loop, and logs the total compile time in milliseconds into a cursor. The resulting graph shows a clear boundary between n = 286 and n = 287, where the generated code crosses into the large-code category. Do you get the same results?


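CLEAR ALL
CLEAR
CREATE cursor Result (nSize i,time i)
SET NOTIFY OFF && turn off thermometer
FOR n = 280 TO 295
      * Generate tt2.prg: a DO CASE whose two CASE bodies each hold n long
      * assignments, so the jumps over each body grow with n
      SET TEXTMERGE TO tt2.prg on noshow
            \y='a' && this is used in a CASE statement *way* down there
            \Do Case
            \Case Y = 'a'
            FOR i = 1 TO n
                  \ x="<<REPLICATE("a",100)>>"
            ENDFOR
            \ y="b"
            \Case .T.
            FOR i = 1 TO n
                  \ x="<<REPLICATE("a",100)>>"
            ENDFOR
            \ y = "a"
            \ENDCASE
            * FSIZE() returns a file's size (not a field's) only with SET COMPATIBLE ON
            \set compatible on
            \IF y="b" then
            \ cret= "DOCASE:PASS "+TRANSFORM(FSIZE("tt2.prg"))
            \ELSE
            \ cret= "DOCASE:FAIL "+TRANSFORM(FSIZE("tt2.prg"))
            \endif
            \set compatible off
            \return cret
      SET TEXTMERGE to && close tt2.prg
      * Time 100 compiles of the generated program
      ns=SECONDS()
      FOR i = 1 TO 100
            COMPILE tt2
      ENDFOR
      INSERT INTO result VALUES (n,1000*(SECONDS()-ns)) && total milliseconds
      ?n,"Compile time = ",Result.time
      * Run the generated program: PASS means the jump past the first CASE landed correctly
      ?tt2()
ENDFOR
LOCATE
* myform is defined in the Data Bar code pasted below
oForm=CREATEOBJECT("myform","Time")
oForm.show(1)
RETURN
* Paste in the code from https://blogs.msdn.com/calvin_hsia/archive/2005/11/20/495152.aspx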