Creating, parsing, graphing web hit logfiles or other temporal data

Looking at temporal data can be quite informative. For example, I have a table of about 16000 photographs. Each record has a datetime stamp. Looking at the temporal distribution one can clearly see events, such as the birth of a child, the visit of relatives or the purchase of a camera.

Internet Information Server (IIS) is the web server that ships with Windows XP Professional. If you have XP Pro, you can install it from the Control Panel->Add/Remove programs/ Add/remove Windows Components. Once installed, you can hit your web site just by typing “localhost” in the address bar of internet explorer on the same machine. From a different machine on the network you can hit that web site by typing in the machine name. To hit that machine on the internet, you can get a static IP address from your online provider and attach a domain name to it. For example my IP is https://66.13.78.182 and my domain name is www.calvinhsia.com.

When installed, IIS defaults to logging every web hit to logfiles. Choose Start->Control Panel->Administrative tools->Internet Information Services, then drill down to local computer/Web sites/default web sites. Right click and choose Properties. The Web Site page shows an “Enable logging” checkbox, logging options and properties. The default is to log to “C:\WINDOWS\System32\LogFiles” with Log file name: W2SVC1\exyymmdd.log

Go ahead and hit your own web site via this https://localhost then look at the generated log file. The datetime logged by default is Universal Coordinated Time, which for most people is not local time. This might make your logfile generated name be for tomorrow (or yesterday) depending on your time zone.

(In addition to the IIS logging, you can do your own custom logging. For example, my web site logs every web hit into a fox table.)

The 120 line program below parses the IIS logs into a cursor and graphs the results. It took only a few seconds to scan through more than 1000 daily log files resulting in over half a million web hits. The graph was displayed in about one second.

This SQL Select statement is handy when creating and scaling the data for the graph:

            SELECT INT((dtime-this.nMin)/this.nBucketSize) as bucket,COUNT(*) as cnt FROM webhits GROUP BY 1 INTO CURSOR buckets

When you click on a point on the graph, a SQL Select statement for the corresponding bucket is executed to get a summary of the web hits for that day, which is displayed on a grid on another form.

CLEAR

CLOSE DATABASES

*ERASE webhits.dbf&& erase the file if you want to reparse

IF !FILE("webhits.dbf")

      CREATEOBJECT("ParseLogs")

ENDIF

PUBLIC oGraph as Form

oGraph=NEWOBJECT("GraphForm")

oGraph.Show

DEFINE CLASS GraphForm as Form

      allowoutput=.f.

      left=200

      width=1600

      height=300

      backcolor=RGB(255,255,255)

      forecolor=RGB(255,0,255)

      nMin=0

      nMax=0

      nBucketsize=0

      nBuckets=0

      oGridForm=0

      PROCEDURE init

            SELECT MIN(dtime) as min, MAX(dtime) as max FROM webhits INTO CURSOR minmax

            this.nMin=min

            this.nMax=max

            this.nBuckets=this.Width

            this.nBucketSize=(this.nMax-this.nMin) / this.nBuckets

            SELECT INT((dtime-this.nMin)/this.nBucketSize) as bucket,COUNT(*) as cnt FROM webhits GROUP BY 1 INTO CURSOR buckets

            SELECT MAX(cnt) as maxheight FROM buckets INTO CURSOR maxheight

            nMaxHeight=maxHeight.maxHeight

            SELECT buckets

            x0=0

            y0=0

            SCAN && scan through the cursor, plotting each bucket in a graph

                  this.Line(x0,y0,buckets.bucket, this.height-this.height*buckets.cnt/nMaxHeight)

                  x0=buckets.bucket

                  y0= this.height-this.height*buckets.cnt/nMaxHeight

            ENDSCAN

      PROCEDURE Mousedown(nButton, nShift, nXCoord, nYCoord)

            SELECT * FROM webhits WHERE INT((dtime-this.nMin)/this.nBucketSize)=nXCoord INTO CURSOR results

            IF _tally>0

                  SELECT uri,count(*) from results GROUP BY 1 ORDER BY 2 descending INTO CURSOR daily

                  IF VARTYPE(this.oGridForm)!= 'O'

                        this.oGridForm=CREATEOBJECT("form")

                        WITH this.oGridForm

                              .width=500

                              .height=300

                              .visible=1

                              .top=this.Top+this.Height+40

                              .addobject("grid1","grid")

                              WITH .grid1 as Grid

                                    .Visible=1

                                    .Width=thisform.Width

                                    .Height=thisform.Height

                              ENDWITH

                        ENDWITH

                  ENDIF

                  thisform.oGridForm.grid1.recordsource="daily"

thisform.oGridForm.grid1.AutoFit

                  thisform.oGridForm.caption = DTOC(TTOD(results.dtime))

            ENDIF

ENDDEFINE

DEFINE CLASS ParseLogs as custom

      cDir="C:\WINDOWS\System32\LogFiles\W3svc1\"

      PROCEDURE init

            CREATE TABLE webhits (dtime t,ip c(15),method c(20),uri c(50))

            n=ADIR(aFiles,this.cDir+"*.log")

            FOR i = 1 TO n && for each log file (Log file name: W2SVC1\exyymmdd.log)

                this.ProcFiles(this.cDir+aFiles[i,1])

            ENDFOR

      PROCEDURE ProcFiles(cFile as String)

            ?cFile

            nh=FOPEN(cFile)

            DO WHILE !FEOF(nh)

                  cStr=FGETS(nh)

                  IF ASC(cstr)=0 && null line is empty log

                        EXIT

                  ENDIF

                  IF cStr="#" && if it's part of the log header

                        IF cStr="#Date"

                              cDate=SUBSTR(cStr,8)

                              cYr=SUBSTR(cDate,1,4)

                              cMon=SUBSTR(cDate,6,2)

                              cDay=SUBSTR(cDate,9,2)

                              cDate=cMon+"/"+cDay+"/"+cYr

                        ENDIF

                  ELSE

                        nAt=AT(' ',cStr)

                        cTime=LEFT(cStr,nAt-1)

   cStr=SUBSTR(cStr,nAt+1)

                        dtime=CTOT(cDate+" "+cTime) && time

 

                        nAt=AT(' ',cStr)

                        ip=LEFT(cStr,nAt-1) && IP address

                        cStr=SUBSTR(cStr,nAt+1)

                       

                        nAt=AT(' ',cStr)

                        method=LEFT(cStr,nAt-1) && method

                        cStr=SUBSTR(cStr,nAt+1)

 

                        nAt=AT(' ',cStr)

                        uri=LEFT(cStr,nAt-1) && URI

                        cStr=SUBSTR(cStr,nAt+1)

                        IF !EMPTY(m.dtime)

      INSERT INTO webhits FROM memvar

      ENDIF

                  ENDIF

            ENDDO

            FCLOSE(nh)

ENDDEFINE

41976