SharePoint 2010 Capacity Management: Log Parser

Log Parser is a powerful tool for extracting useful usage information from IIS logs. It can also consume data sources other than IIS log files (XML files, CSV files, the registry, event logs, etc.), but for the SharePoint 2010 usage scenario we only need IIS logs as input.

Log Parser can be downloaded at https://www.microsoft.com/download/en/details.aspx?displaylang=en&id=24659

Log Parser is a command-line tool that takes an input and a query to execute, and writes its output to the screen, an output file, a SQL database, or an image.

It is made up of three components:

  1. Input Formats: Log Parser supports many input formats, such as event logs, XML files, and the Windows registry. The one we need is IIS log files.
  2. SQL-Like Engine Core: This component processes the data from the input source. The syntax is SQL-based, so it uses keywords like “select”, “where”, “group by”, etc.
  3. Output Formats: Log Parser supports many output formats, such as text file formats or SQL databases. The most important for our purpose are screen output and the CSV format.
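
For example, the following command (a minimal sketch; adjust the log file pattern to your environment) ties the three components together: IIS W3C logs as the input format, a SQL-like query in the middle, and screen output by default:

            logparser -i:IISW3C "select top 10 cs-uri-stem,count(*) as hits from *.log group by cs-uri-stem order by hits desc"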

 More information on Log Parser formats and the syntax for Log Parser queries is available in the documentation that is installed with the setup file mentioned above.

One of the best documents on how to use Log Parser for analyzing SharePoint usage is available at https://www.microsoft.com/download/en/details.aspx?displaylang=en&=tm&id=4616

 For easier use of Log Parser, add its installation folder to the PATH environment variable. This lets you run the logparser command from any folder. The default installation path is C:\Program Files (x86)\Log Parser 2.2
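
As a quick sketch, you can append it to the PATH for the current command-prompt session like this (or set it permanently through System Properties > Environment Variables):

            set PATH=%PATH%;C:\Program Files (x86)\Log Parser 2.2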

To start using Log Parser in your environment, create a folder and copy into it the IIS log files for a specific day. If you have multiple load-balanced front-end servers, copy the log files from each of them into the same folder.
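
For example (the server names, site IDs, and paths below are hypothetical; adjust them to your farm), renaming each file while copying keeps the per-server log files distinguishable, which matters because several of the queries below group by logfilename:

            copy \\WFE01\c$\inetpub\logs\LogFiles\W3SVC1\u_ex110214.log C:\LPWork\WFE01_u_ex110214.log
            copy \\WFE02\c$\inetpub\logs\LogFiles\W3SVC1\u_ex110214.log C:\LPWork\WFE02_u_ex110214.log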

Below are the most important queries that will help you identify the usage patterns for your environment:

  1. To get the different users and the number of times they appear in the logs, run the following command (note that the output is written to the LogFileUsers.csv file):

             logparser -i:IISW3C -o:CSV "select count(*) as ct,cs-username,logfilename from *.log group by cs-username,logfilename" >LogFileUsers.csv

            To get the total number of users per log file, run the following command:

            logparser -i:CSV "select sum(ct) as sum,count(*) as users,logfilename from LogFileUsers.csv group by logfilename"

       2. Get the top 20 users of the environment with the following query:

            logparser -i:IISW3C "select top 20 count(*) as ct,cs-username as user from *.log group by user order by ct desc"

       3. Get the Requests Per Second:

           Requests per second and their distribution over time are among the most important measures of environment usage, and they are a key input when sizing a new environment.

           Generally, when a user requests a page, the browser first sends an anonymous request to the server. With Windows authentication in SharePoint, the server returns an HTTP 401 response, and the client browser then resends the request with authentication details. The 401 response adds virtually no load on the server, but it is logged in the IIS log files. This is why the results of the previous queries contained requests with an empty username.

           In the next queries, you will find a “where” clause that excludes these 401 responses, since they add no load on the server and shouldn’t be counted toward Requests Per Second (RPS).
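
           To see how large a share of your log these 401 entries actually represent, a quick sketch query like the following breaks the requests down by status code:

            logparser -i:IISW3C "select sc-status,count(*) as ct from *.log group by sc-status order by ct desc"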

 

         The queries to get the Requests per Second are:

           (Note: these queries use a query file, load.txt, as input. The contents of the file are listed at the end of this blog.)

           logparser -i:IISW3C file:load.txt -o:csv -q >RequestsRes.csv

           -- distribution by Seconds

           logparser -i:CSV -o:CSV "select count(*) as ct,secs,max(ss) as ss,max(mi) as mi,max(hh) as hh from RequestsRes.csv group by secs order by secs"  -q >secsdist.csv

           -- distribution by minutes

           logparser -i:CSV -o:CSV "select count(*) as ct,div(secs,60) as minu,max(ss) as ss,max(mi) as mi,max(hh) as hh from RequestsRes.csv group by minu order by minu"  -q >mindist.csv


           -- Hourly Average Requests Per Minute

           logparser -i:CSV -o:CSV "select hh as Hour,avg(ct) as AverageCT from mindist.csv group by hh order by hh" >HourAvgRPS.csv

 

           -- Hourly Peak Requests Per Minute

           logparser -i:CSV -o:CSV "select hh,max(ct) from mindist.csv group by hh order by hh"  >MinPeakRPS.csv

 

           -- Hourly Average Requests Per Second

           logparser -i:CSV -o:CSV "select hh as Hour,avg(ct) as SecAvg from secsdist.csv group by hh order by hh"  >SecAvgRPS.csv

 

           --  Hourly Peak Requests Per Second

           logparser -i:CSV -o:CSV "select hh as Hour,max(ct) as SecPeak from secsdist.csv group by hh order by hh"  >SecPeakRPS.csv

 

           These request counts include requests for images and other static files. Usually, actual user operations account for roughly one third of the total requests.
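
           To check that ratio against your own logs, a sketch like the following breaks the non-401 requests down by file extension, so you can separate page requests from images, scripts, and style sheets:

            logparser -i:IISW3C "select to_lowercase(EXTRACT_EXTENSION(cs-uri-stem)) as ext,count(*) as ct from *.log where sc-status<>401 group by ext order by ct desc"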

 

         4. Get Distinct Users Over Time:

             Run the following commands to get the number of distinct users per hour and per minute:

 

            logparser -i:CSV -o:CSV "selet count(*) as ct,cs-username,secs,max(ss) as ss,max(mi) as mi,max(hh) as hh from RequestsRes.csv group by secs,cs-username order by secs,cs-username"  -q >userdist.csv

 

           -- Distinct Users By Hour

           logparser -i:CSV -o:CSV "select hh,cs-username,sum(ct) as req from userdist.csv group by hh,cs-username order by hh,cs-username" >userhhdist.csv

            logparser -i:CSV -o:CSV "select hh,count(*) from userhhdist.csv group by hh" –q >userhhdist_PerHour.csv

 

           -- Distinct Users By Minute

           logparser -i:CSV -o:CSV "select div(secs,60) as minu,cs-username,sum(ct) as req from userdist.csv group by minu,cs-username order by minu,cs-username" >usermidst.csv

          logparser -i:CSV -o:CSV "select minu,count(*) from usermidst.csv group by minu" >usermidst_PerMin.csv


The contents of the load.txt query file are as follows. For each request it computes the number of seconds since local midnight (secs), plus separate date and time parts (yy, mo, dd, hh, mi, ss) and the lowercased path, file name, and extension of the requested URI, while excluding the 401 authentication responses:

select
    EXTRACT_FILENAME(LogFilename), LogRow,
    date, time, cs-method, cs-uri-stem,
    cs-username, c-ip, cs(User-Agent), cs-host, sc-status, sc-substatus,
    sc-bytes, cs-bytes, time-taken,
    add(
        add(
            mul(3600, to_int(to_string(to_localtime(to_timestamp(date, time)), 'hh'))),
            mul(60, to_int(to_string(to_localtime(to_timestamp(date, time)), 'mm')))
        ),
        to_int(to_string(to_localtime(to_timestamp(date, time)), 'ss'))
    ) as secs,
    to_int(to_string(to_localtime(to_timestamp(date, time)), 'yy')) as yy,
    to_int(to_string(to_localtime(to_timestamp(date, time)), 'MM')) as mo,
    to_int(to_string(to_localtime(to_timestamp(date, time)), 'dd')) as dd,
    to_int(to_string(to_localtime(to_timestamp(date, time)), 'hh')) as hh,
    to_int(to_string(to_localtime(to_timestamp(date, time)), 'mm')) as mi,
    to_int(to_string(to_localtime(to_timestamp(date, time)), 'ss')) as ss,
    to_lowercase(EXTRACT_PATH(cs-uri-stem)) as fpath,
    to_lowercase(EXTRACT_FILENAME(cs-uri-stem)) as fname,
    to_lowercase(EXTRACT_EXTENSION(cs-uri-stem)) as fext
from *.log
where sc-status<>401
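
Once the CSV files are generated, Log Parser can also render them directly as charts (the CHART output format requires the Office Web Components to be installed). For example, this sketch plots the per-minute request distribution produced above as a line chart:

            logparser -i:CSV -o:CHART -chartType:Line "select minu,ct into RPM.gif from mindist.csv"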

 

Additional information on using Log Parser for SharePoint 2010 analysis will be included in my next blog.