Data Mining UrlScan 3.0 Logs using LogParser 2.2


We released a new version of UrlScan recently, and one of the great new features in this version is log files that conform to the W3C Extended Log File Format. What this means to administrators is that they can now parse their UrlScan activity using almost any common log utilities, including Microsoft’s LogParser 2.2 utility. For anyone that hasn’t heard of LogParser, this is a freeware utility from Microsoft that allows you to write SQL-style queries to extract useful information. Eventually I’d like for the following information to show up on the http://learn.iis.net/ web site, but for now I’d like everyone to at least have access to the information.



Getting Started


Downloading and Installing UrlScan and LogParser


The download locations for UrlScan are listed in the “Getting UrlScan” section of the following topic:



http://learn.iis.net/page.aspx/473/


Note: There are separate downloads for 32-bit and 64-bit Windows.


LogParser is available from the following URL:



http://www.iis.net/go/1287


Note: It’s a 32-bit application, but you can use it on a 64-bit Windows system.


While it’s not essential, copying Logparser.exe to a folder in your PATH will allow you to run LogParser from any folder on your system.


UrlScan Log Files


Locating your UrlScan Log Files


By default UrlScan installs to %WinDir%\System32\Inetsrv\urlscan, and this is where your Urlscan.ini file should be located. (If you installed to a different location, you would need to locate your installation by searching for your Urlscan.ini file.) In the [Options] section of the Urlscan.ini file you should see an entry for LoggingDirectory that is set “Logs” by default, which means that your default directory for UrlScan log files is %WinDir%\System32\Inetsrv\urlscan\Logs.


Log File Names


UrlScan log files are named using the following syntax:



urlscan.mmddyy.log


Where:
















mm = Month
dd = Day
yy = Year

Log File Fields


The following table lists the fields that are available:








































Field Description
Date Request date in YYYY-MM-DD format
Time Request time in UTC time
c-ip IP address of the client that made the request
s-siteid ID of the web site that received the request
cs-method HTTP method for the request
cs-uri URL for the request
x-action Action taken; for example:







Logged and allowed
Rejected
x-reason Reason for action; for example:








































a request header was too long
content length too long
disallowed header detected
disallowed query string sequence detected
disallowed url sequence detected
dot in path detected
extension not allowed
failed urlscan rules
high bit character detected
query string too long
second pass normalization failure
url too long
verb not allowed
x-context Part of request that triggered the action; for example:






















Content-Length
file extension
HTTP_ALL_RAW
HTTP method
query string
request headers
URL
cs-data The request data that triggered the action.
x-control Config control data that caused the trigger.  For example, if a request was rejected because of a deny string from a UrlScan rule, the specific deny string will be listed.


Querying Data


Simple Queries


Selecting Everything


The simplest (and least useful) query that you can issue is the following, which is listed here just to get us started:



logparser.exe “SELECT * FROM urlscan.*.log” -i:w3c


Let’s analyze what this query does:



  1. Selects all fields (“SELECT *”) from all records in all log files (“FROM urlscan.*.log”)

  2. Specifies the log file input type as “W3C Format” (“-i:w3c”)

Note: Specifying the input file format as “W3C Format” is essential – without that step you would not be able to parse your UrlScan logs.


Selecting Specific Fields to Query


You can restrict the amount of data that you see by specifying which fields to return:



logparser.exe “SELECT date,time,c-ip,x-action FROM urlscan.*.log” -i:w3c


Here’s what this query does:



  1. Selects only four specific fields (“SELECT date,time,c-ip,x-action”) from all records in all log files (“FROM urlscan.*.log”)

  2. Specifies the log file input type as “W3C Format” (“-i:w3c”)

This allows you to greatly reduce the amount of information that you see in your results.


Exporting Logs into a Different Format


As mentioned earlier, selecting all of the records in all of your logs is probably not all that helpful, unless you were interested in parsing your UrlScan information using another application, like Microsoft Excel, Microsoft Access, SQL Server, etc. In which case, you might want to export all of your data into a comma-delimited file like the following example:



logparser.exe “SELECT * INTO UrlScan.csv FROM urlscan.*.log” -i:w3c -o:csv


Here’s what this query does:



  1. Selects all fields (“SELECT *”) from all records in all log files (“FROM urlscan.*.log”)

  2. Inserts from all records into a comma-delimited file (“INTO UrlScan.csv”)

  3. Specifies the input file type as “W3C Format” (“-i:w3c”)

  4. Specifies the output file type as “CSV Format” (“-o:csv”)

You could just as easily export into a tab-separated file using the following syntax:



logparser.exe “SELECT * INTO UrlScan.tsv FROM urlscan.*.log” -i:w3c -o:tsv


Filtering Queries


One of the great things about LogParser and SQL queries in general is that you can filter the amount of data that you see by specifying criteria using a WHERE clause. The following


Filtering based on IP Address


For example, to see all of the requests from a specific client IP address, you could use the following syntax:



logparser.exe “SELECT * FROM urlscan.*.log WHERE c-ip=’192.168.1.1′” -i:w3c


Here’s what this query does:



  1. Selects all fields (“SELECT *”) from all records in all log files (“FROM urlscan.*.log”)

  2. Filters the results based on a specific client IP address (“WHERE c-ip=’192.168.1.1′”)

  3. Specifies the input file type as “W3C Format” (“-i:w3c”)

Filtering based on UrlScan Action


You can also filter based on UrlScan rejections using the following syntax:



logparser.exe “SELECT * FROM urlscan.*.log WHERE x-action=’Rejected'” -i:w3c


Here’s what this query does:



  1. Selects all fields (“SELECT *”) from all records in all log files (“FROM urlscan.*.log”)

  2. Filters the results where UrlScan rejected the request (“WHERE x-action=’Rejected'”)

  3. Specifies the input file type as “W3C Format” (“-i:w3c”)

Filtering based on Uniqueness


You can filter the amount of data that you see by specifying which fields to return and grouping the data based on uniqueness. This is done by adding the DISTINCT clause as seen in the following syntax:



logparser “SELECT DISTINCT c-ip FROM urlscan.*.log” -i:w3c


Here’s what this query does:



  1. Selects only the unique client IP addresses (“SELECT DISTINCT c-ip”) from all records in all log files (“FROM urlscan.*.log”)

  2. Specifies the input file type as “W3C Format” (“-i:w3c”)

Grouping, Tabulating, and Sorting Queries


Counting Requests by Fields


Another great feature of SQL-style queries is the ability to tabulate data. The following example shows how retrieve a count of requests based on the client IP address:



logparser.exe “SELECT c-ip, COUNT(*) FROM urlscan.*.log GROUP BY c-ip” -i:w3c


Here’s what this query does:



  1. Selects only the client IP address and request count (“SELECT c-ip, COUNT(*)”) from all records in all log files (“FROM urlscan.*.log”)

  2. Groups the request count based on the client IP address (“GROUP BY c-ip”)

  3. Specifies the input file type as “W3C Format” (“-i:w3c”)

Sorting Data by Fields


You can reorder your results into something more useful using the ORDER BY clause, as illustrated in the following example:



logparser.exe “SELECT x-action,x-reason,c-ip,cs-uri FROM urlscan.*.log ORDER BY x-action,x-reason” -i:w3c


Here’s what this query does:



  1. Selects four specific fields (“SELECT x-action,x-reason,c-ip,cs-uri”) from all records in all log files (“FROM urlscan.*.log”)

  2. Sorts the data in ascending order based on the UrlScan action and reason for the action (“ORDER BY x-action,x-reason”)

  3. Specifies the input file type as “W3C Format” (“-i:w3c”)


Putting it All Together


Counting Requests by Client IP and Sorting Data by Request Count


Starting with the request count query from a previous example, you can rename the request count field to something more useful using the AS clause, as illustrated in the following example:



logparser.exe “SELECT c-ip, COUNT(*) AS c-request-count FROM urlscan.*.log GROUP BY c-ip” -i:w3c


This helps reorder the data based on the count, as shown in the following example:



logparser.exe “SELECT c-ip, COUNT(*) AS c-request-count FROM urlscan.*.log GROUP BY c-ip ORDER BY c-request-count DESC” -i:w3c


Here’s what this query does:



  1. Selects only the client IP address and request count (“SELECT c-ip, COUNT(*)”) from all records in all log files (“FROM urlscan.*.log”)

  2. Groups the request count based on the client IP address (“GROUP BY c-ip”)

  3. Sorts the data in descending order based on the request count (“ORDER BY c-request-count DESC”)

  4. Specifies the input file type as “W3C Format” (“-i:w3c”)

Counting Rejected Requests based on the Reason for Rejections


While somewhat similar to the previous example, the following syntax will filter the results based on rejections and count the number of times that each reason occurs:



logparser.exe “SELECT x-reason, COUNT(*) AS c-request-count FROM urlscan.*.log WHERE x-action=’Rejected’ GROUP BY x-reason ORDER BY c-request-count DESC” -i:w3c


Here’s what this query does:



  1. Selects only the reason and request count (“SELECT x-reason, COUNT(*)”) from all records in all log files (“FROM urlscan.*.log”)

  2. Filters the results based on rejections (“WHERE x-action=’Rejected'”)

  3. Groups the request count based on the reason (“GROUP BY x-reason”)

  4. Sorts the data in descending order based on the request count (“ORDER BY c-request-count DESC”)

  5. Specifies the input file type as “W3C Format” (“-i:w3c”)

Splitting Requests into Separate Log Files based on Site ID


UrlScan writes all log entries to a common log file, but you may want to see your data separated by site. The following example splits your UrlScan logs into separate files by site by creating unique log files that are named for each site ID. This example is a little more complex than previous examples, and is written as a batch file.



logparser.exe “SELECT DISTINCT s-siteid INTO siteids.tsv FROM urlscan.*.log” -i:W3C -o:tsv -headers:off

for /f %%a in (siteids.tsv) do (
   logparser.exe “SELECT * INTO W3SVC%%a.log FROM urlscan.*.log WHERE s-siteid=’%%a’ ORDER BY date,time” -i:W3C -o:w3c
)


Here’s what this batch file does:



  1. Creates a tab-separated file that contains a list of unique site IDs

  2. Creates a loop to step through the tab-separated file containing the site IDs

  3. Selects all fields for the specific site ID and creates a new W3C format file for each site

Comments (0)