Getting Started with the Kusto Query Language (KQL)

Azure Data Explorer is in public preview and their documentation is an excellent place to educate yourself on the Kusto Query Language that is used to interact with Azure Data Explorer. There's also a 4-hour Pluralsight course which will really jump start you. The Azure Data Explorer white paper also covers the basics of the query language. But you're still reading, so I'll assume you want my version of how to get started with the query language.

Queries generally begin by either referencing a table or a function. You start with that tabular data and then run it through a set of statements connected by pipes to shape your data into the final result. Here are some basics to get you querying:

  • where - The "where" operator allows you to filter your data. So if you start with TableA and you want to only keep events that have a certain key,you would use:

     TableA | where MyKey='MyValue'
    
  • project/extend - These two operators help you pick the fields that you want to show up in your output. Extend adds a new field and project can either choose from the existing set of fields or add a new field. These two statements produce the same result:

     T | project A, B, C=D+E
    
    T | project A, B | extend C=D+E
    
  • count - The count operator returns a scalar value of the number of rows in the tabular set.

     T | count
    
  • summarize - This is a big topic, but we'll keep it light for now. The summarize operator can perform aggregations on your dataset. For example, the count operator mentioned above is short for:

     T | summarize count()
    

    You can specify a number of aggregations over a variety of fields:

     T | summarize count(), sum(A), avg(B) by C, D
    

    The bin() function is often used in conjunction with summarize statements. It lets you group times (or numbers) into buckets.

     T | summarize count() by bin(TimeStamp, 1d) // count the number of rows per day
    
  • join - Many types of joins are supported but the common ones are inner join (keep rows that match on both sides) and leftouter (keep all rows from the left side and include matching rows from the right). You technically don't have to specify a join kind but I recommend that you always do. It makes for easier readability and the default probably isn't what you expect.

     T | join kind=inner (U) on A
    

    Note that joins are only on equality and generally it's expected that the keys have the same name on both sides. If they aren't the same, you can use a project statement to make them the same or use an alternate key specification syntax:

     T | join kind=inner (U) on $left.A == $right.B
    
  • now(), ago() and datetime math - Azure Data Explorer excels at time series data analysis. There are some handy functions to get used to like "now()" which gives the current UTC time and "ago()". The ago function is especially handy when you're looking for recent data.

     T | where A > ago(5m) // where A is greater than 5 minutes ago
    
    T | where A > ago(1d) // where A is greater than 1 day ago
    

    You can also do easy datetime math.

     print ago(1d) + 10m
    
  • arg_max(), arg_min() - These are somewhat advanced topics but I include them when I introduce people to Kusto because they are so handy. Imagine that you have a bunch of entities and each one sends a row to your table periodically. You want to run a query over the latest message from each entity.

     T | summarize arg_max(Timestamp, *) by Id // for every Id, get the row with the maximum Timestamp
    

    Use these functions with care though. If they are used on a huge table and the cardinality of the grouping is high, it can destroy performance.

  • Rendering charts - Both  the Kusto Explorer desktop client and the web client have the ability to easily render charts. You can read the documentation to learn about the various types, but since I deal with a lot of time series data, the one I use the most is timechart. It's a line chart where the x-axis is a datetime and everything else goes on the y-axis. It automatically keeps the x-axis spaced nicely even if your data doesn't have every time specified.

     T | render timechart
    

In future posts, I'll cover some beginning perf tips and style/readability tips. You can find all of my posts on this topic under the Azure Data Explorer tag. Keep calm and Kusto on!