Azure Log Analytics: Service Level Agreement (SLA) part 2

My post yesterday got a little long so, I’ve done a part2 to show how else you might group your solutions.

Part1: https://blogs.msdn.microsoft.com/ukhybridcloud/2018/11/20/azure-log-analytics-service-level-agreement-sla-part-1/

Here is the new query, I kept it simple so removed the JOIN and links to Perf and Memory that were shown in Part 1.

let start_time =startofday(ago(7d));

let end_time =startofday(now());

// Lets define our groups of computers, using various search techniques

let webServers = (Heartbeat

| where Computer startswith "WEB"

| distinct Computer

| project Computer

);

let domainControllers = (Event

| where EventLog == "Directory Service"

| distinct Computer

| project Computer

);

let aksServers = (Heartbeat

| where Computer has "aks"

| distinct Computer

| project Computer

);

let subnetServers = (WireData

| where LocalIP matches regex "10.118.51."

| where Computer !startswith "D"

| distinct Computer

| project Computer

);

Heartbeat

| where TimeGenerated > start_time and TimeGenerated < end_time

// import generated computer list(s)

| where Computer in (webServers, domainControllers, aksServers, subnetServers)

// Main calculation

| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer

| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)

| summarize total_available_hours=countif(available_per_hour==true) by Computer

| extend total_number_of_buckets=round((end_time-start_time)/1h)

| extend availability_rate=total_available_hours*100/total_number_of_buckets

// Create some readable groupnames

| extend GroupName =

case(

Computer in (webServers), GroupName ="Web Servers",

Computer in (domainControllers),GroupName ="Domain Controllers",

Computer in (aksServers), GroupName ="AKS servers",

Computer in (subnetServers), GroupName ="Subnet51",

"Unknown"

)

// Show the results

| summarize

ServerCount=dcount(Computer),

Availability=avg(availability_rate),

AvailabilityFailedOn=countif(availability_rate < 100)

by GroupName

sla-p2

Essentially this query looks at 4 groups of servers:

1. Web Servers

2. AKS Servers (same as in part1)

3. Domain Controllers

4. A subnet

Web Servers and AKS, I used the same technique as I did in part1. However there is a subtle use of the ‘has’ for the AKS servers to pick up the text “aks” anywhere in the computer name.

Domain Controllers, with this group I used the Event table, and looked for the “Directory Service” log which should be on each DC.

Subnet, this uses a regex to look for a specific subnet, I also excluded any Computers that had a name that started with “D”.

These are few techniques you can use to group computers or services together.

I also wanted a friendly name for each service, so used the Case function for that:

| extend GroupName =

case(

Computer in (webServers), GroupName ="Web Servers",

Computer in (domainControllers),GroupName ="Domain Controllers",

Computer in (aksServers), GroupName ="AKS servers",

Computer in (subnetServers), GroupName ="Subnet51",

"Unknown"

)