Azure Network Security Groups (NSG) - Best Practices and Lessons Learned

05/14/2016

Virtual Network (VNET) is the cornerstone of the Azure networking model and provides isolation and protection, while Network Security Group (NSG) is the main tool you need to use to enforce and control network traffic rules at the networking level. Customers can control access by permitting or denying communication between the workloads within a virtual network, from systems on the customer’s networks via cross-premises connectivity, or direct Internet communication. In the diagram below, both VNETs and NSGs reside in a specific layer in the overall Azure security stack, where NSGs, User Defined Routes (UDR), and network virtual appliances can be used to create security boundaries to protect the application deployments in the protected network.

What is a Network Security Group (NSG)?

https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-nsg

In one of my recent engagements with my partners, I had the opportunity to test the power of NSG, facing some limitations and gaining some good knowledge and experience that I would like to share with you. For some topics, I will include links to existing documentation that was published before this blog post.

Think and Plan before Deploy

If you have played even minimally with NSG, you immediately realized that you need to think very carefully about your subnets and virtual network architecture: even though you can also assign an NSG at the VM network interface (NIC) level, you will probably want to use subnets as your level of granularity (see below for more details). Since it is not easy or quick to change the subnet structure once you have deployed VMs in it, my first suggestion is to design your VNET architecture up front and (also) according to your NSG needs. Once you have designed your network topology, you need to think about the architecture of the boundaries you want to enforce and probably what your DMZ will look like. In this case, you need to answer at least the questions below, which I took from an excellent article (see URL below) written by Jon Ormond, which I strongly encourage you to read:

Which characteristics and requirements should the perimeter network have?
How many boundaries are needed?
Where are the boundaries located?
How are the boundaries implemented?

Microsoft cloud services and network security

https://azure.microsoft.com/en-us/documentation/articles/best-practices-network-security

Good! Now you have VNET and subnets properly designed and implemented, and you are ready to deploy your NSG rules.

Notable differences between ASM and ARM

NSG is an Azure feature available in both the ASM (Azure Service Management or “classic”) and ARM (Azure Resource Manager) APIs. It is highly recommended to use ARM for new deployments, and to use ASM only when necessary to support existing environments. If you deployed an ARM VNET, you will need to use the ARM API for NSG; similarly, you will have to use the ASM API for a VNET created with ASM. If you already played with NSG in ASM and then jumped to ARM, you will be a bit “disoriented”, especially if you used PowerShell, since there are some changes that you need to be aware of.

How to create NSGs (classic) in PowerShell

https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-create-nsg-classic-ps

How to create NSGs in Resource Manager by using PowerShell

https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-create-nsg-arm-ps

Essentially, there is no perfect symmetry between ASM and ARM, at least in PowerShell. NSG can be manipulated in the Azure Portal, but if you want to operate at scale and “code” your infrastructure, you will need to use PowerShell or the SDKs/REST APIs. Also consider that if you want to do some advanced stuff, you will probably not be able to do it in the Azure Portal GUI. Based on what I found in my activities, be aware of the following:

- NSG granularity: While in ASM you could link an NSG at the VM, NIC and subnet levels, in ARM it is slightly different: you can associate it only at the NIC and subnet levels. This has historical reasons: initially multiple NICs were not allowed in ASM, so the VM concept was used when NSG was first implemented. You can see the details in Set-AzureNetworkSecurityGroupAssociation for ASM.

- Remove associations: While in ASM PowerShell you had the cmdlets Remove-AzureNetworkSecurityGroupFromSubnet and Remove-AzureNetworkSecurityGroupAssociation to remove an NSG from subnets or VMs, this is no longer the case in ARM. See the section “Remove NSG association from Subnets or NICs” later in this blog post.

- List NSGs: In ASM you have a nice cmdlet named Get-AzureNetworkSecurityGroupForSubnet to retrieve the NSG associated with a subnet; in ARM there is no equivalent, so you essentially need to retrieve the subnet object, navigate through its properties, and see which NSG, if any, is linked.

- NSG logs: Logs and diagnostics for NSG are only available in ARM; there is no ASM coverage. See the section “Enable NSG diagnostics and logs programmatically” later in this blog post for details.
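To illustrate the “List NSGs” point, here is a minimal sketch of how the subnet object can be inspected in ARM. It assumes an authenticated AzureRM PowerShell session; the resource group, VNET and subnet names are placeholders I made up for the example.

```powershell
# Placeholder names (assumptions): replace with your own resources
$rgname = "MyResourceGroup"
$vnet   = Get-AzureRmVirtualNetwork -Name "Vnet1" -ResourceGroupName $rgname
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -Name "Subnet1" -VirtualNetwork $vnet

if ($subnet.NetworkSecurityGroup) {
    # The property carries the full ARM resource ID; the last segment is the NSG name
    $nsgName = ($subnet.NetworkSecurityGroup.Id -split '/')[-1]
    "Subnet1 is linked to NSG '$nsgName'"
} else {
    "Subnet1 has no NSG linked"
}
```

The same pattern should apply to a NIC object retrieved with Get-AzureRmNetworkInterface, reading its NetworkSecurityGroup property.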

In general, I noticed that for many Azure objects and resources, PowerShell recently switched from a model where a distinct pair of cmdlets existed for everything to a new model where many operations are managed with a “Get/Set” approach: you retrieve object references and then read and/or manipulate property values directly.

Which Scope for NSG?

Which scope should you use for linking your NSG: subnet level, NIC level, or both? Binding an NSG to individual VMs (by NIC) is powerful, but you may quickly lose control of the complexity of your deployment, since it would be hard to track and maintain. Generally, this is used for specific VMs with Network Virtual Appliance (NVA) roles; otherwise it is recommended to link NSGs at the subnet level and re-use them across your VNETs and subnets.

Be very careful when you want to apply NSGs at both the VM (NIC) and subnet level at the same time: NSGs are evaluated independently, and an “allow” rule must exist at both levels, otherwise traffic will not be admitted. Let me give you an example for incoming traffic on port 80: you need to have an NSG at the subnet level with an ALLOW rule for port 80, and you also need another NSG with an ALLOW rule for port 80 at the NIC level. For incoming traffic, the NSG set at the subnet level is evaluated first, then the NSG set at the NIC level. For outgoing traffic the order is reversed: the NIC-level NSG is evaluated first, then the subnet-level NSG. The picture below should clarify this concept further: you can see how rules are evaluated for network packets. Once again, remember that you need to walk through this diagram twice: once for the subnet-level NSG rules, and once for the NIC-level NSG rules.
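The “allow at both levels” behavior can be sketched with a small, simplified model in plain PowerShell. This is only an illustration and does not call Azure: the Test-NsgAllows function and the rule objects are hypothetical, and they match on destination port only, ignoring protocol, addresses and direction.

```powershell
# Simplified model of NSG evaluation (illustrative only, no Azure calls).
function Test-NsgAllows {
    param([object[]]$Rules, [string]$Port)
    # Rules are checked in priority order (lowest number first); the first match decides.
    foreach ($rule in ($Rules | Sort-Object Priority)) {
        if ($rule.Port -eq '*' -or $rule.Port -eq $Port) {
            return ($rule.Access -eq 'Allow')
        }
    }
    return $false   # no rule matched
}

$allow80 = [pscustomobject]@{ Priority = 100;   Port = '80'; Access = 'Allow' }
$denyAll = [pscustomobject]@{ Priority = 65500; Port = '*';  Access = 'Deny'  }

# Subnet-level NSG allows port 80, NIC-level NSG does not (yet)
$subnetNsg = @($allow80, $denyAll)
$nicNsg    = @($denyAll)
$before = (Test-NsgAllows -Rules $subnetNsg -Port 80) -and (Test-NsgAllows -Rules $nicNsg -Port 80)

# Add the same ALLOW rule at the NIC level
$nicNsg = @($allow80, $denyAll)
$after  = (Test-NsgAllows -Rules $subnetNsg -Port 80) -and (Test-NsgAllows -Rules $nicNsg -Port 80)

"Before: $before, after: $after"
```

In this model the traffic is blocked while only the subnet-level NSG has the ALLOW rule, and it is admitted once both NSGs allow port 80.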

Be very careful with Defaults and Tools

When you create a Network Security Group (NSG), even a completely empty one without any rules, it comes with some default rules, which you can view in ARM by accessing the specific property of the NSG object in PowerShell:

$nsg0.DefaultSecurityRulesText

Notice in the output that there are three inbound and three outbound rules. Rules are assigned a priority, and while the default rules cannot be deleted, they can be overridden by rules with higher priority (that is, a lower priority number). In ASM you can achieve the same using the PowerShell command below with the “-Detailed” switch:

Get-AzureNetworkSecurityGroup -Name <<NSG Name>> -Detailed
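If you prefer a compact view of the ARM defaults instead of the JSON text, something like the sketch below should work. It assumes an authenticated AzureRM session; the NSG and resource group names are placeholders.

```powershell
# Placeholder names (assumptions): replace with your own resources
$rgname = "MyResourceGroup"
$nsg0   = Get-AzureRmNetworkSecurityGroup -Name "MyNsg" -ResourceGroupName $rgname

# Show the default rules grouped by direction and ordered by priority
$nsg0.DefaultSecurityRules |
    Sort-Object Direction, Priority |
    Format-Table Name, Direction, Priority, Access -AutoSize
```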

The default rules ensure that no inbound traffic is permitted except for “polling” (health probes) from the Azure load balancer, that connectivity inside the VNET/subnet is not blocked, and that outbound traffic is permitted, including to the Internet address space. So, what is not included here? Your application/service endpoints! Suppose that you installed an IIS VM, opened port 80 on the Guest OS firewall, and created a load-balanced or NAT rule for port 80. Everything works fine, but now you decide to further secure your environment by adding a new Network Security Group: you need to explicitly add a rule for port 80 (HTTP, over TCP), otherwise when you apply the NSG to the VM/NIC or subnet, your existing application/service will break.
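As a minimal sketch of the port 80 case, the snippet below adds an inbound ALLOW rule to an existing NSG in ARM. It assumes an authenticated AzureRM session; the NSG name, resource group, rule name and priority value are placeholders chosen for the example.

```powershell
# Placeholder names (assumptions): replace with your own resources
$rgname = "MyResourceGroup"
$nsg    = Get-AzureRmNetworkSecurityGroup -Name "MyNsg" -ResourceGroupName $rgname

# Add an inbound rule that admits HTTP traffic from the Internet (local object only)
Add-AzureRmNetworkSecurityRuleConfig -NetworkSecurityGroup $nsg `
    -Name "Allow-HTTP-Inbound" -Description "Allow web traffic on port 80" `
    -Protocol Tcp -Direction Inbound -Priority 100 -Access Allow `
    -SourceAddressPrefix Internet -SourcePortRange '*' `
    -DestinationAddressPrefix '*' -DestinationPortRange 80 | Out-Null

# Commit the change to Azure, otherwise the rule exists only in the local object
Set-AzureRmNetworkSecurityGroup -NetworkSecurityGroup $nsg
```

Adding the rule before you associate the NSG with the subnet or NIC avoids a window in which the application is unreachable.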

Tools are also very important, and final results may differ greatly depending on your choice. If you create your VM using the Azure Portal, at a certain point you will be offered the option to create a brand new NSG and related rules if you included a load balancer or an Instance-Level Public IP (ILPIP or PIP). In this case the tool you have chosen takes care of ensuring that, since your VM is exposed to the Internet, access is restricted and secured. A new rule for the RDP port for Windows VMs, or the SSH port for Linux VMs, will be present here. On the other hand, if you use PowerShell, you will not be prompted or required to create an NSG: it is your responsibility to create a proper NSG and rules. With a VIP, no port is opened and therefore no access is permitted even without an NSG; if you decide to use an ILPIP instead, your VM will be totally exposed to the Internet on all ports, and an NSG is essential, in addition to a very good Guest OS firewall.

Be very careful with “Deny All” outbound Internet traffic

If your intention is to harden the network security of your environment, be very careful about adding NSG rules that block everything; instead, proceed incrementally in a test environment until you are satisfied with the results. The default rules already contain “Deny All” rules for both inbound and outbound traffic, but they are the lowest in priority (65500), and for outbound there is another rule that allows connections to the Internet. This specific rule was added to the defaults in order not to break previous Azure VM behaviors. I have seen many customers and partners restrict this by adding a new rule, with higher priority (lower number), to deny Internet connections partially or totally. This is legitimate, but denying all Internet traffic may be dangerous and cause your VM to fail if, for example, you are using VM Extensions, as explained in the article below:

VM stuck in “Updating” when NSG rule restricts outbound internet connectivity

https://blogs.msdn.microsoft.com/mast/2016/04/27/vm-stuck-in-updating-when-nsg-rule-restricts-outbound-internet-connectivity

The case above is only one example; there may be other situations where the applications and services installed inside your environment need to access, for example, other Azure services like Azure SQL DB or Azure Storage resources. Unfortunately, today there is no tag in NSG to identify Azure datacenter IP ranges, and they can vary over time and by region. So how can you selectively block Internet outbound traffic without compromising Azure access?

My colleague Keith Mayer built a nice solution, described in the article below. Since the Azure datacenter IP ranges are published by Microsoft, you can use his work to automate NSG creation with PowerShell, based on this information that Microsoft periodically updates.

Step-by-Step: Automate Building Outbound Network Security Groups Rules via Azure Resource Manager (ARM) and PowerShell

https://blogs.technet.microsoft.com/keithmayer/2016/01/12/step-by-step-automate-building-outbound-network-security-groups-rules-via-azure-resource-manager-arm-and-PowerShell
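The general idea can be sketched as follows. This is not Keith's script, only a simplified illustration of the pattern: the address prefixes are documentation placeholders rather than real Azure datacenter ranges, and the NSG name, resource group, rule names and priorities are assumptions. It requires an authenticated AzureRM session.

```powershell
# Placeholder values (assumptions): in practice the prefixes come from the published Azure ranges
$rgname        = "MyResourceGroup"
$azurePrefixes = @("203.0.113.0/24", "198.51.100.0/24")
$nsg           = Get-AzureRmNetworkSecurityGroup -Name "MyNsg" -ResourceGroupName $rgname

# One outbound ALLOW rule per prefix, with increasing priority numbers
$priority = 1000
foreach ($prefix in $azurePrefixes) {
    Add-AzureRmNetworkSecurityRuleConfig -NetworkSecurityGroup $nsg `
        -Name "Allow-Azure-Outbound-$priority" -Protocol '*' -Direction Outbound `
        -Priority $priority -Access Allow `
        -SourceAddressPrefix VirtualNetwork -SourcePortRange '*' `
        -DestinationAddressPrefix $prefix -DestinationPortRange '*' | Out-Null
    $priority++
}

# A DENY rule for the Internet tag, evaluated after the ALLOW rules above
Add-AzureRmNetworkSecurityRuleConfig -NetworkSecurityGroup $nsg `
    -Name "Deny-Internet-Outbound" -Protocol '*' -Direction Outbound `
    -Priority 4000 -Access Deny `
    -SourceAddressPrefix VirtualNetwork -SourcePortRange '*' `
    -DestinationAddressPrefix Internet -DestinationPortRange '*' | Out-Null

# Commit all rules in a single call
Set-AzureRmNetworkSecurityGroup -NetworkSecurityGroup $nsg
```

Because the ALLOW rules have lower priority numbers than the DENY rule, traffic to the listed prefixes is matched first and everything else that falls under the Internet tag is blocked.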

Another useful example is Azure Diagnostics: if you need to enable this feature for your VMs, you cannot deny all outbound traffic, otherwise the agent running inside the Virtual Machine will not be able to connect. The list continues with the SQL Server VM agents for automated patching and backup to Azure blob storage: if you enable these extensions, you need to permit outbound Internet access from these VMs.

Remove NSG association from Subnets or NICs

One of the problems I faced when initially playing with NSG in ARM was how to “reset” my NSG configuration, that is, removing the association at the NIC and subnet level. In ASM you have a nice cmdlet called Remove-AzureNetworkSecurityGroupAssociation, but there is no equivalent in ARM. After digging into the details of the objects and doing some experiments, I wrote a piece of PowerShell code to achieve that:

$vnet = Get-AzureRmVirtualNetwork -Name "Vnet1" -ResourceGroupName $rgname

$subnet = Get-AzureRmVirtualNetworkSubnetConfig -Name "Subnet1" -VirtualNetwork $vnet

$subnet.NetworkSecurityGroup = $null

Set-AzureRmVirtualNetwork -VirtualNetwork $vnet

Please remember to call Set-AzureRmVirtualNetwork after setting the NSG property value to NULL, otherwise Azure will not commit your modification in the control plane. The approach at the NIC level is similar: you need to change the NSG property value on the NIC object:

$nic = Get-AzureRmNetworkInterface -Name "nic2" -ResourceGroupName $rgname

$nic.NetworkSecurityGroupText

$nic.NetworkSecurityGroup = $null

Set-AzureRmNetworkInterface -NetworkInterface $nic

As in the previous case, don’t forget to tell Azure to commit the NIC object configuration change you have just made, otherwise it will have no effect. You can use the same technique to change the assigned NSG. Regarding the time needed for the operation to complete, in all my tests the task completed in approximately 30 seconds, but be aware that there is no SLA/SLO for this kind of operation.
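For example, changing the NSG assigned to a subnet could look like the sketch below. It assumes an authenticated AzureRM session; the NSG, VNET, subnet and resource group names are placeholders.

```powershell
# Placeholder names (assumptions): replace with your own resources
$rgname = "MyResourceGroup"
$nsg    = Get-AzureRmNetworkSecurityGroup -Name "MyNewNsg" -ResourceGroupName $rgname
$vnet   = Get-AzureRmVirtualNetwork -Name "Vnet1" -ResourceGroupName $rgname
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -Name "Subnet1" -VirtualNetwork $vnet

# Point the subnet at the new NSG instead of setting the property to $null
$subnet.NetworkSecurityGroup = $nsg

# Commit the change in the control plane
Set-AzureRmVirtualNetwork -VirtualNetwork $vnet
```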

Enable NSG diagnostics and logs programmatically

Diagnostics and logs are an important part, especially if you need to troubleshoot unexpected behaviors that may be related to NSG misconfigurations. Remember that in ASM you don’t have this feature. In the article below you can find more information on how to enable this feature and which level of detail you can get:

Log analytics for network security groups (NSGs)

https://azure.microsoft.com/en-us/documentation/articles/virtual-network-nsg-manage-log

You can use the "Event logs" to view which NSG rules are applied to VMs and instance roles based on MAC address, and the "Counter logs" to view how many times each NSG rule was applied to deny or allow traffic. What is missing there is how to enable them programmatically using PowerShell; you can use the sample below to achieve that:

$nsg = New-AzureRmNetworkSecurityGroup -Name testNSGforLogs -ResourceGroupName testwrg -Location westus

$sa = New-AzureRmStorageAccount -ResourceGroupName testwrg -Name nsglogs -Type Standard_LRS -Location westus

Set-AzureRmDiagnosticSetting -ResourceId $nsg.Id -StorageAccountId $sa.Id -Enabled $true

This will enable the "Event logs" and "Counter logs", which are not enabled by default; remember this if you need to troubleshoot a potential problem with NSG. Conversely, Azure ARM Audit logs are always enabled, and you can retrieve them programmatically using the Get-AzureRmLog PowerShell cmdlet.

Audit operations with Resource Manager

https://azure.microsoft.com/en-us/documentation/articles/resource-group-audit
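A minimal sketch of retrieving the audit entries for a single NSG is shown below. It assumes an authenticated AzureRM session and re-uses the NSG and resource group names from the diagnostics sample above; the selected columns are just one possible choice.

```powershell
# Names taken from the diagnostics sample in this post
$nsg = Get-AzureRmNetworkSecurityGroup -Name testNSGforLogs -ResourceGroupName testwrg

# Audit events recorded for this NSG during the last 24 hours
Get-AzureRmLog -ResourceId $nsg.Id -StartTime (Get-Date).AddDays(-1) |
    Select-Object EventTimestamp, Caller, OperationName, Status |
    Format-Table -AutoSize
```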

As a great add-on to NSG, you can leverage the Power BI visualization and analytics capabilities to analyze NSG logs; everything you need is described at the link below:

Azure Audit Logs content pack for Power BI

https://powerbi.microsoft.com/en-us/documentation/powerbi-content-pack-azure-audit-logs

An example of the log entries you can retrieve is shown below:

VPN and ExpressRoute

If your Azure Virtual Network (VNET) has VPN and/or ExpressRoute gateways, you need to be very careful: applying an NSG to the pre-defined “GatewaySubnet” is neither supported nor recommended, since you may break connectivity. For your information, a few days ago support for User Defined Routing (UDR) applied to the “GatewaySubnet” was released and can now be used.

NSG and Azure Cloud Services (PaaS)

YES, you heard it correctly: it is also possible to leverage Network Security Groups (NSG) for Azure Cloud Services (web/worker roles), if they are joined to an Azure (regional) Virtual Network (VNET). This possibility is not widely known because originally another feature was used for Azure PaaS v1, namely Network (or Endpoint) ACLs.

Windows Azure PaaS ACLs Are Here!

https://blogs.msdn.microsoft.com/walterm/2014/04/22/windows-azure-paas-acls-are-here

You can consider this feature as NSG v1, later superseded by the true Azure Network Security Group (NSG) as you know it today in ARM. Please remember that there is no Cloud Service (web/worker roles) equivalent in ARM; this is purely an ASM concept. Long story short: if you still have Cloud Services, you can leverage NSG by changing the “NetworkConfiguration” section of the CSCFG file as in the example below:

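<NetworkConfiguration>

<AddressAssignments>

<InstanceAddress roleName="...">

<Subnets>

<Subnet name="..." />

</Subnets>

</InstanceAddress>

</AddressAssignments>

<NetworkSecurityGroupRefs>

<NetworkSecurityGroupRef roleName="...">

<NetworkSecurityGroup name="..." />

</NetworkSecurityGroupRef>

</NetworkSecurityGroupRefs>

</NetworkConfiguration>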

In the “NetworkSecurityGroupRefs” section you can reference already existing NSGs, but you cannot define new ones: you have to create them separately using the ASM APIs, since ARM will not work here. Finally, please keep in mind that NSG is not compatible with Network/Endpoint ACLs.

Nice-to-Have features and Roadmap

At least in my case, Azure NSG provided everything needed, but there are some areas that require additional work and workarounds, because the platform doesn’t provide (yet) the ideal solution.

- Built-in Tags for Azure services: As pointed out earlier in this post, there is no pre-defined tag to identify Azure services and thus distinguish between Internet and Microsoft datacenter addresses.

Add a source tag for Azure Datacenter IPs to NSG Rules

https://feedback.azure.com/forums/217313-networking/suggestions/11716131-add-a-source-tag-for-azure-datacenter-ips-to-nsg-r

- Effective NSG: Since you can have NSGs at the subnet and VM level, and many rules inside each of them, calculating the final net effect of a security configuration may sometimes be hard: a PowerShell cmdlet that calculates the net result would be a really great enhancement.

Packet tracert functionality for Network Security Group (NSG)

https://feedback.azure.com/forums/217313-networking/suggestions/9639258-packet-tracert-functionality-for-network-security

- Network Security Group logging capabilities to show dropped packets: This feature has been confirmed on the NSG roadmap and will be available soon.

Network Security Group logging capabilities to show dropped packets

https://feedback.azure.com/forums/217313-networking/suggestions/6940205-network-security-group-logging-capabilities-to-sho

- ICMP protocol: Today, you can only specify TCP and UDP as protocols; support for ICMP is not available yet.

- NSG Nesting: You cannot have nested NSGs.

- VM Groups: You cannot define groups of VMs as a target for NSG; you have to use subnets for this purpose.

I cannot commit on behalf of the Azure Networking team here, but there is a lot of feedback from customers, partners and the entire community out there, and I’m pretty sure it will contribute to the next NSG feature improvements. You can provide additional feedback at the link below:

How can we improve Azure Networking? Enter your idea

https://feedback.azure.com/forums/217313-networking

Stay tuned and follow me also on Twitter @igorpag. Regards.