Scenario 7: I cannot access my website, although all role instances are in the running state

This is the seventh scenario of the lab in my Azure Cloud Service Troubleshooting Series. To recreate the problem, please make sure you have followed the lab setup instructions for the Visitor Tracker application.

 

Symptom

Visitor Tracker is an ASP.NET SignalR application that tracks the number of visitors accessing the website. Suddenly, I can no longer reach the application over the default cloud service URL (https://cloudservicelabs.cloudapp.net/). The IE browser reports "Can’t reach this page", even though the role instances are in the running state.

Running instance

 

Troubleshooting

When we talk about application and service availability, the first thing I check is whether the input endpoint is responding. A cloud service web role exposes HTTP port 80 as its input endpoint by default, so I turn to my best friend PsPing to check whether port 80 is open.

psping cloudservicelabs.cloudapp.net:80
PsPing v2.01 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2014 Mark Russinovich
Sysinternals - www.sysinternals.com
Invalid destination address:
Host not found.

Hmmm... that is really interesting! PsPing gave up with "Host not found" before it sent a single probe, so I still cannot tell whether port 80 is blocked 😲
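To make the distinction between "the name did not resolve" and "the port did not answer" explicit, here is a minimal Python sketch of the same kind of check. It is not how PsPing works internally. The function name probe_tcp and its result labels are my own, chosen only for illustration.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify one TCP connection attempt.

    Returns 'dns_failure', 'open', 'refused', 'timeout' or 'error'.
    The labels are illustrative, not taken from any tool.
    """
    try:
        # Step 1: name resolution, kept separate so that a DNS failure
        # is not mistaken for a blocked port.
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    address = infos[0][4]  # (ip, port) of the first resolved address
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # Step 2: the TCP three-way handshake.
        sock.connect(address)
        return "open"
    except ConnectionRefusedError:
        # A reset came back: the host is reachable, nothing is listening.
        return "refused"
    except socket.timeout:
        # No reply at all within the timeout: typical of filtered traffic.
        return "timeout"
    except OSError:
        return "error"
    finally:
        sock.close()
```

Calling probe_tcp("cloudservicelabs.cloudapp.net", 80) from the client machine would then tell me which of the two steps fails.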

I always like to confirm a behavior from more than one angle, which helps me isolate the issue faster. So I am going to run another tool, PortQry, which is useful for troubleshooting TCP/IP connectivity issues like this one. It comes in both a command-line and a GUI version.

Starting portqry.exe -n cloudservicelabs.cloudapp.net -e 80 -p TCP ...
Querying target system called:
cloudservicelabs.cloudapp.net
Attempting to resolve name to IP address...
Name resolved to 40.124.28.4
querying...
TCP port 80 (http service): FILTERED
portqry.exe -n cloudservicelabs.cloudapp.net -e 80 -p TCP exits with return code 0x00000002.
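When I run PortQry against many endpoints, I find it handy to pull the status out of the text output. The following is a rough sketch that only assumes the status line looks like the one above. The helper name parse_portqry is hypothetical, and the descriptions in the dictionary are my own paraphrase of the states PortQry reports.

```python
import re

# Paraphrased meaning of each state that PortQry reports.
PORTQRY_STATUS = {
    "LISTENING": "a process answered on the port",
    "NOT LISTENING": "the target replied, but nothing listens on the port",
    "FILTERED": "no reply was received; the port may be blocked on the way",
    "LISTENING or FILTERED": "no reply to a UDP probe; the state is ambiguous",
}

# Matches lines such as: TCP port 80 (http service): FILTERED
_STATUS_LINE = re.compile(r"^(TCP|UDP) port (\d+) \(([^)]*)\): (.+)$")


def parse_portqry(output):
    """Return a list of (protocol, port, status, meaning) tuples."""
    results = []
    for line in output.splitlines():
        match = _STATUS_LINE.match(line.strip())
        if match:
            protocol, port, _service, status = match.groups()
            status = status.strip()
            meaning = PORTQRY_STATUS.get(status, "unknown status")
            results.append((protocol, int(port), status, meaning))
    return results


SAMPLE_OUTPUT = """\
Name resolved to 40.124.28.4
querying...
TCP port 80 (http service): FILTERED
"""
```

Feeding SAMPLE_OUTPUT to parse_portqry gives one entry for TCP port 80 with the status FILTERED.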

 

Aha... PortQry reports the status of TCP port 80 as FILTERED. According to the PortQry documentation, this means it received no response from the port, so the traffic is most likely being blocked somewhere along the way. However, I was curious how the application responds over localhost, so I enabled RDP for the role and logged in to the instance to browse the application locally. When I opened IIS Manager, I found that the site had an HTTP binding on port 81 and not on the default HTTP port 80. Requests to port 80 were therefore not reaching IIS, which is why we were getting the error "Can’t reach this page", whereas locally the website came up as expected.

So now I browsed the cloud service URL over port 81 (https://cloudservicelabs.cloudapp.net:81/), but wait... the same error again? How is this possible? I ran PsPing again, this time against port 81, and got a timeout error...

psping cloudservicelabs.cloudapp.net:81
PsPing v2.01 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2014 Mark Russinovich
Sysinternals - www.sysinternals.com
TCP connect to 40.124.28.4:81:
5 iterations (warmup 1) connecting test:
Connecting to 40.124.28.4:81 (warmup): This operation returned because the timeout period expired.
Connecting to 40.124.28.4:81: This operation returned because the timeout period expired.
Connecting to 40.124.28.4:81: This operation returned because the timeout period expired.
Connecting to 40.124.28.4:81: This operation returned because the timeout period expired.
Connecting to 40.124.28.4:81: This operation returned because the timeout period expired.
TCP connect statistics for 40.124.28.4:81:
Sent = 4, Received = 0, Lost = 4 (100% loss),
Minimum = 0.00ms, Maximum = 0.00ms, Average = 0.00ms

It seems the client request is not reaching the server. In this kind of situation, I always recommend that my customers collect simultaneous network traces from the client and the server to get a holistic picture. In the server-side trace, I could not see any request arriving from my client machine. In the client-side trace, however, I could see that the client sent the SYN packet three times to start the TCP handshake, but the server never acknowledged it.
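The pattern I look for in the client-side trace can be sketched in a few lines of Python. This is only an illustration: the Segment record, the client address 203.0.113.10, the source port and the timestamps are all made up, and a real trace would come from a capture tool rather than a hand-written list.

```python
from collections import namedtuple

# Hypothetical, simplified view of one captured TCP segment.
Segment = namedtuple("Segment", "time src sport dst dport flags")


def unanswered_syns(segments, client, server, port):
    """Return the client's SYNs to server:port if the server never replied."""
    syns = [
        s for s in segments
        if s.src == client and s.dst == server and s.dport == port
        and s.flags == frozenset({"SYN"})
    ]
    # Any segment coming back from server:port counts as a reply.
    replied = any(
        s.src == server and s.sport == port and s.dst == client
        for s in segments
    )
    return [] if replied else syns


CLIENT = "203.0.113.10"   # made-up client address (documentation range)
SERVER = "40.124.28.4"    # address PsPing connected to above

# Illustrative client-side trace: the SYN is sent three times, no answer.
client_trace = [
    Segment(0.0, CLIENT, 50123, SERVER, 81, frozenset({"SYN"})),
    Segment(3.0, CLIENT, 50123, SERVER, 81, frozenset({"SYN"})),
    Segment(9.0, CLIENT, 50123, SERVER, 81, frozenset({"SYN"})),
]
```

If unanswered_syns(client_trace, CLIENT, SERVER, 81) returns the SYNs and the server-side trace shows none of them arriving, something between the two machines is dropping the traffic.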

ACL-blocking

The network trace above hinted that the request is being blocked. Now the question is: who is blocking it, and how? Maybe a firewall or an ACL? Out of curiosity, I checked the ServiceConfiguration.cscfg file of the cloud service solution and found that there is indeed an ACL rule blocking the traffic. :-)

<NetworkConfiguration>
  <AccessControls>
    <AccessControl name="ACS">
      <Rule action="deny" description="ACS" order="100" remoteSubnet="0.0.0.0/0" />
    </AccessControl>
  </AccessControls>
  <EndpointAcls>
    <EndpointAcl role="Tracker" endPoint="Endpoint1" accessControl="ACS" />
  </EndpointAcls>
</NetworkConfiguration>
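A quick way to check whether such a rule applies to a given client is to parse the configuration and test the client address against each remoteSubnet. The sketch below does that with the Python standard library. It assumes that rules are evaluated in ascending order and that the first match wins, which is my understanding of endpoint ACLs rather than something this configuration states. The function name first_matching_rule and the client address are hypothetical.

```python
import ipaddress
import xml.etree.ElementTree as ET

SAMPLE_CSCFG = """
<NetworkConfiguration>
  <AccessControls>
    <AccessControl name="ACS">
      <Rule action="deny" description="ACS" order="100" remoteSubnet="0.0.0.0/0" />
    </AccessControl>
  </AccessControls>
  <EndpointAcls>
    <EndpointAcl role="Tracker" endPoint="Endpoint1" accessControl="ACS" />
  </EndpointAcls>
</NetworkConfiguration>
"""


def _local(tag):
    # Drop an XML namespace prefix such as '{http://...}Rule', if present.
    return tag.rsplit("}", 1)[-1]


def first_matching_rule(cscfg_xml, role, endpoint, client_ip):
    """Return (action, description) of the first rule matching client_ip,
    or None when no rule bound to the endpoint matches.

    Assumption: rules are checked in ascending 'order', first match wins.
    """
    elements = list(ET.fromstring(cscfg_xml).iter())
    # Names of the access controls bound to this role endpoint.
    acl_names = {
        e.get("accessControl") for e in elements
        if _local(e.tag) == "EndpointAcl"
        and e.get("role") == role and e.get("endPoint") == endpoint
    }
    rules = [
        rule
        for ac in elements
        if _local(ac.tag) == "AccessControl" and ac.get("name") in acl_names
        for rule in ac
        if _local(rule.tag) == "Rule"
    ]
    ip = ipaddress.ip_address(client_ip)
    for rule in sorted(rules, key=lambda r: int(r.get("order"))):
        if ip in ipaddress.ip_network(rule.get("remoteSubnet"), strict=False):
            return rule.get("action"), rule.get("description")
    return None
```

For the configuration above, first_matching_rule(SAMPLE_CSCFG, "Tracker", "Endpoint1", "203.0.113.10") finds the deny rule, because 0.0.0.0/0 covers every IPv4 address.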

 

To resolve the issue for now, I removed the ACL rule, and the page came up as expected!

I hope this gave you an idea of how to troubleshoot connectivity and networking issues in Azure Cloud Service.

Happy Learning!