Diagnosing network connectivity issues step by step - Part I

We often run into network problems between computers at home, in offices, in labs and so on. Sometimes these are easy to diagnose and sometimes they are not. I have heard from customers many times that they cannot access a share on another machine, or cannot connect to another machine over Telnet, but they are unable to track down exactly where the problem is and what it is.

This post discusses some diagnostic steps that should be useful to anyone doing network troubleshooting. The focus is mainly on issues caused by firewalls or other filters in the stack blocking communication. These are very general steps that many people may already know, but following them helps narrow down the issue and track it to its root cause. Let's take an example of two machines, A and B, where A is not able to access some resources shared on B (say \\B\SharedFiles\).

The very first thing to check is whether DNS is able to resolve the remote machine's name. To verify this, try accessing the remote machine by its IP address instead of its name. If access by IP works while access by name does not, that points to a DNS name resolution problem, such as a stale DNS cache, a duplicate name or some other DNS issue. If access by IP also fails, further steps are needed to narrow the issue down. It is better to use the IP address for the rest of the troubleshooting so that DNS issues do not interfere with the investigation again.
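The name-versus-IP check can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function names are made up for this post, and the port is something you supply yourself (for a file share like \\B\SharedFiles\ it would typically be TCP 445, the usual SMB port).

```python
import socket


def resolves(name: str) -> bool:
    """Return True if the resolver can turn `name` into at least one address."""
    try:
        return len(socket.getaddrinfo(name, None)) > 0
    except socket.gaierror:
        return False


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(by_name_ok: bool, by_ip_ok: bool) -> str:
    """Map the two observations onto the next troubleshooting step."""
    if by_name_ok:
        return "name resolution and connectivity look fine"
    if by_ip_ok:
        return "suspect DNS name resolution"
    return "not a DNS-only problem: continue with packet capture"
```

On machine A you would call something like diagnose(can_connect("B", 445), can_connect("192.0.2.10", 445)), where the name and address are placeholders for your own machine B, and resolves("B") tells you directly whether the name can be looked up at all.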

Now there are multiple places where the connection setup could be failing, such as:

  1. The connection packet sent by machine A never left machine A.
  2. The connection packet reached machine B, but B never replied.

Any network traffic capture software (like netmon) can help you identify which of the above points applies in your network setup.

Customers mostly run into the issue described in #2. However, if it's #1, the probable causes are firewall outbound rules or a misconfigured gateway on machine A. Most firewalls allow outbound connections by default, but in this case you should double-check the outbound firewall configuration and make sure it is right.

If it is #2, that is, the network capture running on machine B shows an incoming packet from A but no outgoing packet from B to A, chances are that the firewall or something else dropped the incoming packet. Network capture software usually sits at a lower layer of the stack, so it sees the packet before it reaches the firewall and is dropped.
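The reasoning about the two captures can be written down as a small decision function. This is a sketch, not a capture parser: it assumes you have already reduced each capture to simple (source IP, destination IP) pairs, and the function name and messages are made up for illustration. It also reports one case the list above does not cover, where the packet leaves A but is never seen on B.

```python
from typing import Iterable, Tuple

# A simplified capture row: (source IP, destination IP).
Packet = Tuple[str, str]


def locate_failure(capture_on_a: Iterable[Packet],
                   capture_on_b: Iterable[Packet],
                   ip_a: str, ip_b: str) -> str:
    """Decide where the connection attempt from A to B stops."""
    a_rows = list(capture_on_a)
    b_rows = list(capture_on_b)
    left_a = (ip_a, ip_b) in a_rows      # A's capture shows the outgoing packet
    reached_b = (ip_a, ip_b) in b_rows   # B's capture shows it arriving
    b_replied = (ip_b, ip_a) in b_rows   # B's capture shows a reply to A
    if not left_a:
        return "#1: packet never left A (check outbound rules and gateway on A)"
    if not reached_b:
        return "packet left A but was not seen on B (look at the path in between)"
    if not b_replied:
        return "#2: packet reached B but B never replied (check inbound filtering on B)"
    return "B replied: the initial exchange is not being blocked"
```

For example, if both captures contain the pair (A, B) but B's capture has no (B, A) pair, the function reports case #2.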

Now is the time to verify whether the firewall really dropped the incoming packet. If you cannot see an inbound rule in the firewall for the communication in question, create one and then check whether the connection succeeds. Different firewalls have different interfaces for creating inbound rules. One thing to remember when creating rules in Windows Firewall is to choose the right profile for your inbound rule. Windows Firewall has three profiles, Public, Domain and Private, and when creating a rule you need to select the profiles to which it applies. If you are not sure about the profiles, for investigation purposes you can choose all of them so that your new inbound rule applies to every profile.
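If you prefer the command line to the management console, an inbound rule can also be created with netsh. The sketch below only builds the command so you can inspect it before running it from an elevated prompt. The helper name, the rule name and the port are assumptions chosen for illustration; profile=any is what makes the rule apply to all three profiles.

```python
from typing import List


def inbound_allow_rule(name: str, protocol: str, local_port: int,
                       profile: str = "any") -> List[str]:
    """Build a netsh command that adds an inbound allow rule."""
    if profile not in {"domain", "private", "public", "any"}:
        raise ValueError(f"unknown profile: {profile}")
    return [
        "netsh", "advfirewall", "firewall", "add", "rule",
        f"name={name}",
        "dir=in",            # inbound rule
        "action=allow",
        f"protocol={protocol}",
        f"localport={local_port}",
        f"profile={profile}",
    ]


# Hypothetical rule for the file-share example (SMB usually uses TCP 445).
print(" ".join(inbound_allow_rule("TroubleshootSMB", "TCP", 445)))
```

Remember to delete a temporary rule like this once the investigation is over.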

For the sake of troubleshooting you might also consider turning the firewall off and then trying the connection. I have seen a few customers make a mistake when turning off Windows Firewall: instead of turning it off from the Windows Firewall management console, they just stop the firewall service (net stop mpssvc). Note that stopping the service is not the right way to turn the firewall off; you should go to the firewall management console (Wf.msc) and turn the firewall off there for all the profiles. I should emphasize that if you are not sure about the firewall profiles, turn it off for all three.
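As a command-line alternative to the console, netsh can switch the firewall state for all profiles at once, which is different from stopping the service. The helper below is a made-up name for this post and only builds the command; run the result from an elevated prompt, and switch the firewall back on when the test is done.

```python
from typing import List


def firewall_state_command(enabled: bool) -> List[str]:
    """Build the netsh command that sets the firewall state for all profiles."""
    state = "on" if enabled else "off"
    return ["netsh", "advfirewall", "set", "allprofiles", "state", state]


# Turn the firewall off for the test, then restore it afterwards.
print(" ".join(firewall_state_command(False)))
print(" ".join(firewall_state_command(True)))
```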

If creating an inbound firewall rule or turning the firewall off solves the issue, it was your firewall that dropped the packets. You can reach the same conclusion by enabling firewall logging and checking whether the log contains an entry for a dropped packet belonging to the communication in question. This might be difficult if many connections are established on the machine, as the log becomes large.
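When the log is large, a small filter helps to find the relevant entries. The sketch below assumes a text log with a "#Fields:" header line naming the columns (including action, src-ip and dst-port), which is how the Windows Firewall log is typically laid out. The function name and the sample lines are made up for illustration, so check the header of your own log before relying on it.

```python
from typing import Dict, Iterable, List


def dropped_packets(lines: Iterable[str], src_ip: str,
                    dst_port: int) -> List[Dict[str, str]]:
    """Return log entries for packets from src_ip to dst_port that were dropped."""
    fields: List[str] = []
    hits: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            # Column names come from the log's own header.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        entry = dict(zip(fields, line.split()))
        if (entry.get("action") == "DROP"
                and entry.get("src-ip") == src_ip
                and entry.get("dst-port") == str(dst_port)):
            hits.append(entry)
    return hits


# Hypothetical sample lines, not taken from a real log.
sample = [
    "#Version: 1.5",
    "#Fields: date time action protocol src-ip dst-ip src-port dst-port",
    "2024-01-01 10:00:00 DROP TCP 192.0.2.1 192.0.2.2 50000 445",
    "2024-01-01 10:00:01 ALLOW TCP 192.0.2.1 192.0.2.2 50001 80",
]
for entry in dropped_packets(sample, "192.0.2.1", 445):
    print(entry["date"], entry["time"], entry["src-ip"], "->", entry["dst-ip"])
```

With a real log you would pass the open file object instead of the sample list, using machine A's address and the port of the service you are trying to reach.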

So far I have only talked about Windows Firewall, but there might be a third-party firewall dropping the packets, or something else blocking the communication, like IPSec. Let's discuss those in the next post.