Windows Home Server client join troubleshooting hints

This document is published here because there is no way to publish images on the forum. It's written primarily to help our support people, and we are publishing it to help Beta participants troubleshoot problems with the server join process for Windows Home Server. Please understand that this document is not official and is provided AS IS, without any warranties... (see the disclaimer on the sidebar). Also, while I cannot help troubleshoot on this blog (use the Connect site for that), I would appreciate comments on how to make this document more useful, as well as a note if you spot any typos or problems in it.

Important notice: the document mentions ports 55000 and 56000; however, if you are using the Beta2 build, the ports are the standard http ports 80 and 443, and for the CTP build the ports are 88 and 444.

----------------------------------------------------------------------------

So, server join failed. Now what?

This document is not about troubleshooting client setup or discovery. It’s purely to troubleshoot server join.

Reminder: there are three phases of WHS client software installation:

  1. Install. That’s the usual software installation. Files are copied and registered.
  2. Discovery. Client software tries to find WHS over UPNP. That’s the screen with the Vista circle cursor going round and round.
  3. Server join. Client software makes web service calls to join WHS. That’s what you see at the very end, with either two green marks on the screen (good!) or anything else, usually with some red circles and crosses (bad!)

This document is about handling red circles with crosses.

Step by step

  1. Do the trivial stuff: check that your server is up and running, that the wires are in place, that your server is connected to the network, that your PC is connected to the network, and that they are connected to the SAME network. Yes, it’s trivial, but it is often the problem.

  2. Check that your name resolution works. Name resolution is the culprit in about 9 out of every 10 cases when the problem occurs. An easy way to check name resolution is to go to the command line and type:

    nslookup SERVER

    where SERVER is the name of your WHS server. It should give you the IP address of your server. If it does not work, go to the Name resolution section.
    NB: It's still worth checking steps 3 and 4. Windows uses additional methods (WINS, hosts file) for name resolution, which may work even if the DNS service fails. If step 3 or 4 works, the name is resolved OK through alternative ways.

  3. If name resolution works, it’s most likely that you have firewall issues. First, try to open your browser and type

    https://server

    You should get a nice picture of a prehistoric office worker sitting behind his desk in the savannah without cubicle walls or sunscreen. If you don’t get it, you will get an IE error message. Read it. If it does not help, go to the Accessing public website section.

  4. Now type in browser

    https://server:55000/enrollid/id.xml

    You should get a nice XML with the number 1 in it. If you don’t, take note of the error message and go to the Accessing internal website section.

  5. Now type in browser

    https://server:56000/enroll/id.xml

    You should be prompted for the admin user ID and password, and once you give the correct ones, you get a nice XML with the number 2 in it. If you don’t, check the error message and go to the Accessing internal site with authentication section.

  6. Now type in browser

    https://server:55000/enrollid/id.aspx

    You should get a nice XML with a bunch of stuff in it. If you don’t, then it’s either a bug or somebody played with the IIS settings on the server and broke them. Anyway, take note of the error message you get; it may help. There is no simple troubleshooting from this point. On the positive side, we have not seen anybody hit this in months, so if you have a problem, it’s most likely that you already skipped this text and went to the specific section.
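The browser checks in steps 3 to 6 can be kept in one place. Here is a minimal Python sketch, not part of WHS, that only builds the URLs to try in order. The function name join_check_urls and its parameters are made up for illustration; the paths and the default ports are the ones quoted in this document, so substitute the ports for your build (see the notice at the top).

```python
def join_check_urls(server, internal_port=55000, auth_port=56000):
    """Return (step, url, expected result) tuples for steps 3-6 above.

    The function name and parameters are hypothetical; the paths and
    default ports are the ones quoted in this document.
    """
    return [
        (3, f"https://{server}", "public WHS page"),
        (4, f"https://{server}:{internal_port}/enrollid/id.xml",
         "XML with the number 1"),
        (5, f"https://{server}:{auth_port}/enroll/id.xml",
         "password prompt, then XML with the number 2"),
        (6, f"https://{server}:{internal_port}/enrollid/id.aspx",
         "XML with a bunch of stuff"),
    ]


if __name__ == "__main__":
    # "server" is a placeholder for the name of your WHS server.
    for step, url, expected in join_check_urls("server"):
        print(f"step {step}: open {url} and expect {expected}")
```

The first URL that does not behave as expected tells you which section of this document to read.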

Name resolution

Name resolution is the ability of your machine to figure out an IP address from a name. You sit in front of your PC and type https://www.microsoft.com in IE, and the machine converts www.microsoft.com into an IP address like 207.46.19.190. Machines use IP addresses to reach each other.
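To make this concrete, here is a small Python sketch of the same lookup. It is only an illustration, and the function name resolve is my own. Note one difference from nslookup: nslookup asks a DNS server directly, while socket.gethostbyname goes through the operating system resolver, which on Windows may also use the hosts file and WINS.

```python
import socket


def resolve(name, lookup=socket.gethostbyname):
    """Return the IPv4 address for `name`, or None if it cannot be resolved.

    `lookup` defaults to the operating system resolver; it is a parameter
    only so the function can be exercised without a network.
    """
    try:
        return lookup(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


if __name__ == "__main__":
    # "SERVER" is a placeholder for the name of your WHS server.
    address = resolve("SERVER")
    print(address if address else "SERVER could not be resolved")
```

If this prints an address while nslookup fails, the name is probably being resolved by one of the non-DNS methods.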

Name resolution is critical, because many services that WHS provides are actually standard services of Windows Server 2003, and they all need name resolution to work well in order to be used over a home (or any other) network. In fact, technically we could have made server join work without name resolution, but we decided to let it fail without it, because otherwise many other WHS services would fail all around.

Unfortunately, most home networks are configured just to share an Internet connection, not to let home PCs work together. A good example is a home printer. You connect it to one home PC; can you print on it from another home PC? Most people cannot, because their home networks don’t support this. The situation is so bad that even manufacturers of home network routers don’t take it seriously, and many routers, even expensive ones, don’t do that simple job, depriving their customers of this important and useful functionality.

You see, to use a printer on another machine, your machine needs to be able to get to it. To do so, it needs to resolve the name of the PC with the printer, and most home networks are not set up for that. Similarly, WHS disk shares, backup services, and health monitoring only work if your PC can get to the server.

Did I get too technical? Well, there is good news. Vista uses an extended name resolution mechanism that includes the UPNP/SSDP protocol. What this means is that a Vista client can normally get to WHS even if your router is not up to the task (it still has to support UPNP, though). So, upgrading to Vista helps in 95% of cases. But not everybody is ready to upgrade to Vista, and if this is your case, read on.

Why do most home networks fail with internal name resolution? Here is a typical home network:

 

 

Name resolution is done by name servers. When your home PC sees a name, it goes to a DNS server and asks it, “What’s the IP address for this name?” To share an Internet connection you only need to resolve Internet names, like “www.microsoft.com”. In this case all home PCs can be configured to go directly to your ISP’s DNS server on the Internet. But once you try to make home PCs work with each other, you hit the problem: your ISP’s DNS server has no clue about the names of your home PCs. And that’s only logical: they are not on the Internet!

There is one more complicated version of the same problem. It's when your machine goes to your ISP's DNS server and it does resolve the name. Suppose you name your WHS "SERVER", and for some ridiculous reason your ISP has a machine named "SERVER" on their local (not yours!) network. Then your ISP's DNS server may resolve the name to an IP address, but it will be the wrong address! So the symptoms may be slightly different, but the root cause will be the same -- name resolution by a DNS server that does not know about your Home Server.
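One rough way to spot this case is to look at the address you got back. The sketch below is a heuristic of my own, not something WHS does: it assumes your server gets a private address from your router (as in most home networks), so a public address for your server name suggests that somebody else's DNS answered. The function name is made up for illustration.

```python
import ipaddress


def looks_like_home_address(ip_text):
    """True if the address is in a private range, as home LAN addresses usually are."""
    return ipaddress.ip_address(ip_text).is_private


if __name__ == "__main__":
    # Example addresses only: a typical home LAN address and a public one.
    for ip in ("192.168.0.13", "207.46.19.190"):
        verdict = "home network" if looks_like_home_address(ip) else "NOT a home address"
        print(f"{ip}: {verdict}")
```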

To resolve home PC names, somebody else should take the job of DNS server. Good routers do that. They present themselves to home PCs as a DNS server, and when in doubt go to your ISP’s DNS server for help. The router is on your home network, so it does know all your home PCs and their IP addresses. In fact, normally your router is the one that gives IP addresses to your home PCs (DHCP). But it also needs to share this knowledge by working as your home DNS server. If you can configure your router right, the picture will look very similar but with one critical difference: your home PCs ask the router, not the ISP’s DNS server, to resolve names.

Many routers out of the box don’t do that, but many modern routers can be configured to do so.

And if instead of a router you have a PC with two network cards running Windows XP Professional, Windows Server, or some Vista SKUs, it will do that for sure.

There are tricks to make it work without using DNS, mainly around using WINS name resolution.

Trick 1: If all your machines are in the same "workgroup", and the router does not block WINS, they will be able to see each other in most cases. If you have an OEM headless station, it means that you need to make your client PC belong to the WORKGROUP workgroup. In many cases that should help.

Trick 2: Also, you can potentially assign a static IP to your WHS server and put it in the hosts table on your PCs. That solution is far from ideal, and you will still have problems with remote access, which will try to go from the server to the PCs. Also, if you ever change the server IP address, you will have to edit the hosts tables on all PCs manually again. Essentially, it's a hack; however, it will work. If you are using a headless server, it's even harder and carries a lot of risk of misconfiguring the system into an unusable state.
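For Trick 2, the hosts table is a plain text file (on Windows it lives at %SystemRoot%\System32\drivers\etc\hosts) with one "address name" pair per line. Here is a small Python sketch that formats such a line and rejects a malformed address. The function name hosts_entry is hypothetical, and the address and name in the example are placeholders, not values your network necessarily uses.

```python
import ipaddress


def hosts_entry(ip_text, name):
    """Format one hosts-file line mapping `name` to a static IP address."""
    ip = ipaddress.ip_address(ip_text)  # raises ValueError on a malformed address
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("host name must be non-empty and contain no whitespace")
    return f"{ip}\t{name}"


if __name__ == "__main__":
    # Placeholder values: use your server's static IP and its real name.
    print(hosts_entry("192.168.0.13", "SERVER"))
```

The sketch deliberately does not write to the file; editing the hosts table needs administrator rights and is best done by hand.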

Accessing public website

You came here because you typed in IE

https://server

and got an error message. But you already checked that name resolution works. Now, look at the IE error message.

If it says “cannot find server”, it’s either that your server is down, a network cable is not plugged in somewhere, or a firewall. Check the cables, see that the server is on (ping server will confirm that), and once that’s OK, go to the Firewall problems section.

Another popular reason for failing at this step is the Internet Explorer setting "Internet Options | Connections | LAN Settings | Automatically Detect Settings", or "Use a proxy server" without "Bypass proxy server for local addresses". WHS Connector uses exactly the same Windows code as Internet Explorer, and it is affected by these settings. If "Automatically Detect Settings" is checked, your machine may be accidentally configured (by your router or your Internet service provider) to use a proxy server for all http calls. If this proxy is on the provider side, or if the proxy in the router has problems, you will not be able to access your WHS server over http from local machines. Unchecking this option helps if this was the problem.
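If you have Python on the client PC, you can get a quick second opinion on the proxy settings. This is only a sketch with a made-up function name, proxy_report. It relies on urllib.request.getproxies, which on Windows falls back to the Internet Settings in the registry when no proxy environment variables are set. As far as I know it only sees a manually configured proxy and does not evaluate "Automatically Detect Settings", so treat an empty result as inconclusive.

```python
import urllib.request


def proxy_report(host):
    """Summarize the proxy settings Python detects for reaching `host`."""
    proxies = urllib.request.getproxies()  # e.g. {"http": "http://proxy:8080"}
    bypassed = bool(urllib.request.proxy_bypass(host))
    uses_proxy = bool(proxies) and not bypassed
    return {"proxies": proxies, "bypassed": bypassed, "uses_proxy": uses_proxy}


if __name__ == "__main__":
    # "server" is a placeholder for the name of your WHS server.
    report = proxy_report("server")
    print("configured proxies:", report["proxies"] or "none")
    print("server bypasses the proxy:", report["bypassed"])
```

If a proxy is configured and the server is not bypassed, that matches the failure described above.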

Accessing internal website

You’ve come here because name resolution works and the public site is accessible fair and square, but when you typed:

https://server:55000/enrollid/id.xml

it failed.

There is a 90% chance that this is trouble with firewalls; go to the Firewall problems section.

Accessing internal site with authentication

You came here because name resolution works, public site is accessible, as well as internal site, but when you typed

https://server:56000/enroll/id.xml

it failed.

Read the error message.

If IE says that access is denied, try to recall your password. You typed it incorrectly.

If IE complains about a bad certificate, it means that server join was not able to install the WHS certificate. It happens in rare cases of PC misconfiguration, usually on older systems, due to malware or a user playing with security settings on the system. This is a very rare case.

If IE says it cannot find the site, it’s most likely a firewall issue again. Go to the Firewall problems section.

Firewall problems

Guess how many firewalls are between your PC and WHS? You get three guesses.

Three. At least. In my case four, because I have two firewalls on my home PC. Here is what it looks like in a typical home network setup:

You see, three of them are in the way of your PC communicating with WHS.

WHS firewall

This one is the most harmless of all for WHS communications. After all, it’s the WHS firewall, and it is configured to be friendly for WHS use. Within reason, of course. Specifically, most WHS communications are configured to be allowed only on the same subnet. What does that mean?

Suppose you have a typical home network configuration that uses private network addresses 192.168.0.*, for example:

Router 192.168.0.1
PC 192.168.0.5
WHS 192.168.0.13

This will work fine. However, make it

Router 192.168.0.1
PC 192.168.1.5
WHS 192.168.2.13

and the WHS firewall will start blocking attempts by the clients to connect. The rule of thumb is that the first three numbers in the IP addresses of the WHS, PC and router must be the same. Subnets like 192.168.3.* or 192.168.7.* will also work, as long as the first three numbers are the same. That’s not technically 100% true if you are ready to go into technical detail, but if you are not, don’t even try.

Some people feel uncomfortable with the idea of only 254 computers on their home network (not that they really have 254 devices on it), and use other private address spaces like 10.*.*.* or 172.16.*.*-172.31.*.*, which is potentially OK if you configure the subnet mask right, although by default it is usually configured to let only the last number change… Too technical again? Yes, just don’t use these ranges. Stick to good ol’ 192.168.0.*.
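The rule of thumb is easy to check mechanically. Below is a minimal Python sketch using the two example networks above; the function name same_subnet is my own. The default prefix_length of 24 corresponds to the usual home subnet mask 255.255.255.0, which is the "first three numbers must match" rule. It only illustrates the arithmetic and does not reproduce the actual WHS firewall rules.

```python
import ipaddress


def same_subnet(addresses, prefix_length=24):
    """True if every address falls in the same network of the given prefix length."""
    networks = {
        ipaddress.ip_interface(f"{addr}/{prefix_length}").network
        for addr in addresses
    }
    return len(networks) == 1


if __name__ == "__main__":
    good = ["192.168.0.1", "192.168.0.5", "192.168.0.13"]  # router, PC, WHS
    bad = ["192.168.0.1", "192.168.1.5", "192.168.2.13"]
    print("first example on one subnet:", same_subnet(good))
    print("second example on one subnet:", same_subnet(bad))
```

Passing a shorter prefix_length, such as 16 for mask 255.255.0.0, shows why the second example could still work with a wider subnet mask, and why it fails with the default one.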

Actually, that’s the only trouble with the WHS firewall that you may encounter (unless you configure WHS manually, in which case you should know all this stuff already anyway).

Router firewall

That’s not necessarily easy, but it is straightforward. Your router should let intranet traffic through, at least for WHS services. To join, you need TCP connections on ports 55000 and 56000. For transport you need port 1138. There are also a couple of ports that you need for backup and remote access. The router also needs to let UPNP packets through, otherwise server discovery won’t work.

You don’t have to open them to the Internet. In fact, you had better not open them to the Internet. But they should be open for computers on the internal home network.
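To see whether these ports are reachable from a client PC, you can try a plain TCP connection to each one. Here is a minimal Python sketch, not a WHS tool; the function names are my own, and treating port 1138 as TCP is my assumption, since this document does not name the protocol. A failed connection only tells you that something between the PC and the server blocks the port, not which firewall does it.

```python
import socket

# Ports named in this document; adjust them for your build.
JOIN_PORTS = (55000, 56000)
TRANSPORT_PORT = 1138  # assumed to be TCP


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False


def check_ports(host, ports=JOIN_PORTS + (TRANSPORT_PORT,), probe=port_open):
    """Return {port: reachable} for each port, using `probe` to test it."""
    return {port: probe(host, port) for port in ports}


if __name__ == "__main__":
    # "server" is a placeholder for the name of your WHS server.
    for port, reachable in check_ports("server").items():
        print(f"port {port}: {'open' if reachable else 'blocked or closed'}")
```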

PC firewall

Default Windows firewall settings allow the WHS client software to go out to the server from both XP and Vista clients, no problems. Of course, if you set it up manually, see that the same rules as for the router firewall apply for outgoing connections and UPNP responses. Under no circumstances, except Remote Access, will WHS try to contact your PC. All connections (again, except remote access) go from the PC to your WHS server.

The OneCare firewall is supposed to let signed binaries out, and that’s all the WHS client needs, although we’ve occasionally seen OneCare not letting connections out. You may need to open these ports manually.

Most troublesome are third-party firewalls. Any of them could block some WHS client connections. If you have one, you need to configure it manually to allow the WHS client’s outgoing connections.

Conclusion

I realize that more information is needed on the subject, and maybe I’ll be able to get to that and extend this post or write additional posts on the subject. But right now the whole team is very busy, so it was tough to write even this. Still, the plan is to extend this document with more detail and more information/cases as we find them.

Also, please understand that I cannot troubleshoot your system through comments on this blog. The right way to submit beta bugs is through the Connect site, where there is a way to get reasonably full troubleshooting information about your problem.

First version: 4/28/07 11:55 pm
Next update: 5/2/07 8:10pm
Next update: 7/9/07
Next update: 7/13/07