America Online (“AOL”) uses Global Server Load Balancing (“GSLB”) to maintain their SIP gateway farm. Their configuration uses sub VIPs, one for each of their two datacenters. Whenever AOL performs maintenance (new code, hardware, etc.), they use GSLB to take one datacenter offline while they update it and point all of the SIP traffic to the secondary datacenter.
As with all GSLB systems, DNS caching on the client side can be an issue. For example, this past week (July 16th), AOL replaced some hardware at one of their sites. To do so, they redirected all traffic to their other site at around midnight EDT. It takes a maximum of 30 seconds for such a change to be reflected in DNS.
Several of our Office Communications Server (OCS) customers reported problems connecting via PIC to AOL (through their SIP gateways) as late as 6:30am that morning, and said that the problem resolved itself in about an hour’s time. This sounds very much like DNS caching at work. That is, the users began using the system at 6:30am with a stale cached address and continued to have issues until their DNS was refreshed.
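The caching behaviour described above can be illustrated with a toy model. This is only a sketch: the class name, the one-hour lifetime, and the addresses (taken from the ranges reserved for documentation) are all assumptions for illustration, not AOL's real records or the behaviour of any particular resolver.

```python
from dataclasses import dataclass


@dataclass
class CachedAnswer:
    address: str       # IP address the cache is currently handing out
    expires_at: float  # time (in seconds) after which the cache re-queries


class ToyResolverCache:
    """Hypothetical cache that keeps an answer until its lifetime runs out."""

    def __init__(self, authoritative, ttl):
        self.authoritative = authoritative  # callable returning the live address
        self.ttl = ttl                      # assumed cache lifetime in seconds
        self.entry = None

    def resolve(self, now):
        # Only ask the authoritative source when nothing is cached
        # or the cached answer has expired.
        if self.entry is None or now >= self.entry.expires_at:
            self.entry = CachedAnswer(self.authoritative(), now + self.ttl)
        return self.entry.address
```

With an assumed lifetime of 3600 seconds, a client that cached the first site's address shortly before the switch would keep receiving it for up to an hour afterwards, even though the authoritative answer changed within seconds.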
While it is unclear where the caching took place (local host, ISP DNS, local network, or within the OCS topology itself), a good rule of thumb is that whenever an OCS customer is unable to reach AOL’s SIP gateways via PIC, the first troubleshooting step should be to initiate a DNS flush. It is also best to use the sip.oscar.aol.com FQDN instead of a specific IP address when connecting to AOL, so that a change of datacenter is picked up through DNS.
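As a rough sketch of that first troubleshooting step, the script below compares a previously seen address with what the local resolver returns now and, on Windows, flushes the local cache with the standard ipconfig /flushdns command. The function names are hypothetical, and a flush on the local host does nothing for caches further upstream (ISP or internal DNS servers).

```python
import socket
import subprocess
import sys


def resolve_ipv4(fqdn):
    """Return the set of IPv4 addresses the local resolver gives for fqdn."""
    infos = socket.getaddrinfo(
        fqdn, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return {info[4][0] for info in infos}


def flush_windows_dns_cache():
    """Flush the local Windows resolver cache only."""
    if not sys.platform.startswith("win"):
        raise RuntimeError("ipconfig /flushdns is a Windows command")
    subprocess.run(["ipconfig", "/flushdns"], check=True)


def is_stale(previous, fqdn, resolver=resolve_ipv4):
    """True if none of the previously seen addresses is in the fresh answer."""
    return not (set(previous) & resolver(fqdn))
```

A typical use would be to call flush_windows_dns_cache() and then is_stale(old_addresses, "sip.oscar.aol.com"), where old_addresses is whatever the client was last seen connecting to. A True result suggests the client had been holding an address that is no longer being served.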