Adam’s unofficial Exchange 2003 troubleshooting tips
All items are listed in the order in which I usually reference or perform them. This is general guidance based on my 8 years of working with Exchange, not prescriptive Microsoft guidance.
As always, for the fastest, most methodical approach to fixing a problem and finding the cause, call Premier Support! Those guys do nothing but troubleshoot all day and night; we’ll never be as good as they are!
· Always consider the most basic things first - networking, name-resolution, AD, services running, databases mounted, etc.
· Turn on Diagnostic logging for the suspect component
· Look in the Event Log (Exchange uses the Application Log). Look up events in the KB or on eventid.net, and search the Exchange newsgroups or your favorite forum
· Set up Perfmon to capture historical & real-time performance data (see the Troubleshooting Microsoft Exchange Server Performance doc)
· Cmd-line: netstat -na to view all active connections to/from the server
· Look at mail flow using Message Tracking Center in ESM - for example, look at the last hour of message flow: is the volume much higher than usual?
· Look at active POP/IMAP/MAPI sessions via Virtual Servers in ESM & EXMON
· Most client performance problems come from Exchange being disk I/O bound or memory bound. CPU & network are very rarely the cause.
· Use Process Explorer by Sysinternals to see a particular process’s or thread’s CPU, disk I/O, network sessions, etc.
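To make the netstat -na tip a bit more concrete, here is a minimal Python sketch that tallies TCP connections by state from captured netstat -na output. The function name and the sample text are made up for illustration, and it assumes the usual Windows column layout (Proto, Local Address, Foreign Address, State); treat it as a starting point, not a finished tool.

```python
from collections import Counter

def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections by state from `netstat -na` text output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows look like: TCP  local_addr  remote_addr  STATE
        # UDP rows have no state column and are skipped.
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states

# Illustrative sample only; the addresses and ports are invented.
SAMPLE = """\
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:25             0.0.0.0:0              LISTENING
  TCP    10.0.0.5:25            10.0.0.9:1042          ESTABLISHED
  TCP    10.0.0.5:135           10.0.0.7:2231          ESTABLISHED
  TCP    10.0.0.5:25            10.0.0.8:3310          TIME_WAIT
  UDP    0.0.0.0:445            *:*
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(state, count)
```

In practice you would redirect netstat -na to a file on the server and feed the file contents to the function. An unusually large number of ESTABLISHED or TIME_WAIT entries on port 25 is worth a closer look.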
Documentation (there are dozens more, but these I use most often)
Microsoft Exchange Server 2003 Technical Reference Guide
Troubleshooting Microsoft Exchange Server Performance
Microsoft Exchange Server 2003 Transport and Routing Guide
Microsoft Exchange Server 2003 High Availability Guide
Microsoft Exchange Server 2003 Performance and Scalability Guide
Tools for Exchange Server 2003
(The ones in the above URL I use most often)
NetDiag & DCDiag (already on all your servers) - run from cmd prompt - helps identify the most common problems - DNS, WINS, networking, & AD
User Monitor (EXMON) - use when users are experiencing RPC disconnects
SMTPDiag - determines whether SMTP and DNS are configured to reliably deliver mail to an external e-mail address
Exchange Troubleshooting Assistant (EXTRA)
Exchange Best Practices Analyzer (EXBPA)
ExchDump - export configuration information
Process Explorer - very deep process & memory analyzer
IIS 6 Resource Kit
Debugging Tools for Windows 32-bit Version - analyze crash dumps, etc.
Monitoring - From the MOM Management Pack for Exchange doc
Table 4.2 Minimum messaging functions to monitor
· Server heartbeat.
· Required services are running.
· Databases are mounted.
· MAPI logon check verification is running without errors.
· Mail flow verification is running without errors.
· No unexpected service termination.
· Front End Server Monitoring test is running without errors.
· Verify that all required services are running on each server. Note that you can configure the list of monitored services for each server.
· Generate an alert when a service is not running.
· Verify that all databases are mounted.
· Generate an alert if any database becomes dismounted.
MAPI Logon check
· Verify that the Server Availability Report shows no errors. This test verifies that each store can be accessed by a MAPI client, and implicitly verifies both Exchange and Active Directory functionality.
Log on to the mailbox of a test account
· Verify client to server connectivity, including verification that Exchange is running, the database is mounted, and Active Directory is functioning correctly.
· Use this data to compile server availability statistics.
Front-end Server Monitoring
After you edit your registry to enable Front-end server monitoring, the following tests are performed:
· Verify that services are running on the front-end server.
· Verify that Internet clients can connect, including Outlook Web Access, Outlook Mobile Access, and Exchange ActiveSync (for computers running Exchange Server 2003).
· Verify localhost monitoring occurs by default.
· Verify that the public URL is resolvable and successfully connects to your front-end servers.
· Verify that connectivity through your firewall and/or proxy server is functioning.
· Verify that load balancing is occurring.
Mail flow verification
· Verify mail flow between selected servers by sending periodic e-mails to test mailboxes on each server.
· Generate an Alert for successive failures.
· Record mail delivery latency.
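The "alert for successive failures" idea above can be sketched in a few lines of Python. This is not how MOM implements it; the function names and the default of three consecutive failures are assumptions chosen only for illustration.

```python
def should_alert(results, max_successive_failures=3):
    """Return True if the probe history contains a run of consecutive failures.

    `results` is a sequence of booleans, oldest first (True = test mail arrived).
    The default of 3 is an assumed threshold, not a MOM default.
    """
    streak = 0
    for delivered in results:
        streak = 0 if delivered else streak + 1
        if streak >= max_successive_failures:
            return True
    return False

def average_latency_seconds(latencies):
    """Average delivery latency over the successful probes, or None if there are none."""
    if not latencies:
        return None
    return sum(latencies) / len(latencies)

if __name__ == "__main__":
    history = [True, True, False, False, False]
    print(should_alert(history))
    print(average_latency_seconds([4.0, 6.0]))
```

Counting consecutive failures rather than total failures keeps a single delayed test message from paging anyone.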
Server Health Monitoring
Scripts and rules are configured by default to monitor key health indicators. These indicators include:
· Free Disk Space
· Mail Queue Thresholds
· Configuration and Security
· Performance Thresholds
· SMTP Queues
Free disk space
Running out of disk space is a common, preventable source of Exchange failures. This test monitors counter thresholds that you specify for the following performance objects:
· All disks
· Log disks
· SMTP queue disks
The Free disk space test is cluster and IFS aware, and uses WMI to collect information. It does not use performance data.
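If you want a quick ad hoc check outside of MOM, the same idea can be sketched in Python. Note the swap in technique: this uses the standard library's shutil.disk_usage rather than WMI, and the function names, drive paths, and the 2 GB threshold are assumptions for illustration only.

```python
import shutil

def collect_free_bytes(paths):
    """Return {path: free bytes} for each path, using shutil.disk_usage (not WMI)."""
    return {path: shutil.disk_usage(path).free for path in paths}

def low_space_alerts(free_bytes_by_disk, min_free_bytes):
    """Return the sorted list of disks whose free space is below the threshold."""
    return sorted(
        disk for disk, free in free_bytes_by_disk.items()
        if free < min_free_bytes
    )

if __name__ == "__main__":
    # "." is used so the sketch runs anywhere; on an Exchange server you would
    # list the database, log, and SMTP queue drives instead (hypothetical layout).
    free = collect_free_bytes(["."])
    print(low_space_alerts(free, min_free_bytes=2 * 1024 ** 3))
```

Keeping the collection step separate from the threshold check makes it easy to apply different limits to log disks and SMTP queue disks.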
Mail queue thresholds
· Verify that all mail queues (SMTP, MTA, internal mail delivery queues) are processing messages according to your thresholds
· Verify that mail is flowing properly
· Identify queue length problems that may lead to slow e-mail delivery and identify issues in your infrastructure that require attention
· This data is based on performance data and Exchange WMI classes.
Server Configuration and Security Monitoring
· Verify that the IIS Lockdown Tool started.
· Verify that Message Tracking Log shares are locked down.
· Verify that the URLScan ISAPI filter is installed and running.
· Verify that SMTP Virtual Server cannot anonymously relay (spam prevention).
· Check for the existence of mailboxes on Front-End Servers.
· Determine if SSL should be required.
· Verify that the Log Files are being successfully purged after backup.
· Verify that the SMTP directories are on an NTFS-formatted drive.
· Verify that circular logging is disabled for each Storage Group.
· Verify that the value of the HeapDeCommitFreeBlockThreshold registry key is correct.
· Verify that Message Tracking is enabled.
Performance thresholds
· Generate an alert if thresholds for disk response are exceeded, indicating a slow disk.
· Generate an alert if the RPC requests queue length exceeds expected thresholds. A consistent high value can indicate that you have a resource bottleneck.
· Monitor the average RPC latency of all RPC requests submitted to the server.
· Monitor the Outlook Mobile Access response latency.
Server performance issues quickly become user response time issues. You can quickly solve these problems if you monitor the correct objects and act upon the issues that MOM brings to your attention.
Database checkpoint depth and memory usage
An alert is generated by default if any of the following counters exceed the identified threshold:
· Disk Read Latencies: 50 msec
· Disk Write Latencies: 50 msec
· ESE Log Checkpoint Depth: 800
· Information Store Private Bytes: 1 GB
· Information Store Virtual Bytes: 2.9 GB
· MSExchangeIS: RPC Requests: 25
· MSExchangeIS: RPC latency: 200 ms
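For comparing your own Perfmon samples against the defaults listed above, a small Python sketch like this may help. The dictionary keys are hypothetical labels rather than real Perfmon counter paths, and treating 1 GB as 1024^3 bytes is an assumption; only the threshold values come from the list.

```python
GB = 1024 ** 3  # assumption: binary gigabytes

# Default alert thresholds from the list above. The keys are hypothetical
# labels for illustration, not real Perfmon counter paths.
THRESHOLDS = {
    "disk_read_latency_ms": 50,
    "disk_write_latency_ms": 50,
    "ese_log_checkpoint_depth": 800,
    "store_private_bytes": 1 * GB,
    "store_virtual_bytes": 2.9 * GB,
    "rpc_requests": 25,
    "rpc_latency_ms": 200,
}

def exceeded(sample, thresholds=THRESHOLDS):
    """Return the sorted names of counters whose sampled value is above its threshold."""
    return sorted(
        name for name, value in sample.items()
        if name in thresholds and value > thresholds[name]
    )

if __name__ == "__main__":
    # Invented sample values, for illustration only.
    sample = {"disk_read_latency_ms": 12, "rpc_requests": 40, "rpc_latency_ms": 200}
    print(exceeded(sample))
```

A value equal to the threshold is not flagged here; whether MOM treats "equal" as a breach is not stated in the list above, so adjust the comparison if needed.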