Troubleshooting Snapshot Agent hang
Microsoft SQL Server Escalation Services
I recently got a customer call where the SQL 2008 Snapshot Agent appeared to hang when started from SSMS.
As you can see, pop-up status window shows “Starting agent”. So, looks like its running, but it stays this way forever.
Replication Monitor also shows as “Running”.
If I double-click the I can zoom in on Agent details, however all I get is message the makes it sound like the Agent is busy, or maybe hung, but I’m not sure.
So, how can I tell what the Snapshot Agent is doing without all the high-layer status tools getting in the way.
Answer, since the Snapshot Agent is just Windows executable (…100\com\snapshot.exe), I can run it outside of SSMS but starting it from a Windows CMD-line, yes, we still use CMD prompt for troubleshooting. Here is what I did.
First, I located the JOB in SQL Server Agent which starts the Snapshot processes. The JOB name has the server name, the database name and the publication name. The “-3” matches to an Agent ID (SELECT [id],[name] FROM [distribution].[dbo].[MSsnapshot_agents]). Not really important, just an interesting side note.
I right-click the Agent and bring up the Properties. This also confirms I’m looking at the Snapshot Agent.
Selecting, “Steps”, then “Run Agent”, then “Edit” I can see the command line parameters passed to the Snapshot Agent. I’ll need these if I’m going to run the Snapshot Agent from an CMD prompt.
I Select ALL, then Copy. Next I paste into Notepad (any text editor works) along with the path to the Snaphot.exe program. Remember to include “ around the path do to the spaces in the path.
"C:\Program Files\Microsoft SQL Server\100\COM\snapshot.exe" -Publisher [CHRISSK1\SQL2K8] . . .
Now I’m cooking, I got the error shown below.
System.TypeInitializationException: The type initializer for 'Microsoft.SqlServer.Replication.StringResources' threw an exception. ---> System.ApplicationException: The native replication resource dll failed to load.
--- End of inner exception stack trace ---
at Microsoft.SqlServer.Replication.ReplMessage..ctor(Exception e)
at Microsoft.SqlServer.Replication.HistoryMessage..ctor(Exception e)
at Microsoft.SqlServer.Replication.SnapshotGenerationAgent.SnapshotGenerationAgentMessageFactory.CreateAgentMessage(Exception e)
at Microsoft.SqlServer.Replication.AgentCore.LogMessage(Exception e)
at Microsoft.SqlServer.Replication.AgentCore.ThreadFailedHandler(Exception e)
The type initializer for 'Microsoft.SqlServer.Replication.StringResources' threw an exception.
Up until now I was using common troubleshooting technique for any of the Replication Agents, that is I execute then directly, outside of SSMS or Replication Monitor. This way I can see errors before they get filtered into nice clean messages.
But now what?
The key message is “The native replication resource dll failed to load.”. This indicates a replication specific DLL used by the Snapshot is missing. But which DLL? Actually it is a RLL, not a DLL.
C:\Program Files\Microsoft SQL Server\100\COM\Resources\1033\REPLRES.rll
Turns out the customer was trying to save space and moved a few files around. After all, why do I need something with RLL extension anyway, well we do. This is a common component for the Replication Agents and the path set during installation and noted in the Registry. Once we restored the REPLRES.rll to it rightful place the Snapshot Agent completed just fine.
Congratulation if you read this far. I hope what you’ve learned is that the management tools sometime filter out what is really occurring when a Replication Agent is running. If your not sure what’s happening, try running the executable from a CMD prompt.