Poison Message Handling

Service broker provides automatic poison message detection, which disables the service queue upon five consecutive transaction rollbacks in trying to receive messages from the queue. However, an application should not rely on this feature for normal processing. Because automatic poison message detection stops the queue, this halts all processing for the application until the poison message is removed. Instead, an application should attempt to detect and remove poison messages as part of the application logic.  


Typically, the message processing application can choose among the following four strategies:

A.      Returns the poison message back to the service queue by rolling back the transaction in hope that the message can be successfully consumed next time when retried.

B.      Preserves the message, the context and the encountered error to some permanent storage, such as a database table, commits the RECEIVE, and continues normally.

C.      Ends the conversation with an error.

D.      Uses SQL server event notification to ask for manual intervention when queue is disabled.


Option A above is good at handling failures caused by temporary conditions. For instance, a purchase order placed by an user can't be processed if the user profile has not existed yet because, say, user profile request processing that's happening in parallel takes more time than the purchase order; once that processing is complete, the purchase order processing is expected to be through without a problem. Another example could be that the processing transaction is chosen as the victim to rollback in the case of a deadlock; after such a deadlock is detected and removed, the message can then be successfully consumed.


However, there are situations where processing the same message will forever fail. One example may be that the message itself is not self-contained, say a purchase order can't be fulfilled because that credit card holder name and home address given don't match what's on file at the bank. A second example could be that a user profile can't be established because the account name chosen has already taken by an existing user. In these cases, approach B then sounds more suitable for the application to log the message content as well as the context in which it was happening so the situations can be further examined to see if the processing logic can be improved to handle these exception cases normally. Indeed, the two examples mentioned here have actually surfaced deficiencies in the application code because both scenarios can well be successfully processed if the message processing code is revised. For a more vivid example how inappropriate application error handling can lead to poison messages, check out Remus' blog here.


Care must be taken when continuing to the next message without the current one been processed. In a stateful session, skipping messages may cause the conversation into an illegal state and hence invalidates the rest of the dialog. If this is the case, the application then can't afford ignoring one message; thus ending the dialog with error as suggested by option C becomes the right thing to do. As an example, say a live meeting service that is unicasting an ongoing presentation, it had already received slide #2 and was waiting for the content of slide #3. But it then found it couldn't process the slide #3 message because of invalid media format. If it simply ignores them, the slide #3 worth of presentation simply becomes dumb to the receiver. So terminating the session with an error probably is the most appropriate.


Depending on what it is doing, an application can well use a combination of what is illustrated the above. For example, it can allow a failed receive to retry three seconds later. Further, if the retry is still unsuccessful, it then can choose to log the message, commit the RECEIVE transaction and continue to the next one.


As the last resort, application programmers may want not to do anything in their code but fully depend on the default mechanism built in service broker for poison message detection and handling. As formerly stated, the mechanism alone is not good enough. But combining it with SQL server event notification at least can request for administrator's manual intervention when a poison message is detected.


Once a BROKER_QUEUE_DISABLED event notification is defined, a notification message of the following format will be sent by service broker to the specified event notification queue when the service queue is disabled due to poison messages:





















To identify which service queue this notification is fired for, first use the database id to find out the database name:


SELECT name AS database_name

FROM sys.databases

WHERE database_id = 6



(1 row(s) affected)



Then switch to that database, and get the service queue name using the object id:





SELECT name AS service_queue_name

FROM sys.service_queues

WHERE object_id = 53575229



(1 row(s) affected)



The SQL script below shows in detail how to create an event notification service to receive notification messages when a user queue being watched gets disabled. A stored procedure is defined to process the posted notification by emailing the administrator about which queue is disabled and ask him/her to jump in to identify and eliminate the problem so normal message processing on the queue can be resumed.


-- create the event notification queue to receive queue_disabled events

CREATE QUEUE [QueueDisabledNotifQueue];



-- create the event notification service for receiving queue_disabled events

CREATE SERVICE [QueueDisabledNotifService]

      ON QUEUE [QueueDisabledNotifQueue]




-- create queue-disabled event notification to watch on 'Service1Queue'


      ON [Service1Queue]


      TO SERVICE [QueueDisabledNotifService], 'current database'



-- define the stored proc to process queue_disabled notifications

CREATE PROCEDURE [QueueDisabledNotifHandler] AS BEGIN

      DECLARE @messageType SYSNAME

      DECLARE @conversationHandle UNIQUEIDENTIFIER

      DECLARE @messageBody XML




            WAITFOR (

                  RECEIVE TOP(1)




                        FROM [QueueDisabledNotifQueue]), TIMEOUT 5000


            IF( @@ROWCOUNT = 0 ) BEGIN

                  COMMIT TRANSACTION -- rollback is inappropriate here as it'll be counted by queue disabling logic for poison message detection


            END -- if( @@rowcount ... )


            IF( @messageType = 'http://schemas.microsoft.com/SQL/Notifications/EventNotification' ) BEGIN

                  DECLARE @cmd NVARCHAR(MAX)

                  SET @cmd = 'dbo.sp_send_dbmail



                                    @body="CAST(@messageBody as NVARCHAR(MAX)",

                                    @subject="Queue Disabled Detected";'

                  EXEC (@cmd)

            END -- if( @messageType ... )


      END -- while(1=1)




-- kick off queue disabled notification processing

ALTER QUEUE [QueueDiabledNotifQueue]


            STATUS = ON,

            PROCEDURE_NAME = QueueDisabledNotifHandler,

            MAX_QUEUE_READERS = 1,

            EXECUTE AS SELF )



Comments (7)
  1. CraigB says:

    It seems like option B is pretty close to impossible when your error leaves you with XACT_STATE() = -1.  What is the recommendation for solving this?  In order to avoid the default poison message handing, is it recommended to simply rollback and immediately end the conversation with error?

  2. Tom says:

    When the transaction becomes doomed the queue is probably going to become disabled due to poisoning, However, this is almost always due to something that the activated proc is doing, such as converting to XML in the RECEIVE statement. So the first thing to do is to fix that.

  3. CraigB:

    I created a ssb_poisoned_conversations table and use it to prevent processing a poisoned message twice. This technique must works while a poisoned message processing take more time than single insert.

    create table [dbo].[ssb_poisoned_conversations](

    [incident] [int] IDENTITY(1,1) NOT NULL,

    [incident_date] [datetime] NOT NULL,

    [conversation_handle] [uniqueidentifier] NOT NULL,

    [error_number] [int] NOT NULL,

    [error_message] [nvarchar](4000) NOT NULL,

    CONSTRAINT [PK_ssb_poisoned_conversations_incident] PRIMARY KEY CLUSTERED


    [incident] ASC



    alter table [dbo].[ssb_poisoned_conversations] ADD  CONSTRAINT [DF_ssb_poisoned_conversations_incident_date]  DEFAULT (getdate()) FOR [incident_date]


    CREATE NONCLUSTERED INDEX [IX_ssb_poisoned_conversations] ON [dbo].[ssb_poisoned_conversations]


    [conversation_handle] ASC



    create procedure [dbo].[TestQueueProcessor]

    with execute as owner as

    set nocount, ansi_nulls, ansi_padding, ansi_warnings, concat_null_yields_null, quoted_identifier, arithabort on;

    set numeric_roundabort, implicit_transactions, xact_abort off;

    declare @message_type_name sysname, @conversation_handle uniqueidentifier, @message_body xml, @error_number int, @error_message nvarchar(4000);

    while exists(select * from [sys].[service_queues] where [name] = N'TestQueue' and [is_receive_enabled] = 1)

    begin try

    begin transaction;

    waitfor(receive top (1) @message_body = [message_body],

    @conversation_handle = [conversation_handle],

    @message_type_name = [message_type_name]

    from [dbo].[TestQueue]), timeout 1000;

    if @@rowcount = 0 begin

    rollback transaction; break;


    — check poisoned conversation

    select top 1 @error_number = [error_number], @error_message = [error_message]

    from dbo.ssb_poisoned_conversations with (updlock)

    where [conversation_handle] = @conversation_handle;

    if @@rowcount > 0 begin

    — if poisoned message detected then remove message from queue – do not process

    commit transaction; continue;


    if @message_type_name = 'http://grominc/test/testmessage&#39; begin

    execute dbo.poison_generator;

    commit transaction; continue;


    execute [dbo].[ssb_DefaultProcessor] @message_type_name, @conversation_handle, @message_body;

    commit transaction; continue;

    end try

    begin catch

    select @error_number = ERROR_NUMBER(), @error_message = ERROR_MESSAGE();

    if (XACT_STATE() = -1)


    rollback transaction;

    insert dbo.ssb_poisoned_conversations([conversation_handle], [error_number], [error_message])

    values (@conversation_handle, @error_number, @error_message);


    if (XACT_STATE() = 1) begin

    end conversation @conversation_handle with error = @error_number

                                                                                                       description = @error_message;

    commit transaction;


    end catch;


  4. In addition to my previous comment:

    This must work always on Microsoft SQL Server 2008 R2 and above when queue created with option POISON_MESSAGE_HANDLING (STATUS = OFF)

  5. Han Folstofe says:

    This example should have never worked, as the activation happens on an impersonated security context you can't access any resounces (like the one in msdb) outside your database.

    At least not without jumping through security hoops 🙂

  6. R.P.Rozema says:

    Option B is certainly feasible, even for actions that may cause a doomed transaction (exact_state() = -1). You need to follow a somewhat different strategy, using a timer message and a table to store the message for as long as it has not been processed yet. This method was presented by Remus Rusanu on his blog. Steps to follow are:

    – start transaction

    – waitfor( receive)

    If a message comes in that you need to process:

    – insert the message received into a table using the conversation handle as the key value

    – start a dialog timer for a few seconds into the future

    – commit the transaction

    – start a new transaction

    – delete the row you just inserted

    – attempt to process the message

    – reply (or do whatever is needed for your message protocol)

    – on success commit the transaction

    – on failure rollback the transaction.

    – continue to the next message

    If a DialogTimer message comes in:

    – use the conversation handle to retrieve the row from the table, place an exclusive lock on the row using with (updlock, holdlock) or use an update statement to retrieve the values into local variables which places an update lock automatically,

    – if no row was found (@@rowcount = 0), commit and continue to next message

    – if row was found delete the row from the table

    – process the message

    – reply (or do whatever is needed for your message protocol)

    – on success commit the transaction

    – on failure rollback the transaction.

    – continue to next message

    Other messages (f.e. EndDialog and Error) are processed as you would do normally.

    This way the poison messages will be removed from the queue, so the process can continue processing all correct messages. And all poison messages will be kept in the table for you to review them. The conversations for these poison messages will even still be valid, so you can decide to modify the messages and have them retried to your liking.

    You can also have the messages retried automatically multiple times by re-setting the timer upon receipt of each DialogTimer message too. The table can be used to retrieve the number of retries attempted already (or the number of retries left). You will have to split the DialogTimer code into 2 transactions, similar to what I've shown above: first update the retry counter on the row, start the timer and commit, then start a new transaction, delete, process and commit.

Comments are closed.

Skip to main content