How to tell if your code should be yielding

As we saw in the last post, cooperative clr code that explicitly yields during long-running computations outperforms clr code that gets forcibly rescheduled.  How should one discover sql/clr code that is considered 'runaway' by the server and could be improved?

There are a number of informative dmvs that contain helpful information in cases like this, such as  sys.dm_clr_tasks, sys.dm_os_waiting_tasks, and sys.dm_os_workers.

For this particular problem, I want to know what state my clr tasks are in and what they have been waiting on recently as well as the number of times that the SQL Server Scheduler has forced them to yield:

select os.task_address, os.state, os.last_wait_type, clr.state, clr.forced_yield_count from sys.dm_os_workers os join sys.dm_clr_tasks clr on (os.task_address = clr.sos_task_address) where clr.type = 'E_TYPE_USER'

When my greedy count query from the last post is running, I notice a number of rows like the following in the output:

0x006D9180

RUNNABLE

SQLCLR_QUANTUM_PUNISHMENT

E_TASK_ATTACHED_TO_CLR

2

0x006D89C0

RUNNABLE

SQLCLR_QUANTUM_PUNISHMENT

E_TASK_ATTACHED_TO_CLR

4

0x008CCCA8

RUNNABLE

SQLCLR_QUANTUM_PUNISHMENT

E_TASK_ATTACHED_TO_CLR

1

0x006D8E98

RUNNABLE

SQLCLR_QUANTUM_PUNISHMENT

E_TASK_ATTACHED_TO_CLR

3

SQLCLR_QUANTUM_PUNISHMENT indicates that the task previously exceeded its allowed quantum, causing the scheduler to intervene and reschedule it at the end of the queue while forced_yield_count shows the number of times that this has happened.

In contrast, when the cooperative version is running, I see rows like the following where the last_wait_type is SOS_SCHEDULER_YIELD and forced_yield_count is 0.

0x008CCAB8

RUNNING

SOS_SCHEDULER_YIELD

E_TASK_ATTACHED_TO_CLR

0

0x006D9278

RUNNABLE

SOS_SCHEDULER_YIELD

E_TASK_ATTACHED_TO_CLR

0

0x006D9088

RUNNABLE

SOS_SCHEDULER_YIELD

E_TASK_ATTACHED_TO_CLR

0

This post is late as I was trying to include more information about all the underlying activity, but it quickly proved too much for a blog post on this simple example.  Hopefully, I and others on the team will be able to be able to go into more detail in future posts.