Asynchronous operations, pinning


One
thing we tried to do with the CLR and FX is provide a consistent asynchronous
programming model.




To briefly recap the model, an API called XXX may also offer an async
alternative composed of BeginXXX and EndXXX methods.  Even if the class that
implements XXX doesn’t also offer BeginXXX and EndXXX, you can define a Delegate
class whose signature is consistent with the signature of XXX.  On that
Delegate, you will find BeginInvoke and EndInvoke methods defined (the
delegate’s equivalents of BeginXXX and EndXXX), which can be used to call the
XXX method asynchronously.


The
BeginXXX method takes the inbound arguments, an optional state object and an
optional callback delegate.  It
returns an implementation of IAsyncResult that can be used to rendezvous with
the completion.




The
managed asynchronous programming model provides a choice of four different ways
to rendezvous with the completion:





  1. The asynchronous provider calls the delegate callback
    specified in the BeginXXX call, passing the IAsyncResult, which
    carries the optional state object in its AsyncState property.

  2. The initiator polls for completion, using the
    IAsyncResult.IsCompleted property.

  3. The initiator waits for an event to be signaled, via
    IAsyncResult.AsyncWaitHandle.

  4. The initiator blocks until the asynchronous operation
    completes, by calling the EndXXX API.


Of these
four techniques, the first is by far the most popular and arguably the easiest
for developers to code to.




The
second could be used in a highly scalable server, which can afford a dedicated
thread to routinely poll all outstanding asynchronous operations and process any
that have completed.




The third technique can be used to process each operation as it completes
(WaitHandle.WaitAny) or to process all operations after the last one completes
(WaitHandle.WaitAll).  Because WaitHandles are expensive resources, a
sophisticated implementation of IAsyncResult may delay materializing the handle
until a client requests it.  In most cases, the client will select a different
rendezvous method and the WaitHandle is never needed.


The fourth technique is the hardest to understand.  Why initiate an operation
asynchronously if you intend to rendezvous with it synchronously?  But this can
make sense if the application is interleaving a finite amount of synchronous
processing with the asynchronous operation, to reduce latency.  Once the
synchronous processing is complete, it may make sense to block.
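Here is a minimal sketch of the second, third and fourth rendezvous techniques, using Stream.BeginRead / EndRead as the BeginXXX / EndXXX pair.  The class and method names (OtherRendezvous, ReadByPolling, ReadByWaiting, ReadWhileWorking) are made up for illustration, and the one-millisecond sleep in the polling loop is only a stand-in for whatever other work a real server would do between polls.

```csharp
using System;
using System.IO;
using System.Threading;

public static class OtherRendezvous
{
    // Technique 2: poll IsCompleted, then collect the result with EndRead.
    public static int ReadByPolling(Stream stream, byte[] buffer)
    {
        IAsyncResult ar = stream.BeginRead(buffer, 0, buffer.Length, null, null);
        while (!ar.IsCompleted)
        {
            Thread.Sleep(1); // placeholder: a server would service other operations here
        }
        return stream.EndRead(ar);
    }

    // Technique 3: wait for the operation's event to be signaled.
    public static int ReadByWaiting(Stream stream, byte[] buffer)
    {
        IAsyncResult ar = stream.BeginRead(buffer, 0, buffer.Length, null, null);
        ar.AsyncWaitHandle.WaitOne();
        return stream.EndRead(ar);
    }

    // Technique 4: overlap a finite amount of synchronous work with the read,
    // then block in EndRead until the read has completed.
    public static int ReadWhileWorking(Stream stream, byte[] buffer, Action otherWork)
    {
        IAsyncResult ar = stream.BeginRead(buffer, 0, buffer.Length, null, null);
        otherWork();
        return stream.EndRead(ar);
    }
}
```

In all three methods EndRead is still called exactly once per BeginRead, regardless of how the completion was detected.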


Regardless of which of these techniques is used to achieve the rendezvous, the
final step of the completion is to call the EndXXX API to retrieve the return
value, any outbound arguments, or possibly an exception.  If the rendezvous is
of the first form, the EndXXX method is probably called directly out of the
callback.
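As a rough sketch of the first form, the callback below calls EndRead directly and captures either the byte count or the exception.  CallbackRendezvous and ReadWithCallback are hypothetical names, and the ManualResetEvent exists only so that this small demo can hand a value back to its caller; a real callback-driven design would normally continue its processing inside the callback instead of blocking.

```csharp
using System;
using System.IO;
using System.Threading;

public static class CallbackRendezvous
{
    public static int ReadWithCallback(Stream stream, byte[] buffer)
    {
        int bytesRead = -1;
        Exception failure = null;

        using (var done = new ManualResetEvent(false))
        {
            stream.BeginRead(buffer, 0, buffer.Length, ar =>
            {
                // The state object passed to BeginRead comes back via AsyncState.
                var s = (Stream)ar.AsyncState;
                try
                {
                    bytesRead = s.EndRead(ar); // balances the BeginRead call
                }
                catch (Exception e)
                {
                    failure = e;               // EndRead is where a failure surfaces
                }
                finally
                {
                    done.Set();
                }
            }, stream);

            done.WaitOne(); // demo only: lets the method return the result
        }

        if (failure != null) throw failure;
        return bytesRead;
    }
}
```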


Once the
EndXXX API returns, the operation is fully complete and the IAsyncResult serves
no further purpose.  Since there may
be significant resources associated with the operation, the IAsyncResult
implementation might treat EndXXX as the equivalent of
IDisposable.Dispose().  For
instance, any materialized WaitHandle can be disposed at this time.




One of the most common questions related to the managed asynchronous
programming model is whether it’s strictly necessary to call EndXXX.  If the
operation doesn’t have any return values or outbound arguments, then it’s
certainly convenient to “Fire and Forget.”  However, there are a few problems
with this:


  1) If the operation fails, a call to EndXXX will throw the exception that
    signals this failure.  If the application never calls EndXXX, it has no
    way of knowing whether the asynchronous operation actually happened.


  2) As we’ve seen, EndXXX is an opportunity for resources to be eagerly
    disposed.  If you don’t call EndXXX, those resources must be retained
    until the GC collects the IAsyncResult object and finalizes it.  On the
    server, this can be a significant performance issue.


  3) The last time I checked, some of the FX async APIs would misbehave if
    EndXXX is not called.  For example, finalization of a stream and
    finalization of any pending IAsyncResult objects are not well ordered.
    Because of the subtlety involved in efficiently fixing these cases,
    there’s some debate over whether these are framework bugs or application
    bugs.


  4) Skipping the EndXXX calls is sloppy.  This is certainly a matter of
    taste, but I consider it a strong argument.


Because
of the above reasons, you should always balance a successful BeginXXX call with
its EndXXX counterpart.




Another common question has to do with the best way to perform a synchronous
operation asynchronously.  If an API offers BeginXXX / EndXXX methods, you
should use them.  This is definitely going to be the technique with the best
performance.  But if you only have an XXX API, you still have several obvious
choices:


  1) Create a new Thread which calls XXX and then dies.


  2) ThreadPool.QueueUserWorkItem() allows a client to call XXX on a
    ThreadPool thread.  The rendezvous model is similar to the delegate
    callback mechanism we already discussed.


  3) Create a Delegate over XXX and then call the BeginInvoke / EndInvoke
    methods (the delegate’s BeginXXX / EndXXX pair) on that delegate.


The first choice is almost never the correct one.  You should only create a
dedicated thread if you have a long-running use for one, or if your thread must
be different from all the “anonymous” threads in the threadpool.  (For example,
threadpool threads are all CoInitialized for the MTA.  If you need an STA
thread, you need to create your own thread).


The second choice will actually perform better than using a Delegate’s
BeginXXX / EndXXX methods.  If you are queueing work in your own AppDomain, this
is the way to go.  I know that with work we can narrow the performance gap
between QueueUserWorkItem and asynchronous Delegates, but I don’t think we can
ever achieve parity.
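A small sketch of the second choice follows.  QueueWorkSketch and RunAsync are hypothetical names, and the completion callback that receives either a result or an exception is just one possible way to do the rendezvous; nothing in the framework dictates that shape.  I show only QueueUserWorkItem here because asynchronous delegates depend on the runtime: on .NET Core and later, a delegate’s BeginInvoke throws PlatformNotSupportedException.

```csharp
using System;
using System.Threading;

public static class QueueWorkSketch
{
    // Runs the synchronous function 'xxx' on a ThreadPool thread and reports
    // its result, or the exception it threw, through 'onCompleted'.
    public static void RunAsync<T>(Func<T> xxx, Action<T, Exception> onCompleted)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            T result = default(T);
            Exception error = null;
            try
            {
                result = xxx();
            }
            catch (Exception e)
            {
                // An unhandled exception on a pool thread would otherwise be
                // lost or fatal, so hand it to the caller explicitly.
                error = e;
            }
            onCompleted(result, error);
        });
    }
}
```

A caller would write something like `QueueWorkSketch.RunAsync(() => Compute(), (value, error) => { ... })`, where Compute stands for whatever synchronous XXX you need to run.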


If your application is making asynchronous calls on remote objects, then
asynchronous Delegates have an important optimization.  They don’t actually
switch to a different thread in this case.  Instead, they synchronously
initiate a remote call from the calling thread and then return.  Asynchronous
Delegates have the additional benefit of sharing a consistent model with
explicit BeginXXX / EndXXX APIs in FX, so you may prefer them to
QueueUserWorkItem for this reason.


Finally, a word on pinning.  I often see applications that aggressively pin
managed objects or managed delegates that have been passed to unmanaged code.
In many cases, the explicit pin is unnecessary.  It arises because the
developer has confused the requirement of tracking an object instance via a
handle with the requirement of keeping the bytes of that object at a fixed
location in memory.


For normal PInvokes, a blittable type exposes the bytes of an object in the GC
heap directly to unmanaged code.  This obviously means that the bytes mustn’t
be moved by a GC relocation until the unmanaged code has stopped accessing
them.  In most cases, the PInvoke layer can automatically pin the bytes for the
lifetime of the call.  And this layer can pin those bytes in a more efficient
manner than you could with a pinned GCHandle.  (The PInvoke layer is hooked
into the CLR’s stack crawling mechanism for GC reporting.  So it can defer all
overhead related to pinning unless a GC actually occurs while the PInvoke call
is in progress).  Applications that explicitly pin buffers around PInvoke calls
are often doing so unnecessarily.


Along the same lines, managed Delegates can be marshaled to unmanaged code,
where they are exposed as unmanaged function pointers.  Calls on those pointers
will perform an unmanaged to managed transition; a change in calling
convention; entry into the correct AppDomain; and any necessary argument
marshaling.  Clearly the unmanaged function pointer must refer to a fixed
address.  It would be a disaster if the GC were relocating that!  This leads
many applications to create a pinning handle for the delegate.  This is
completely unnecessary.  The unmanaged function pointer actually refers to a
native code stub that we dynamically generate to perform the transition &
marshaling.  This stub exists in fixed memory outside of the GC heap.


However, the application is responsible for somehow extending the lifetime of
the delegate until no more calls will occur from unmanaged code.  The lifetime
of the native code stub is directly related to the lifetime of the delegate.
Once the delegate is collected, subsequent calls via the unmanaged function
pointer will crash or otherwise corrupt the process.  In our recent release, we
added a Customer Debug Probe which allows you to cleanly detect this
all-too-common bug in your code.  If you haven’t started using Customer Debug
Probes during development, please take a look!
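One way to extend the delegate’s lifetime without pinning is simply to keep a reference to it somewhere reachable.  The sketch below roots the delegate in a static field; CallbackLifetime, Callback, Register and Unregister are hypothetical names, and it assumes a runtime that offers Marshal.GetFunctionPointerForDelegate (the same lifetime rule applies when the PInvoke layer marshals the delegate for you).

```csharp
using System;
using System.Runtime.InteropServices;

public static class CallbackLifetime
{
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    public delegate int Callback(int value);

    // An ordinary reference is enough: it keeps the delegate, and therefore
    // its native stub, alive.  No pinning handle is involved.
    private static Callback s_keepAlive;

    // Returns the function pointer to hand to unmanaged code.
    public static IntPtr Register(Callback callback)
    {
        s_keepAlive = callback;
        return Marshal.GetFunctionPointerForDelegate(callback);
    }

    // Call only once unmanaged code can no longer invoke the pointer.
    public static void Unregister()
    {
        s_keepAlive = null;
    }
}
```

A single static field only supports one registration at a time; an application with many callbacks would need a collection, or a normal (non-pinning) GCHandle per delegate.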


So there are lots of places where applications often pin unnecessarily.  The
reason I bring this up is that asynchronous operations through unmanaged code
are an important and legitimate scenario for pinning.  If you are passing a
buffer or OverlappedStruct out to an asynchronous unmanaged API via a PInvoke,
you had better be pinning that object.  We have a Customer Debug Probe that
attempts to validate your pinning through some stressful GC and Finalization
calls around the PInvoke call.  But this sort of race condition is necessarily
a hard bug to provoke cleanly, and the performance impact of this probe is
significant.
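For the legitimate case, a pinned GCHandle that outlives the PInvoke call is the usual tool.  The following is a sketch only: PinnedBuffer is a hypothetical helper, and the asynchronous unmanaged API that would receive the Address is not shown.  The important part is that the handle is freed when the unmanaged operation has completed, not when the initiating call returns.

```csharp
using System;
using System.Runtime.InteropServices;

public sealed class PinnedBuffer : IDisposable
{
    private GCHandle _handle;

    public byte[] Bytes { get; }

    public PinnedBuffer(int size)
    {
        Bytes = new byte[size];
        // Pin the array so the GC cannot relocate it while unmanaged code
        // holds its address.
        _handle = GCHandle.Alloc(Bytes, GCHandleType.Pinned);
    }

    // Fixed address of the first byte; valid until Dispose is called.
    public IntPtr Address
    {
        get { return _handle.AddrOfPinnedObject(); }
    }

    // Call from the completion path of the asynchronous operation.
    public void Dispose()
    {
        if (_handle.IsAllocated)
        {
            _handle.Free();
        }
    }
}
```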


Whenever you pin an object like a buffer, you should consider whether the
buffer is naturally long-lived.  If it is not, consider whether you could build
a buffer recycling cache so that the buffers become long-lived.  This is worth
doing because the cost of a pin in the oldest generation of the GC heap is far
less than the cost of a pin in the youngest generation.  Objects that have
survived into the oldest generation are rarely considered for collection and
they are very rarely compacted.  Therefore pinning an old object is often a NOP
in terms of its performance impact.
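A buffer recycling cache can be very small.  The sketch below is one possible shape, with hypothetical names (BufferCache, Rent, Return); it assumes fixed-size buffers and a runtime that has ConcurrentBag.  Buffers that keep cycling through the cache survive collections and eventually end up in the oldest generation, which is where pinning is cheap.

```csharp
using System.Collections.Concurrent;

public sealed class BufferCache
{
    private readonly ConcurrentBag<byte[]> _free = new ConcurrentBag<byte[]>();
    private readonly int _size;

    public BufferCache(int size)
    {
        _size = size;
    }

    // Reuse a cached buffer when one is available; otherwise allocate.
    public byte[] Rent()
    {
        byte[] buffer;
        return _free.TryTake(out buffer) ? buffer : new byte[_size];
    }

    // Return a buffer once no operation, managed or unmanaged, still uses it.
    public void Return(byte[] buffer)
    {
        if (buffer != null && buffer.Length == _size)
        {
            _free.Add(buffer);
        }
    }
}
```

This version never trims the cache, so a burst of activity leaves that many buffers allocated; whether that matters depends on the application.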


Of course, if you are calling explicit BeginXXX / EndXXX APIs in FX (like
Stream.BeginRead / EndRead), then the pinning isn’t your concern.  The Stream
implementation is responsible for ensuring that buffers are fixed if it defers
to unmanaged operations that expect fixed memory locations.


Along the same lines, if you call explicit BeginXXX / EndXXX APIs, AppDomain
unloads need not concern you.  But if you call asynchronous unmanaged services
directly via PInvoke, you had better be sure that an AppDomain.Unload doesn’t
happen while you have a request in flight.  If it does, the pinning handles
will be reclaimed as part of the unload.  This might mean that the asynchronous
operation scribbles into the GC heap where a buffer or OverlappedStruct used to
be.  The resulting heap corruption puts the entire process at risk.


There’s no good story for this in the current product.  Somehow you must delay
the unload until all your asynchronous operations have drained.  One way to do
this might be to block in the AppDomain.DomainUnload event until the count of
outstanding operations returns to 0.  We’ll be making it easier for you to
remain bullet-proof in this sort of scenario in future versions.
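To make that suggestion concrete, here is a rough sketch of counting outstanding operations and blocking the unload until the count drains.  OutstandingOperations and its members are hypothetical names.  Whether blocking in the unload event is safe in a given host is an assumption you would need to verify, since the host may limit how long the handler is allowed to run.

```csharp
using System;
using System.Threading;

public static class OutstandingOperations
{
    private static readonly object s_lock = new object();
    private static readonly ManualResetEvent s_drained = new ManualResetEvent(true);
    private static int s_count;

    // Call immediately before starting an asynchronous unmanaged operation.
    public static void Enter()
    {
        lock (s_lock)
        {
            if (++s_count == 1) s_drained.Reset();
        }
    }

    // Call from the completion path, after the pinned memory is released.
    public static void Exit()
    {
        lock (s_lock)
        {
            if (--s_count == 0) s_drained.Set();
        }
    }

    // True if no operations were outstanding within the timeout.
    public static bool WaitForDrain(TimeSpan timeout)
    {
        return s_drained.WaitOne(timeout);
    }

    // Delays the unload of the current AppDomain until the count is zero.
    public static void HookUnload()
    {
        AppDomain.CurrentDomain.DomainUnload += (sender, e) => s_drained.WaitOne();
    }
}
```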


So if
you can find specific FX asynchronous APIs to call, all this nastiness is
handled for you.  If instead you
define your own managed asynchronous APIs over some existing unmanaged
implementation, you need to be very careful.

Comments (18)

  1. So is System.Windows.Forms.Control.BeginInvoke part of this model? This may seem like a stupid question, but actually I suspect the answer is "no".

    There are two forms of Control.BeginInvoke, and neither of them lets you pass the optional callback delegate. (You may ask "what about the first parameter – isn’t that a delegate?" And the answer is "Yes, but that parameter is also there in the synchronous Invoke method – the async BeginXxx method is supposed to add an extra delegate according to this pattern.")

    This is aggravating as it makes it really awkward to call EndInvoke. You can’t do it in the method called by the delegate passed in, because you’ll then be saying "Please block until I return from the method I’m calling you from" – or to put it more succinctly "Please deadlock". So how are you supposed to call EndInvoke? Using a call to Delegate.BeginInvoke? Seems rather tedious…

    So, given that Control.BeginInvoke does not appear to fit into the normal .NET pattern for async invocation, does this mean we’re off the hook with respect to calling EndInvoke?

  2. Chris Brumme says:

    Last night I sent an email to the WinForms folks asking this same question. Like you, I suspect the EndInvoke is optional on Control. (I looked through the code, but that’s no substitute for a statement from the authors). As you say, this API doesn’t quite match the managed async programming model anyway.

    I’ll reply back as soon as I get official word.

  3. Dmitriy Zaslavskiy says:

    Chris, could you provide some examples of unnecessary pinning?

  4. Chris Brumme says:

    I’ve seen a lot of people pinning Delegate instances, as I described above. I’m not aware of a circumstance where this makes sense.

    I’ve also seen apps explicitly pinning managed arrays of primitives like char arrays and int arrays, before making PInvoke calls. The same thing with String and StringBuilder objects. If the unmanaged callee only needs access to the buffer for the life of the call, then the PInvoke marshaling layer will generally pin it for that duration. So this can be a second example of unnecessary pinning.

  5. Manoj says:

    One of the issues with asynchronous execution is how impersonation works.

    Client request comes in, gets queued for async execution, worker thread picks up the request, now worker thread needs to impersonate the client…

  6. Dmitriy Zaslavskiy says:

    Chris, thanks for your answer. But those examples were shown in tons of samples. What about class(es) containing arrays, will those get pinned as well?

  7. Chris Brumme says:

    Dmitriy,

    Generally, PInvoke will either copy your data to fixed memory outside the GC heap, or it will pin memory in the GC heap and expose those bytes directly to unmanaged code. In either case, you don’t need to explicitly pin — so long as access to these bytes is scoped to within the duration of the PInvoke call.

  8. Chris Brumme says:

    Manoj,

    I will be posting a new blog later this evening which discusses impersonation (a little) and transference of CAS information across async points (a lot).

  9. Dmitriy Zaslavskiy says:

    Thanks Chris.
    I think I got it now.

  10. Chris Brumme says:

    I just got the official word from the WinForms team. It is not necessary to call Control.EndInvoke. You can call BeginInvoke in a "fire and forget" manner with impunity.

  11. Sharpie says:

    Regarding the advice to call EndInvoke on all delegates or face potential resource leakage, I believe this was poorly conveyed in the docs until 1.1, and even then only barely mentioned.

    Be that as it may, can you not get around this limitation for Fire and Forget by using the [OneWay] attribute on your delegate? I believe that attribute optimizes away the baggage of a fire and forget async delegate by preventing the runtime from having to carry around the IAsyncResult, the potential exception instance and the state object. So in short it kind of eliminates not only the concern of leakage but most of what it would leak anyway, and one would expect that if you had numerous such delegates, you’d be putting less strain on the resources. Presumably these are for operations which don’t throw exceptions or for which you don’t care about the end result.

    Incidentally, if that doesn’t solve it and you still want to program with a fire and forget model, Mike Woodard of Developmentor implemented a smart class that takes care of calling EndInvoke for you. It is not the most performant thing but in small doses it could solve your fire and forget blues.

    But something tells me that [OneWay] should take care of that for you, no?
