The PDC and Application Compatibility, but still no Hosting


The PDC has happened, which means two things.  I can post some of my (slightly self-censored) reactions to the show, and I can talk about what we've disclosed about Whidbey and Longhorn more freely.  In this particular case, I had promised to talk about the deep changes we're making in Whidbey to allow you to host the CLR in your process.  As you'll see, I got sidetracked and ended up discussing Application Compatibility instead.


 

But first, my impressions of the PDC:


 

The first keynote, with Bill, Jim & Longhorn, was guaranteed to be good.  It had all the coolness of Avalon, WinFS and Indigo, so of course it was impressive.  In fact, throughout all the sessions I attended, I was surprised by the apparent polish and maturity of Longhorn.  In my opinion, Avalon looked the most mature and settled.  Indigo also looked surprisingly real.  WinFS looked good in the keynote, where it was all about the justification for the technology.  But in the drill-down sessions, I had the sense that it's not as far along as the others.



 

Hopefully all the attendees realize that Longhorn is still a long way off.  It's hard to see from the demos, but a lot of fundamental design issues and huge missing pieces remain.



 

Incidentally, I still can't believe that we picked WinFX to describe the extended managed frameworks and WinFS to describe the new storage system.  One of those names has got to go.


 

I was worried that the Whidbey keynote on Tuesday would appear mundane and old-fashioned by comparison.  But to an audience of developers, Eric's keynote looked very good indeed.  Visual Studio looked better than I've ever seen it.  The device app was so easy to write that I feel I could build a FedEx-style package tracking application in a weekend.  The high point of this keynote was ASP.NET.  I hadn't been paying attention to what they've done recently, so I was blown away by the personalization system and by the user-customizable web pages.  If I had seen a site like that, I would have assumed the author spent weeks getting it to work properly.  It's hard to believe this can all be done with drag-and-drop.



 

In V1, ASP.NET hit a home run by focusing like a laser beam on the developer experience.  Everyone put so much effort into building apps, questioning why each step was necessary, and refining the process.  It's great to see that they continue to follow that same discipline.  In the drill-down sessions, over and over again I saw that focus resulting in a near perfect experience for developers.  There are some other teams, like Avalon, that seem to have a similar religion and are obtaining similar results.  (Though Avalon desperately needs some tools support.  Notepad is fine for authoring XAML in demos, but I wouldn't want to build a real application this way.)



 

Compared to ASP.NET, some other teams at Microsoft are still living in the Stone Age.  Those teams are still on a traditional cycle of building features, waiting for customers to build applications with those features, and then incorporating any feedback.  Beta is way too late to find out that the programming model is clumsy.  We shouldn't be shirking our design responsibilities like this.



 

Anyway, the 3rd keynote (from Rick
Rashid & Microsoft Research) should have pulled it all together.  I think
the clear message should have been something like:



 

Whidbey
is coming next and has great developer features.  After that, Longhorn will arrive
and will change everything.  Fortunately, Microsoft Research is looking 10+ years
out, so you can be sure we will increasingly drive the whole industry.



 

This should have been an easy story to tell.  The fact is that MSR is a world-class research institution.  Browse the Projects, Topics or People categories at http://research.microsoft.com and you'll see many name-brand researchers like Butler Lampson and Jim Gray.  You will see tremendous breadth in the areas under research, from pure math and algorithms to speech, graphics and natural language.  There are even some esoterica like nanotech and quantum computing.  We should have used the number of published papers and other measurements to compare MSR with other research groups in the software industry, and with major research universities.  And then we should have shown some whiz-bang demos of about 2 minutes each.



 

Unfortunately, I think instead we sent a message that "Interesting technology comes from Microsoft product groups, while MSR is largely irrelevant."  Yet nothing could be further from the truth.  Even if I restrict consideration to the CLR, MSR has had a big impact.  Generics is one of the biggest features added to the CLR, C# or the base Frameworks in Whidbey.  This feature was added to the CLR by MSR team members, who now know at least as much about our code base as we do.  All the CLR's plans for significantly improved code quality and portable compilers depend on a joint venture between MSR and the compiler teams.  To my knowledge, MSR has used the CLR to experiment with fun things like transparent distribution, reorganizing objects based on locality, techniques for avoiding security stack crawls, interesting approaches to concurrency, and more.  SPOT (Smart Personal Objects Technology) is a wonderful example of what MSR has done with the CLR's basic IL and metadata design, eventually leading to a very cool product.



 

In my opinion, Microsoft Research strikes a great balance between long-term speculative experimentation and medium-term product-oriented improvements.  I wish this had come across better at the PDC.



 



 

Trends

In the 6+ years I've been at Microsoft, we've had 4 PDCs.  This is the first one I've actually attended, because I usually have overdue work items or too many bugs.  (I've missed all 6 of our mandatory company meetings for the same reason.)  So I really don't have a basis for comparison.



 

I guess I had expected to be beaten up about all the security issues of the last year, like Slammer and Blaster.  And I had expected developers to be interested in all aspects of security.  Instead, the only times the topic came up in my discussions were when I raised it.



 

However, some of my co-workers did
see a distinct change in the level of interest in security.  For
example, Sebastian Lange and Ivan Medvedev gave a talk on managed security to an audience
of 700-800.  They reported a real upswing
in awareness and knowledge on the part of all PDC attendees.



 

But consider a talk I attended on Application Compatibility.  At a time when most talks were overflowing into the hallways, this talk filled fewer than 50 seats of a 500- to 1000-seat meeting room.  I know that AppCompat is critically important to IT.  And it's a source of friction for the entire industry, since everyone is reluctant to upgrade for fear of breaking something.  But for most developers this is all so boring compared to the cool visual effects we can achieve with a few lines of XAML.



 

Despite a trend toward increased interest in security on the part of developers, I suspect that security remains more of an IT operations concern than a developer concern.  And although the events of the last year or two have gotten more developers excited about security (including me!), I doubt that we will ever get developers excited about more mundane topics like versioning, admin or compatibility.  This latter stuff is dead boring.



 

That doesn't mean that the industry is doomed.  Instead, it means that modern applications must obtain strong versioning, compatibility and security guarantees by default rather than through deep developer involvement.  Fortunately, this is entirely in keeping with our long-term goals for managed code.



 

With the first release of the CLR,
the guarantees for managed applications were quite limited.  We
guaranteed memory safety through an accurate garbage collector, type safety through
verification, binding safety through strong names, and security through CAS.  (However,
I think we would all agree that our current support for CAS still involves far too
much developer effort and not enough automated guarantees.  Our
security team has some great long-term ideas for addressing this.)



 

More importantly, we expressed programs through metadata and IL, so that we could expand the benefits of reasoning about these programs over time.  And we provided metadata extensibility in the form of Custom Attributes and Custom Signature Modifiers, so that others could add to the capabilities of the managed environment without depending on the CLR team's schedule.
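To make that concrete, here is a minimal sketch of metadata extensibility through a Custom Attribute.  The ThreadSafeAttribute below is hypothetical, invented purely for illustration; the point is that anyone can define an annotation, apply it, and later reason about it through reflection, without waiting on the CLR team.

    using System;

    // Hypothetical annotation, defined entirely outside the CLR.
    [AttributeUsage(AttributeTargets.Method)]
    public sealed class ThreadSafeAttribute : Attribute
    {
        private readonly bool isThreadSafe;
        public ThreadSafeAttribute(bool isThreadSafe) { this.isThreadSafe = isThreadSafe; }
        public bool IsThreadSafe { get { return isThreadSafe; } }
    }

    public class Worker
    {
        [ThreadSafe(true)]
        public void DoWork() { }
    }

    public static class AttributeDemo
    {
        public static void Main()
        {
            // Any tool can read the annotation back out of the metadata.
            System.Reflection.MethodInfo method = typeof(Worker).GetMethod("DoWork");
            object[] attrs = method.GetCustomAttributes(typeof(ThreadSafeAttribute), false);
            Console.WriteLine("Annotated as thread-safe: " + (attrs.Length > 0));
        }
    }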



 

FxCop (http://www.gotdotnet.com/team/fxcop/)
is an obvious example of how we can benefit from this ability to reason about programs.  All
teams developing managed code at Microsoft are religious about incorporating this
tool into their build process.  And since
FxCop supports adding custom rules, we have added a large number of Microsoft-specific
or product-specific checks.
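To give a flavor of what a custom rule can look like, here is a stand-alone sketch; it deliberately does not use the FxCop rule SDK, it just shows the kind of automated, metadata-driven check a build process can run over any managed assembly.

    using System;
    using System.Reflection;

    // Stand-alone sketch (not the FxCop SDK): flag public methods whose names
    // don't start with an upper-case letter, the way a naming rule might.
    public static class NamingCheck
    {
        public static void Main(string[] args)
        {
            Assembly assembly = Assembly.LoadFrom(args[0]);
            foreach (Type type in assembly.GetExportedTypes())
            {
                foreach (MethodInfo method in type.GetMethods(
                    BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly))
                {
                    if (method.IsSpecialName)        // skip property and event accessors
                        continue;
                    if (!char.IsUpper(method.Name[0]))
                        Console.WriteLine("Naming violation: {0}.{1}", type.FullName, method.Name);
                }
            }
        }
    }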



 



 

Churn and Application Breakage

We also have some internal tools that allow us to compare different versions of assemblies so we can discover inadvertent breaking changes.  Frankly, these tools are still maturing.  Even in the Everett timeframe, they did a good job of catching blatant violations, like the removal of a public method from a class or the addition of a method to an interface.  But they didn't catch changes in serialization format, or changes to representation after marshaling through PInvoke or COM Interop.  As a result, we shipped some unintentional breaking changes in Everett, and until recently we were on a path to do so again in Whidbey.
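To show the flavor of such a comparison, here is a drastically simplified sketch that diffs the public surface of two versions of an assembly using nothing but reflection.  The real tools obviously go much deeper; the serialization and marshaling cases above need far more than this.

    using System;
    using System.Collections;
    using System.Reflection;

    // Sketch: report public members that exist in the old version of an
    // assembly but not in the new one -- the "blatant violation" category.
    public static class ApiDiff
    {
        public static void Main(string[] args)
        {
            Hashtable oldApi = PublicSurface(Assembly.LoadFrom(args[0]));
            Hashtable newApi = PublicSurface(Assembly.LoadFrom(args[1]));

            foreach (string member in oldApi.Keys)
            {
                if (!newApi.ContainsKey(member))
                    Console.WriteLine("Breaking change: {0} was removed", member);
            }
        }

        private static Hashtable PublicSurface(Assembly assembly)
        {
            Hashtable members = new Hashtable();
            foreach (Type type in assembly.GetExportedTypes())
            {
                foreach (MemberInfo member in type.GetMembers())
                    members[type.FullName + "::" + member.ToString()] = true;
            }
            return members;
        }
    }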



 

As far as I know, these tools still don't track changes to CAS constructs, internal dependency graphs, thread-safety expectations, exception flow (including a static replacement for the checked exceptions feature), reliability contracts, or other aspects of execution.  Some of these checks will probably be added over time, perhaps by adding additional metadata to assemblies to reveal the developer's intentions and to make automated validation more tractable.  Other checks seem like research projects or are more appropriate for dynamic tools rather than static tools.  It's very encouraging to see teams inside and outside of Microsoft working on this.



 

I expect that all developers will eventually have access to these or similar tools from Microsoft or 3rd parties, which can be incorporated into their build processes the way FxCop has been incorporated into ours.



 

Sometimes applications break when their dependencies are upgraded to new versions.  The classic example of this is Win95 applications which broke when the operating system was upgraded to WinXP.  Sometimes this is because the new versions have made breaking changes to APIs.  But sometimes it's because things are just "different".  The classic case here is where a test case runs perfectly on a developer's machine, but fails intermittently in the test lab or out in the field.  The difference in environment might be obvious, like a single-processor box vs. an 8-way.  Yet all too often it's something truly subtle, like a DLL relocating when it misses its preferred address, or the order of DllMain notifications on a DLL_THREAD_ATTACH.  In those cases, the change in environment is not the culprit.  Instead, the environmental change has finally revealed an underlying bug or fragility in the application that may have been lying dormant for years.



 

The managed environment eliminates a number of common fragilities, like the double-free of memory blocks or the use of a file handle or Event that has already been closed.  But it certainly doesn't guarantee that a multi-threaded program which appears to run correctly on a single processor will also execute without race conditions on a 32-way NUMA box.  The author of the program must use techniques like code reviews, proof tools and stress testing to ensure that his code is thread-safe.
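Here is a tiny sketch of what I mean.  Nothing in it is platform-specific; it is just the classic lost-update race, which usually runs clean on one processor and fails under real concurrency.

    using System;
    using System.Threading;

    public class Counter
    {
        private int count;

        // Looks fine single-threaded; on a multi-processor box two threads can
        // read the same value and both write value+1, losing an update.
        public void Increment()       { count = count + 1; }
        // The fix is explicit synchronization, for example:
        public void IncrementSafely() { Interlocked.Increment(ref count); }

        public static void Main()
        {
            Counter c = new Counter();
            ThreadStart work = delegate { for (int i = 0; i < 1000000; i++) c.Increment(); };
            Thread t1 = new Thread(work);
            Thread t2 = new Thread(work);
            t1.Start(); t2.Start();
            t1.Join(); t2.Join();
            Console.WriteLine(c.count);   // frequently less than 2000000 on MP hardware
        }
    }

The managed environment happily verifies and runs Increment; making it correct under concurrency is still entirely the author's job.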



 

The situation that worries me the most is when an application
relies on accidents of current FX and CLR implementations.  These
dependencies can be exceedingly subtle.


 

Here are some examples of breakage that we have encountered,
listed in the random order they occur to me:


 

  1. Between V1.1 and Whidbey, the implementation of reflection has undergone a major
    overhaul to improve access times and memory footprint.  One consequence is that the
    order of members returned from APIs like Type.GetMethods has changed.  The old order
    was never documented or guaranteed, but we've found programs (including our own tests)
    which assumed stability here.


 

  2. Structs and classes can specify Sequential, Explicit or AutoLayout.  In the case of
    AutoLayout, the CLR is free to place members in any order it chooses.  Except for
    alignment packing and the way we chunk our GC references, our layout here is currently
    quite predictable.  But in the future we hope to use access patterns to guide our
    layout for increased locality.  Any applications that predict the layout of AutoLayout
    structs and classes via unsafe coding techniques are at risk if we pursue that
    optimization.


 

  3. Today, finalization occurs on a single Finalizer thread.  For scalability and
    robustness reasons, this is likely to change at some point.  Also, the GC already
    perturbs the order of finalization.  For instance, a collection can cause a generation
    boundary to intervene between two instances that are normally allocated consecutively.
    Within a given process run, there will likely be some variation in finalization
    sequence.  But for two objects that are allocated consecutively by a single thread,
    there is a high likelihood of predictable ordering.  And we all know how easy it is to
    make assumptions about this sort of thing in our code.


 

  4. In an earlier blog (http://blogs.gotdotnet.com/cbrumme/PermaLink.aspx/e55664b4-6471-48b9-b360-f0fa27ab6cc0),
    I talked about some of the circumstances that impact when the JIT will stop reporting
    a reference to the GC.  These include inlining decisions, register allocation, and
    obvious differences like X86 vs. AMD64 vs. IA64.  Clearly we want the freedom to chase
    better code quality with JIT compilers and NGEN compilers in ways that will
    substantially change these factors.  Just yesterday an internal team reported a GC bug
    (on multi-processor machines only) that we quickly traced to confusion over lifetime
    rules and bad practice in the application.  One finalizable object was accessing some
    state in another finalizable object, in the expectation that the first object was live
    because it was the 'this' argument of an active method call.  (A minimal sketch of
    this lifetime pattern appears just after this list.)


 

  5. During V1.1 Beta testing, a customer complained about an application we had broken.
    This application contained unmanaged code that reached back into its caller's stack to
    retrieve a GCHandle value at an offset that had been empirically discovered.  The
    unmanaged code then transitioned into managed and redeemed the supposed handle value
    for the object it referenced.  This usually worked, though it was clearly dependent on
    filthy implementation details.  Unfortunately, the System.EnterpriseServices pathways
    leading to the unmanaged application were somewhat variable.  Under certain
    circumstances, the stack was not what the unmanaged code predicted.  In V1, the value
    at the predicted spot was always a 0 and the redemption attempt failed cleanly.  In
    V1.1, the value at that stack location was an unrelated garbage value.  The consequence
    was a crash inside mscorwks.dll and Fail Fast termination of the process.


 

  6. In V1 and V1.1, Object.GetHashCode() can be used to obtain a hashcode for any object.
    However, our implementation happened to return values which tended to be small
    ascending integers.  Furthermore, these values happened to be unique across all
    reachable instances that were hashed in this manner.  In other words, these values
    were really object identifiers, or OIDs.  Unfortunately, this implementation was a
    scalability killer for server applications running on multi-processor boxes.  So in
    Whidbey, Object.GetHashCode() is now all we ever promised it would be: an integer with
    reasonable distribution but no uniqueness guarantees.  It's a great value for use in
    HashTables, but it's sure to disappoint some existing managed applications that relied
    on uniqueness.  (There is a sketch of this just after the list.)


 

  7. In V1 and V1.1, all string literals are Interned as described in
    http://blogs.gotdotnet.com/cbrumme/PermaLink.aspx/7943b9be-cca9-41e1-8a83-3d7a0dbba270.
    I noted there that it is a mistake to depend on Interning across assemblies.  That's
    because the other assembly might start to compose a String value which it originally
    specified as a literal.  In Whidbey, assemblies can opt in to or out of our Interning
    behavior.  This new freedom is motivated by a desire to support faster loading of
    assemblies (particularly assemblies that have been NGEN'ed).  We've seen some tests
    fail as a result.


 

  8. I've seen some external developers use a very fragile technique based on their
    examination of Rotor sources.  They navigate through one of System.Threading.Thread's
    private fields (DONT_USE_InternalThread) to an internal unmanaged CLR data structure
    that represents a running managed thread.  From there, they can pluck interesting
    information like the Thread::ThreadState bit field.  None of these data structures are
    part of our contract with managed applications, and all of them are sure to change in
    future releases.  The only reason the ThreadState field is at a stable offset in our
    internal Thread struct today is that its frequency of access merits putting it near
    the top of the struct for good cache-line filling behavior.


 

  9. Reflection allows highly privileged code to access private members of arbitrary
    types.  I am aware of dozens of teams inside and outside of Microsoft which rely on
    this mechanism for shipping products.  Some of these uses are entirely justified, like
    the way Serialization accesses private state that the type author marked as
    [Serializable()].  Many other uses are rather questionable, and a few are truly
    heinous.  Taken to the extreme, this technique converts every internal implementation
    detail into a publicly exposed API, with the obvious consequences for evolution and
    application compatibility.


 

  10. Assembly loading and type resolution can happen on very different schedules,
    depending on how your application is running.  We've seen applications that misbehave
    based on NGEN vs. JIT, domain-neutral vs. per-domain loading, and the degree to which
    the JIT inlines methods.  For example, one application created an AppDomain and
    started running code in it.  That code subsequently modified the private application
    directory and then attempted to load an assembly from that directory.  Of course,
    because of inlining the JIT had already attempted to load the assembly with the
    original application directory and had failed.  The correct solution here is to
    disallow any changes to an AppDomain's application directory after code starts
    executing inside that AppDomain.  This directory should only be modifiable during the
    initialization of the AppDomain.


 

  11. In prior blogs, I've talked about unhandled exceptions and the CLR's default policy
    for dealing with them.  That policy is quite involved and hard to defend.  One aspect
    of it is that exceptions that escape the Finalizer thread or any ThreadPool threads
    are swallowed.  This keeps the process running, but it often leaves the application in
    an inconsistent state.  For example, locks may not have been released by the thread
    that took the exception, leading to subsequent hangs.  Now that the technology for
    reporting process crashes via Watson dumps is maturing, we really want to change our
    default policy for unhandled exceptions so that we Fail Fast with a process crash and
    a Watson upload.  However, any change to this policy will undoubtedly cause many
    existing applications to stop working.


 

  12. Despite the flexibility of CAS, most applications still run with Full Trust.  I
    truly believe that this will change over time.  For example, in Whidbey we will have
    ClickOnce permission elevation and in Longhorn we will deliver the Secure Execution
    Environment, or SEE.  Both of these features were discussed at the PDC.  When we have
    substantial code executing in partial trust, we're going to see some unfortunate
    surprises.  For example, consider message pumping.  If a Single Threaded Apartment
    thread has some partial trust code on its stack when it blocks (e.g. Monitor.Enter on
    a contentious monitor), then we will pump messages on that thread while it is blocked.
    If the dispatching of a message requires a stack walk to satisfy a security Full
    Demand, then the partially trusted code further back on the stack may trigger a
    security exception.  Another example is related to class constructors.  As you
    probably know, .cctor methods execute on the first thread that needs access to a class
    in a particular AppDomain.  If the .cctor must satisfy a security demand, the success
    of the .cctor now depends on the accident of what other code is active on the thread's
    stack.  Along the same lines, the .cctor method may fail if there is insufficient
    stack space left on the thread that happens to execute it.  These are all well
    understood problems and we have plans for fixing them.  But the fixes will necessarily
    change observable behavior for a class of applications.
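To make example #4 concrete, here is the classic shape of that lifetime bug in a minimal sketch.  The Native* helpers are placeholders standing in for real unmanaged resource management; the interesting lines are the comment and the GC.KeepAlive.

    using System;
    using System.Runtime.InteropServices;

    class Wrapper
    {
        // Placeholders standing in for real unmanaged resource calls.
        static IntPtr NativeAlloc()        { return Marshal.AllocHGlobal(16); }
        static void   NativeFree(IntPtr h) { Marshal.FreeHGlobal(h); }
        static void   NativeUse(IntPtr h)  { Marshal.WriteInt32(h, 0); }

        private IntPtr handle;

        public Wrapper()  { handle = NativeAlloc(); }
        ~Wrapper()        { NativeFree(handle); }

        public void Use()
        {
            IntPtr h = handle;
            // After the last use of 'this', the JIT is free to stop reporting it.
            // If a collection and finalization sneak in here, NativeFree has
            // already run, and the next line scribbles on freed memory.
            NativeUse(h);
            GC.KeepAlive(this);   // the fix: keep 'this' (and its handle) alive to here
        }
    }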
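And example #6 in code.  GetFragileId happened to work as an object-identifier scheme in V1 and V1.1 but was never guaranteed to; if you genuinely need per-object identity, use something that promises it, such as System.Runtime.Serialization.ObjectIDGenerator.

    using System;
    using System.Runtime.Serialization;

    static class ObjectIds
    {
        static readonly ObjectIDGenerator generator = new ObjectIDGenerator();

        // Fragile: hash codes are allowed to collide, and in Whidbey they will.
        public static int GetFragileId(object o)
        {
            return o.GetHashCode();
        }

        // Robust: ObjectIDGenerator hands out a unique id per object it has seen
        // (at the cost of holding on to a table of those objects).
        public static long GetStableId(object o)
        {
            bool firstTime;
            return generator.GetId(o, out firstTime);
        }
    }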


 

I could fill a lot more pages with this sort of list.  And
our platform is still in its infancy.  Anyway,
one clear message from all this is that things will change and then applications will
break.


 

But can we categorize these failures and make some sense of it all?  For each failure, we need to decide whether the platform or the application is at fault.  And then we need to identify some rules or mechanisms that can avoid these failures or mitigate them.  I see four categories.


 


 

Category 1:  The application explicitly screws itself

The easiest category to dispense with is the one where
a developer intentionally and explicitly takes advantage of a behavior that s/he knows
is guaranteed to change.  A perfect example
of this is #8 above.  Anyone who navigates
through private members to unmanaged internal data structures is setting himself up
for problems in future versions.  The
responsibility (or irresponsibility in this case) lies with the application.  In
my opinion, the platform should have no obligations.


 

But consider #5 above.  It's clearly in this same category, and yet opinions on our larger team were quite divided on whether we needed to fix the problem.  I spoke to a number of people who definitely understood the incredible difficulty of keeping this application running on new versions of the CLR and EnterpriseServices.  But they consistently argued that the operating system has traditionally held itself to this sort of compatibility bar, that this is one of the reasons for Windows' ubiquity, and that the managed platform must similarly step up.


 

Also, we have to be realistic here.  If a customer issue like this involves one of our largest accounts, or has been escalated through a very senior executive (a surprising number seem to reach Steve Ballmer), then we're going to pull out all the stops on a fix or a temporary workaround.


 

In many cases, our side-by-side support is an adequate and simple solution.  Customers can continue to run problematic applications on their old bits, even though a new version of these bits has also been installed.  For instance, the config file for an application can specify an old version of the CLR.  Or binding redirects could roll back a specific assembly.  But this technique falls apart if the application is actually an add-in that is dynamically loaded into a process like Internet Explorer or SQL Server.  It's unrealistic to lock back the entire managed stack inside Internet Explorer (possibly preventing newer applications that use generics or other Whidbey features from running there), just so older questionable applications can keep running.
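For reference, the two lock-back mechanisms just mentioned look roughly like this in an application's .config file.  The version numbers and the assembly identity are illustrative placeholders, not recommendations.

    <configuration>
      <startup>
        <!-- run the whole process against an older CLR -->
        <supportedRuntime version="v1.1.4322" />
      </startup>
      <runtime>
        <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
          <dependentAssembly>
            <!-- or roll just one dependency back to its old version -->
            <assemblyIdentity name="SomeLibrary" publicKeyToken="0123456789abcdef" culture="neutral" />
            <bindingRedirect oldVersion="2.0.0.0" newVersion="1.0.0.0" />
          </dependentAssembly>
        </assemblyBinding>
      </runtime>
    </configuration>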


 

It's possible that we could provide lock back at finer-grained scopes than the process scope in future versions of the CLR.  Indeed, this is one of the areas being explored by our versioning team.


 

Anyway, if we were under sufficient pressure I could imagine us building a one-time QFE (patch) for an important customer in this category, to help them transition to a newer version and more maintainable programming techniques.  But if you aren't a Fortune 100 company or Steve Ballmer's brother-in-law, I personally hope we would be allowed to ignore any of your applications that are in this category.


 


 

Category 2:  The platform explicitly screws the application

I would put #6, #7 and #11 above in a separate category.  Here, the platform team wants to make an intentional breaking change for some valid reason like performance or reliability.  In fact, #10 above is a very special case of this category.  In #10, we would like to break compatibility in Whidbey so that we can provide a stronger model that can avoid subsequent compatibility breakage.  It's a paradoxical notion that we should break compatibility now so we can increase future compatibility, but the approach really is sensible.


 

Anyway, if the platform makes a conscious decision to break compatibility to achieve some greater goal, then the platform is responsible for mitigation.  At a minimum, we should provide a way for broken applications to obtain the old behavior, at least for some transition period.  We have a few choices in how to do this, and we're likely to pick one based on engineering feasibility, the impact of a breakage, the likelihood of a breakage, and schedule pressure:


 

  • Rely on side-by-side and explicit administrator intervention.  In
    other words, the admin notices the application no longer works after a platform upgrade,
    so s/he authors a config file to lock the application back to the old platform bits.  This
    approach is problematic because it requires a human being to diagnose a problem and
    intervene.  Also, it has the problems
    I already mentioned with using side-by-side on processes like Internet Explorer or
    SQL Server.


 

  • For some changes, it shouldn't be necessary to lock back the entire platform stack.
    Indeed, for many changes the platform could simultaneously support the old and new
    behaviors.  If we change our default policy for dealing with unhandled exceptions, we
    should definitely retain the old policy… at least for one release cycle.


 

  • If we expect a significant percentage of applications to break when we make a change,
    we should consider an opt-in policy for that change.  This eliminates the breakage and
    the human involvement in a fix.  In the case of String Interning, we require each
    assembly to opt in to the new non-interned behavior.  (There is a sketch of what that
    opt-in looks like just after these bullets.)


 

  • In some cases, we've toyed with the idea of having the opt-in be implicit with a
    recompile.  The logic here is that when an application is recompiled against new
    platform bits, it is presumably also tested against those new bits.  The developer,
    rather than the admin, will deal with any compatibility issues that arise.  We're well
    set up for this, since managed assemblies contain metadata giving us the version
    numbers of the CLR and the dependent assemblies they were compiled against.
    Unfortunately, execution models like ASP.NET work against us here.  As you know,
    ASP.NET pages are recompiled automatically by the system based on dependency changes.
    There is no developer available when this happens.
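As promised above, here is what the String Interning opt-out looks like as an assembly-level attribute.  The exact spelling could still shift before we ship, but the shape is a CompilationRelaxations attribute, placed in something like AssemblyInfo.cs, that marks the assembly as not requiring its literals to be interned (a hint the loader is then free to honor).

    using System.Runtime.CompilerServices;

    // This assembly no longer insists that its string literals be interned.
    [assembly: CompilationRelaxations(CompilationRelaxations.NoStringInterning)]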


 


 

Windows Shimming

Before we look at the next two categories of AppCompat failure, it's worth taking a very quick look at one of the techniques that the operating system has traditionally used to deal with these issues.  Windows has an AppCompat team which has built something called a shimming engine.


 

Consider what happened when the company tried to move
consumers from Win95/Win98/WinMe over to WinXP.  They
discovered a large number of programs which used the GetVersion or the preferred GetVersionEx
APIs in such a way that the programs refused to run on NT-based systems.


 

In fact, WinXP did such a good job of achieving compatibility with Win9X systems that in many cases the only reason the application wouldn't run was the version check that the program made at startup.  The fix was to change GetVersion or GetVersionEx to lie about the version number of the current operating system.  Of course, this lie should only be told to programs that need the lie in order to work properly.


 

I've heard that this shim which lies about the operating system version is the most commonly applied shim we have.  As I understand it, at process launch the shimming engine tries to match the current process against any entries in its database.  This match could be based on the name, timestamp or size of the EXE, or of other files found relative to that EXE, like a BMP for the splash screen in a subdirectory.  The entry in the database lists any shims that should be applied to the process, like the one that lies about the version.  The shimming engine typically bashes the IAT (import address table) of a DLL or EXE in the process, so that its imports are bound to the shim rather than to the normal export (e.g. Kernel32!GetVersionEx).  In addition, the shimming engine has other tricks it performs less frequently, like wrapping COM objects up with intercepting proxies.


 

It's easy to see how this infrastructure can allow applications for Win95 to execute on WinXP.  However, this approach has some drawbacks.  First, it's rather labor-intensive.  Someone has to debug the application, determine which shims will fix it, and then craft some suitable matching criteria that will identify this application in the shimming database.  If an appropriate shim doesn't already exist, it must be built.


 

In the best case, the application has some commercial significance and Microsoft has done all the testing and shimming.  But if the application is a line-of-business application that was created in a particular company's IT department, Microsoft will never get its hands on it.  I've heard we're now allowing sophisticated IT departments to set up their own shimming databases for their own applications, but this only allows them to apply existing shims to their applications.


 

And, from my skewed point of view, the worst part of all this is that it really won't work for managed applications.  For managed apps, binding is achieved through strong names, Fusion and the CLR loader.  Binding is practically never achieved through DLL imports.


 

So it's instructive to look at some of the techniques the operating system has traditionally used.  But those techniques don't necessarily apply directly to our new problems.


 

Anyway, back to our categories…


 


 

Category 3:  The application accidentally screws itself

Category 4:  The platform accidentally screws the application

Frankly, I'm having trouble distinguishing these two cases.  They are clearly distinct categories, but it's a judgment call where to draw the line.  The common theme here is that the platform has accidentally exposed some consistent behavior which is not actually a guaranteed contract.  The application implicitly acquires a dependency on this consistent behavior, and is broken when the consistency is later lost.


 

In the nirvana of some future fully managed execution environment, the platform and tools would never expose consistent behavior unless it was part of a guarantee.  Let's look at some examples and see how practical this is.


 

In example #1 above, reflection used to deliver members in a stable order.  In Whidbey, that order changes.  In hindsight, there's a simple solution here.  V1 of the product could have contained a testing mode that randomized the returned order.  This would have exposed the developer to our actual guarantees, rather than to a stronger accidental consistency.  Within the CLR, we've used this sort of technique to force us down code paths that otherwise wouldn't be exercised.  For example, developers on the CLR team all use NT-based (Unicode) systems and avoid Win9X (Ansi) systems.  So our Win9X Ansi/Unicode wrappers wouldn't typically get tested by developers.  To address this, our checked/debug CLR build originally considered the day of the week and used Ansi code paths every other day.  But imagine chasing a bug at 11:55 PM.  When the bug magically disappears on your next run at 1:03 AM the next morning, you are far too frazzled to think clearly about the reason.  Today, we tend to use low-order bits in the size of an image like mscorwks.dll or the assembly being tested, so our randomization is now more friendly to testing.
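Here is what that kind of test-mode perturbation could look like if it lived in application or test code today.  The environment variable is a hypothetical switch for illustration; nothing in the CLR reads it.

    using System;
    using System.Reflection;

    static class PerturbedReflection
    {
        // Hypothetical test switch -- not an actual CLR knob.
        static readonly bool perturb =
            Environment.GetEnvironmentVariable("TEST_PERTURB_REFLECTION") == "1";

        static readonly Random random = new Random();

        public static MethodInfo[] GetMethods(Type type)
        {
            MethodInfo[] methods = type.GetMethods();
            if (perturb)
            {
                // Fisher-Yates shuffle: expose callers to the real guarantee,
                // which is "no particular order".
                for (int i = methods.Length - 1; i > 0; i--)
                {
                    int j = random.Next(i + 1);
                    MethodInfo tmp = methods[i];
                    methods[i] = methods[j];
                    methods[j] = tmp;
                }
            }
            return methods;
        }
    }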


 

In example #2 above, you could imagine a similar perturbation of our AutoLayout algorithms when executing a debug version of an application, or when launched from inside a tool like Visual Studio.


 

For example #4, the CLR already has internal stress modes that force different and aggressive GC schedules.  These can guarantee compaction to increase the likelihood of detecting stale references.  They can perform extensive checks of the integrity of the heap, to ensure that the write barrier and other mechanisms are effective.  And they can ensure that every instruction of JITted managed code that can synchronize with the GC will synchronize with the GC.  I suspect that these modes would do a partial job of eradicating assumptions about lifetimes reported by the JIT.  However, we will remain exposed to significantly different code generators (like Rotor's FJIT) or execution on significantly different architectures (like CPUs with dramatically more registers).


 

In contrast with the above difficulty, it's easy to imagine adding a new GC stress mode that perturbs the finalization queues, to uncover any hidden assumptions about finalization order.  This would address example #3.
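Even without a new GC stress mode, a test harness can get part of the way there today by forcing collection and finalization at arbitrary checkpoints.  This doesn't reorder the finalization queue the way an internal stress mode could, but it does surface hidden assumptions about when finalizers run.

    using System;

    static class GcStress
    {
        // Call from test code at interesting points to flush out hidden
        // assumptions about finalization timing.
        public static void Checkpoint()
        {
            GC.Collect();                    // force a full collection
            GC.WaitForPendingFinalizers();   // drain the finalizer queue now
            GC.Collect();                    // reclaim whatever the finalizers released
        }
    }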


 


 

Customer Debug Probes, AppVerifier and other tools

It turns out that the CLR already has a partial mechanism for enabling perturbation during testing and removing it on deployed applications.  This mechanism is the Customer Debug Probes feature that we shipped in V1.1.  Adam Nathan's excellent blog site has a series of articles on CDPs, which are collected together at http://blogs.gotdotnet.com/anathan/CategoryView.aspx/Debugging.  The original goal of CDPs was to counteract the black-box nature of debugging certain failures of managed applications, like corruptions of the GC heap or crashes due to incorrect marshaling directives.  These probes can automatically diagnose common application errors, like failing to keep a marshaled delegate rooted so it won't be collected.  This approach is so much easier than wading through dynamically generated code without symbols, because we tell you exactly where your bugs are.  But we're now realizing that we can also use CDPs to increase the future compatibility of managed applications if we can perturb current behavior that is likely to change in the future.

Unfortunately, example #6 from above reveals a major drawback with the technique of perturbation.  When we built the original implementation of Object.GetHashCode, we simply never considered the difference between what we wanted to guarantee (hashing) and what we actually delivered (OIDs).  In hindsight, it is obvious.  But I'm not convinced that we aren't falling into similar traps in our new features.  We might be a little smarter than we were five years ago, but only a little.


 

Example #10 worries me for similar reasons.  I just don't think we were smart enough to predict that changing the binding configuration of an AppDomain after starting to execute code in that AppDomain would be so fragile.  When a developer delivers a feature, s/he needs to consider security, thread-safety, programming model, key invariants of the code base like GC reporting, correctness, and so many other aspects.  It would be amazing if a developer consistently nailed each of these aspects for every new feature.  We're kidding ourselves if we think that evolution and unintentional implicit contracts will get adequate developer attention on every new feature.


 

Even if we had perfect foresight and sufficient resources to add perturbation for all operations, we would still have a major problem.  We can't necessarily rely on 3rd-party developers to test their applications with perturbation enabled.  Consider the unmanaged AppVerifier experience.


 

The operating system has traditionally offered a dynamic
testing tool called AppVerifier which can diagnose many common unmanaged application
bugs.  For example, thanks to uploads
of Watson process dumps from the field, most unmanaged application crashes can now
be attributed to incorrect usage of dynamically allocated memory.  Yet
AppVerifier can use techniques like placing each allocation in its own page or leaving
pages unmapped after release, to deterministically catch overruns, double frees, and
reads or writes of freed memory.


 

In other words, there is hard evidence that if every unmanaged application had just used the memory checking support of AppVerifier, then two out of every three application crashes would be eliminated.  Clearly this didn't happen.


 

Of course, AppVerifier can diagnose far more than just memory problems.  And it's very easy and convenient to use.


 

Since testing with AppVerifier is part of the Windows Logo compliance program, you would expect that it's used fairly rigorously by ISVs.  And, given its utility, you would expect that most IT organizations would use this tool for their internal applications.  Unfortunately, this isn't the case.  Many applications submitted for the Windows Logo actually fail to launch under AppVerifier.  In other words, they violate at least one of the rules before they finish initializing.


 

The Windows AppCompat team recognizes that proactive tools like AppVerifier are so much better than reactive mitigation like shimming broken applications out in the field.  That's why they made the AppVerifier tool a major focus of their poorly attended Application Compatibility talk that I sat in on at the PDC.  (Aha!  I really was going somewhere with all this.)


 

There's got to be a reason why developers don't use such a valuable tool.  In my opinion, the reason is that AppVerifier is not integrated into Visual Studio.  If the Debug Properties in VS allowed you to enable AppVerifier and CDP checks, we would have much better uptake.  And if an integrated project system and test system could monitor code coverage numbers, and suggest particular test runs with particular probes enabled, we would be approaching nirvana.


 


 

Winding Down

Looking at development within Microsoft, one trend is very clear:  Automated tools and processes are a wonderful supplement to human developers.  Whether we're talking about security, reliability, performance, application compatibility or any other measure of software quality, we're now seeing that static and dynamic analysis tools can give us guarantees that we will never obtain from human beings.  Bill Gates touched on this during his PDC keynote, when he described our new tools for statically verifying device driver correctness, for some definition of correctness.


 

This trend was very clear to me during the weeks I spent on the DCOM / RPCSS security fire drill.  I spent days looking at some clever marshaling code, eventually satisfying myself that it worked perfectly.  Then someone else wrote an automated attacker and discovered real flaws in just a few hours.  Other architects and senior developers scrutinized different sections of the code.  Then some researchers from MSR who are focused on automatic program validation ran their latest tools over the same code and gave us step-by-step execution models that led up to crashes.  Towards the end of the fire drill, a virtuous cycle was established.  The code reviewers noticed new categories of vulnerabilities.  Then the researchers tried to evolve their tools to detect those vulnerabilities.  Aspects of this process were very raw, so the tools sometimes produced a great deal of noise in the form of false positives.  But it's clear that we were getting real value from Day One, and the future potential here is enormous.


 

One question that always comes up, when we talk about adding significant value to Visual Studio through additional tools, is whether Microsoft should give away these tools.  It's a contentious issue, and I find myself going backwards and forwards on it.  One school of thought says that we should give away tools to promote the platform and improve all the programs in the Windows ecology.  In the case of tools that make our customers' applications more secure or more resilient to future changes in the platform, this is a compelling argument.  Another school of thought says that Visual Studio is a profit center like any other part of the company, and it needs the freedom to charge what the market will bear.


 

Given that my job is building a platform, you might expect me to favor giving away Visual Studio.  But I actually think the profit motive is a powerful mechanism for making our tools competitive.  If Visual Studio doesn't have P&L responsibility, their offering will deteriorate over time.  The best way to know whether they've done all they can to make the best tools possible is to measure how much their customers are willing to pay.  I want Borland to compete with Microsoft on building the best tools at the best price, and I want to be able to measure the results of that competition through revenue and market penetration.


 

In all this, I have avoided really talking about the
issues of versioning.  Of course, versioning
and application compatibility are enormously intertwined.  Applications
break for many reasons, but the typical reason is that one component is now binding
to a new version of another component.  We
have a whole team of architects, gathered from around the company, who have been meeting
regularly for about a year to grapple with the problems of a complete managed versioning
story.  Unlike managed AppCompat, the
intellectual investment in managed versioning has been enormous.


 

Anyway, Application Compatibility remains a relatively contentious subject over here.  There's no question that it's a hugely important topic which will have a big impact on the longevity of our platform.  But we are still trying to develop techniques for achieving compatibility that will be more successful than what Windows has done in the past, without limiting our ability to innovate on what is still a very young execution engine and set of frameworks.  I have deliberately avoided talking about what some of those techniques might be, in part because our story remains incomplete.


 

Also, we won't realize how badly AppCompat will bite us until we can see a lot of deployed applications that are breaking as we upgrade the platform.  At that point, it's easier to justify throwing more resources at the problem.  But by then the genie is out of the bottle… the deployed applications will already depend on brittle accidents of implementation, so recovery will be painfully breaking.  In a world where we are always under intense resource and schedule pressure, the needs of AppCompat must be balanced against performance, security, developer productivity, reliability, innovation and all the other "must haves".


 

You know, I really do want to talk about Hosting.  It is a truly fascinating subject.  I'm much more comfortable talking about non-preemptive fiber scheduling than I am talking about uninteresting topics like implicit contracts and compatibility trends.


 

But Hosting is going to have to wait at least a few more
weeks.

Comments (31)

  1. Sriram says:

    Why does your RSS feed have only a truncated description for this post? Please let your entire posts stay in the RSS feed. I use a dial-up connection and read my aggregator’s new downloads offline….so this is cumbersome to say the least

  2. This is a fantastic post, and you are exactly right about App Compat being an ignored topic. I was one of the very few people at the app compat talk (and the only one in the front row with a badge that didn’t read Microsoft on it) as you were, and those that were there seemed disappointed in the lack of "sexiness" of this session. (The questions were along the lines of, "I was expecting to hear what you are doing for app compat in Longhorn…" "Well, we’re doing more of the same. Unlike most everything else here, you can use our bits today.") Anyway, I enjoyed your description of shimming, and I did not realize that it was done this way. Does anybody on the app compat team have a blog (akin to Raymond Chen’s brilliantly fun stories of the hows and whys of his knowledge domain)? Not only would it be fun, but if we’re not careful we just might learn something. (Hey hey hey…) I remember the fun with Whistler, trying to get it to run Needs for Speed Porsche Unleashed, filing a bug against a DLL whose sole purpose in life was to grab as much memory as possible from the Win95 heap, which would eventually fail and the process would keep going. Whistler would never fail, and the machine would just grind to a halt, and the only solution was to drop in a Win95 heap emulator.

  3. Chris Brumme says:

    Sorry Sriram. I’ve corrected my mistake.

  4. Junfeng says:

    Chris, Raymond Chen worked in app compat team for many years. He is your man;)

  5. Eric Wilson says:

    Once again Chris, you do not disappoint us. The only thing that annoys me is that we do not get MORE posts:) I swear that I have read your post on Application Shutdown at least 4 times start to finish. I’m fascinated by all the topics you have up here. Keep the great posts coming!

  6. Ian Ringrose says:

    Start quote –>
    There s got to be a reason why developers don t use such a valuable tool. In my opinion, the reason is that AppVerifier is not integrated into Visual Studio. If the Debug Properties in VS allowed you to enable AppVerifier and CDP checks, we would have much better uptake. And if an integrated project system and test system could monitor code coverage numbers, and suggest particular test runs with particular probes enabled, we would be approaching nirvana.
    <– End quote.

    a) First off, every place I have worked at, I get asked by managers much more often about the shipping date than about the quality of the software.

    b) I have NEVER come across a customer running AppVerifier as part of the evaluation process before buying an application. Does EVERYONE at Microsoft do so before buying in software?

    c) The fact that an application that an ISV shipped 2 years ago does not work on the next version of Windows is a good thing for the ISV:-) It makes it easier to sell support contracts, and to get customers to pay for upgrades. E.g. Exchange 2000 not working on NT 2003.

    So what can be done?
    a) Make it very hard not to run AppVerifier and other such tools when debugging an application in msdev. How about a pop-up that points out that the developer, not his boss, may end up being sued if he turns off AppVerifier?

    b) When a user installs a new application, Windows should ask if the user wishes to test it with AppVerifier for the first week of usage; there should be NO way that the application vendor (OR IT department) can turn this off!

    c) All systems that ship with version 1.1 (and 1.2) of the frameworks should also contain version 1.0 (and 1.1). By default an application should ONLY bind with the version it was built with. I do not see why each web page could not use a different version of the Framework, likewise with each SQL server database.

    Email: ringi at bigfoot dot com

  7. Jim Argeropoulos says:

    Chris I admit having not used AppVerifier in my unmanaged days. I did make heavy use of BoundsChecker.

    My question is this: What tools should I be using today for my managed applications? FxCop certainly falls on this list, but what else have I missed?

    Thanks

  8. Any plans for tightening the security model by switching to execution history-based access control instead of crawling the stack?

  9. Alek says:

    Don’t you think that MS VS competing with any other tool vendor is unfair competition? I am disappointed to see this presented as a good thing …

  10. Keith Hill says:

    Comment to Alek: don’t you think MS giving away VS would be even more unfair? Or do you think MS shouldn’t provide tools for their platform? I do agree that folks like Borland are at a disadvantage for a number of reasons but the most I think MS can do about that is provide open/free access to the relevant APIs. WRT app compatibility I hadn’t realized the issue with ASP.NET. I was always in favor of side-by-side as a great solution for binary compatibility. However, when it comes to source compatibility, I don’t mind breaking changes if they improve the platform and there is some way to do what I was doing before by tweaking the source code. I just really hate to see a young platform already start to take on the Foobar2 and FoobarEx type names. 🙁

  11. Chris Brumme says:

    Ian,

    I don’t think we would want to use legal or other coercion to force either developers or users to run with AppVerifier or some eventual managed equivalent. And although it’s reasonable for users to test potential products with AppVerifier before making a purchase, these tools are really intended for developers.

    I think our best approach is to make the tools so easy to use, and so valuable to use, that developers naturally do so. Compilers give you a lot of useful information about what might go wrong when you execute your application. Imagine if, on the first F5 in Visual Studio, you got a report of all the things that are a little flakey and questionable, with suggestions of exactly where and how to fix them.

    If the verification is completely integrated and provides compelling value, no coercion will be necessary.

  12. Chris Brumme says:

    Jim,

    If you are using FxCop today, that is certainly the single best thing you can do. You might also have a look at the ClrSpy utility that Adam Nathan built and describes in his blog. This is a convenient way to use Customer Debug Probes. See http://blogs.gotdotnet.com/anathan/CategoryView.aspx/Debugging.

    At some point, I think Microsoft needs to provide more FxCop rules, more Customer Debug Probes, new tools for detecting versioning changes, more diagnostic traces out of the managed class libraries, etc. But there’s also an opportunity for 3rd parties to add value here, too.

  13. Chris Brumme says:

    Dejan,

    You ask about security using execution-based history rather than the heuristic of stack crawls. We occasionally run into circumstances where stack crawls don’t work well. I’ve talked about some of these cases in prior blogs, like async points and delegates. I think you’ll see us continue to address these issues in the context of stack crawls (i.e. by considering additional logical pieces of stack, as we have done since V1).

    For other cases, like privacy of data, the execution of code is a poor indicator of how information has been leaked across trust boundaries. We’ve had some conversations about how a different approach might be necessary for managed enforcement of privacy. But none of those ideas has led anywhere practical yet. I think if you scan the public research papers on this topic, you’ll come to that same conclusion.

    There are still plenty of interesting problems waiting to be solved.

  14. Steve Bolton says:

    Another excellent and thoughtful article. Blogs like yours have improved my opinion of MS enormously. I was inspired to download AppVerifier immediately and run it on Excel and Word 2003! (No this is not a cheap attack on MS here, rather I was interested to see what would turn up and they are big applications that I believe contain a lot of legacy code).
    I believe these tools should be included in Visual Studio with a lot more promotion about their use as you describe. Thanks.

  15. Mark Morrell says:

    For Chris, Alek, and Keith –

    Economically speaking, there would normally be an advantage (to Microsoft) of giving away Visual Studio and associated tools to make developed software better. VS is a complementary product to Windows, and use of VS would theoretically increase demand for Windows because it increases the quality of software developed for Windows. This makes economic sense because it results in higher profits by selling more copies of Windows.

    However, because Microsoft is considered a monopoly it doesn’t make sense to give VS away because you probably won’t increase demand by an appreciable amount anyway. Also, it would be considered unfair use of monopoly power because it is anti-competitive so the government would probably shut the effort down.

    Anyway, your best bet would be to bundle the error checking and compatibility technology you are talking about into VS without increasing the price of VS. These are excellent features that would help justify a new upgrade that the public would pay for, increasing revenue without running afoul of the DoJ.

    It’s really funny to me that, being declared a monopoly, it would be considered unethical for MS to give stuff away, but it’s fine to sell it. Like an inverse charity.

  16. Albert says:

    I think the keynote from MSR on Wednesday was the best keynote at the PDC. I watched the reactions from the crowd and this keynote received the most applause. I’m sure that it could’ve been better but still it wasn’t by any means substandard.

  17. eAndy says:

    I love the information posted in your blog. It always seems too long between posts. (can’t get enough)

    None the less, I agree that you can’t give away VS for a number of reasons
    – current anti-competitive ‘stuff’
    – loss of long term value proposition (hard to keep investing $ in a product you give away free, you start to see less innovation)

    As to the tools that make for better programs such as
    – CDP
    – AppVerifier
    – (unnamed tools from MSR)

    these should continue to be free. You CANNOT underestimate the cost to MS image as a platform if you allow bad code to be written and deployed.

    You must make these tools accessible and free.

    As for VS.NET, keep charging but increase the value of VS by integrating those free tools (CDP, AppVerifier, unnamed MSR tools) into the VS product. That’s the value add we’re paying for.

    Borland and others can continue to compete and leverage the tools as they see fit.

    You get competition and good code (which helps the platform )

    If you start charging for the tools that make better code, people don’t use them and you start to hear stupid stuff like "MS doesn’t scale" "MS isn’t reliable, it always crashes", "MS isn’t secure". Things that are attributed to bad coding practices rather than the platform


  19. See Win App says:

    This post is actually a re-post of a post I did a little under year ago during PDC ’05 after attending…