I presented this paper at the CMG 2006 conference. I've previously posted the content of my slides and some interesting results from using this sort of analysis but I thought you might like to see the paper in full. Comments are of course welcome.
Rico Mariani, Microsoft Corporation
This paper describes Performance Signatures, a simple qualitative approach to helping software developers improve their products’ performance. Performance Signatures allow development tools to give as-you-type prescriptive guidance and facilitate improved analysis and interpretation during traditional profiling sessions. Performance Signatures emphasize ease of adoption and prevention of common and/or large mistakes.
Experience teaches us that large-scale software development projects increasingly have difficulty meeting their performance goals. Individuals in the software development industry come from diverse backgrounds, many not directly related to computer science, and there is no single technique or family of techniques that is widely taught, much less practiced, to achieve performance success.
In the best of all worlds we could universally teach Software Performance Engineering (SPE) [Smith and Williams 2002] with emphasis on certain core values of engineering:
- That engineering is a quantitative discipline
- That engineering is about achieving predictable results for predictable costs
- That engineers use techniques like planning, modeling, and prototyping to manage risks and achieve predictable results
The trouble with this approach is that many programmers are not in fact trained engineers and furthermore do not wish to be. What can we, as performance professionals, do to help these people to succeed?
I don’t know of any method, other than SPE, that can guarantee predictable performance results. It’s hard to imagine anything other than careful goal-setting and methodical planning that could yield such a result. However, in many cases the level of control that is possible with SPE is not necessary for all aspects of a project.
While it is true that for some projects missing performance goals by as little as 1% would represent a disaster, there are many projects for which arriving within 10% of the desired goal would be an occasion to celebrate. Indeed, in many situations, failure is more like a performance calamity observed at the end of the development cycle. The “final” product is totally unacceptable because it is operating perhaps five, ten, or even twenty times slower than is desired. Any reasonable process that would help prevent these kinds of disasters would be valuable, even if less than perfect.
This paper describes one such approach, still in its early stages, with potential to improve the state of the art in these matters.
Typical Questions from Concerned Programmers
Discussions with programmers working on systems where performance “matters” are, not surprisingly, rich with queries about specific practices.
“Can I use this method in my program?” “If I use this class will it be fast?” “Does this technique give good performance?” You can fill in the blanks. Sometimes the discussions are about database access, sometimes about XML processing, sometimes about thread pools, string management, collection classes, enumeration, sorting, searching, you name it.
The people asking these questions often have, or are in the process of building, a software solution that involves a complex framework, such as Microsoft® .NET™. They are often overwhelmed with the available implementation choices and have little in the way of supporting information on which to make decisions. They are seeking any kind of guidance and the “blessing” of an expert is often a very desirable affirmation.
The irony of this situation is that the expert is not likely to be able to help very much; lacking context, the performance expert can only provide the odd useful tidbit of advice. Sharpening the irony is the fact that the people asking these sorts of questions are, of course, the ones that care the most about performance. They are the ones we most want to help.
What I invariably end up doing in these situations is asking some questions about the context and then trying to give some basic relevant guidelines, such as which services are reasonable to use, what is intended to be used in that context and what is not. It isn’t quantitative advice – usually there are no numbers available – but it can nonetheless be helpful in avoiding some of the biggest problems.
Context Specific Guidance
A colleague and I used to joke that all programmers should be forced to name their methods so that they end in a number – the floor of the base ten log of the number of processor cycles the method is expected to consume. Then, when writing a method that ends in a “5” you are simply not allowed to call any method that ends in “6.” Our thinking was that even this simple rule would prevent many performance calamities. It is perhaps the most basic example of somehow using the performance expectations of the caller to restrict the choice of callable methods to a subset consistent with that expectation.
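The joking rule above can be sketched as a tiny checker. This is purely illustrative: the digit-suffix convention and the method names are assumptions, not part of any real system.

```python
import re

def cost_digit(name):
    """Extract the trailing cost digit (the floor of the base ten log
    of expected processor cycles) from a method name, e.g.
    "ParseHeader4" -> 4.  Returns None if there is no digit suffix."""
    m = re.search(r"(\d)$", name)
    return int(m.group(1)) if m else None

def check_calls(caller, callees):
    """Return the callees whose cost digit exceeds the caller's,
    i.e. the calls that violate the naming rule."""
    limit = cost_digit(caller)
    return [c for c in callees
            if limit is not None
            and (d := cost_digit(c)) is not None
            and d > limit]

# A method expected to cost ~10^5 cycles must not call a ~10^6 one.
violations = check_calls("RenderRow5", ["FormatCell4", "QueryDatabase6"])
# violations -> ["QueryDatabase6"]
```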
However, even this simple approach is perilous. For starters, not every programmer is comfortable working with logarithms, much less predicting the base ten log of the number of processor cycles a method will consume when called. Moreover, the cost in processor cycles for any given method is often not a fixed quantity – a spectrum is much more probable – in which case the advice we must give is much more complicated, not to mention much more likely to change as the caller and callee evolve.
Notwithstanding this problem, I still find myself able to give reasonable advice under a wide variety of circumstances when put on the spot. This brings us at last to the main thrust of this paper, a qualitative approach to providing useful guidance to the developer about the performance consequences of calling a method.
To facilitate recommendations we begin by computing a qualitative Performance Signature for every method in a system that reflects its relative cost along some relevant performance dimension, such as a resource usage metric. Many potential formulations of signatures are possible, but to be useful, the computation must result in a formal partitioning with these desirable properties:
- The number of signatures (partitions) is small enough to remember (e.g., less than seven unique signatures).
- The signature of any given method can be statically computed by analyzing the portions of that method which are typically executed (e.g., disregarding basic blocks which “throw”, and disregarding “catch” blocks) and any child methods (i.e., the transitive closure of the call graph).
- The signature implies reasonable prescriptive rules and/or limitations that a programmer could follow to create a method of a desired signature.
- The signatures are sufficiently broad that a method is unlikely to ever move from one signature to another.
Ultimately, the utility of signatures rests on a simple observation: a method M cannot meet a cost requirement k if any method whose typical cost exceeds k appears in the call graph of M, regardless of which cost dimension k measures.
Importantly, this is true even if k is a discrete cost metric with a small number of easily-remembered values as we’ll see below.
Approximate Characterization Still Effective
Since the signatures are designed to represent fairly broad categories of cost, the rules to compute them do not have to be complicated formulations. For instance, one important cost dimension is whether or not a given method allocates memory, and, if so, is the method limited to a one-time allocation of only its return type or does it necessarily allocate and free temporary memory repeatedly. A fairly simple analysis (with transitive closure) could partition methods effectively into three large buckets:
- Those that make no allocations or that only allocate their return type (“low” cost)
- Those that have fewer than 5 distinct memory allocation requests in their call graph (“medium” cost)
- Everything else (“high” cost)
It’s worth pointing out some facts about the partitioning proposed above:
- The threshold of 5 distinct allocation sources was chosen arbitrarily
- Classification along this dimension ignores the computational complexity of the method entirely
- Stack-based memory usage is entirely disregarded
- Cycles in the call graph are ignored (i.e. when a call that would cause a cycle in the call graph is encountered, it is ignored)
And yet, even this simple-minded partitioning scheme can provide valuable guidance! Consider this real life example: a programmer writing a hashing function wishes to know if it would be wise to call “String.Split” in that context. Can we answer that question with the lame partitioning proposed above? Yes!
A hashing function must be very inexpensive and therefore should generally be in the lowest cost bucket. But the helper method String.Split needs to allocate an array of string pieces; it is therefore in the “medium” cost bucket. Thus, one should never call String.Split from inside a hashing function: a common programming error, and the resulting performance problem, has been prevented.
The resulting hashing function might still be unsuitable for other reasons, but at least now it has a better chance of delivering appropriate performance.
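Under the simplifications listed above, the partitioning and the String.Split check can be sketched in a few lines. The method names, allocation counts, and call graph here are illustrative assumptions; allocating only the return type is modeled as zero sites.

```python
def alloc_count(method, allocs, calls, _visiting=None):
    """Distinct allocation sites in the transitive call graph of
    `method`.  `allocs` maps method -> allocation sites in its own
    body; `calls` maps method -> callees.  Calls that would close a
    cycle are ignored, per the simplification above."""
    if _visiting is None:
        _visiting = set()
    if method in _visiting:        # would close a cycle: ignore
        return 0
    _visiting.add(method)
    return allocs.get(method, 0) + sum(
        alloc_count(c, allocs, calls, _visiting)
        for c in calls.get(method, ()))

def signature(method, allocs, calls):
    """Bucket the count: 0 -> low, 1-4 -> medium, 5+ -> high."""
    n = alloc_count(method, allocs, calls)
    return "low" if n == 0 else "medium" if n < 5 else "high"

# A hash function that calls String.Split inherits its allocation:
allocs = {"GetHashCode": 0, "String.Split": 1}
calls = {"GetHashCode": ["String.Split"]}
signature("GetHashCode", allocs, calls)   # -> "medium", not "low"
```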
Experience teaches us that many costly performance mistakes are not small ones that require subtle filters to locate. Rather, programmers commonly make large mistakes which even simple qualitative filters can catch.
Analysis by Cross-Checking
In order to perform any meaningful analysis, we must be in a situation where we can cross-check two different appraisals for consistency – two different sources of the Performance Signature. Of course one source will be a computed signature such as the one described above. The second source is the desired or required signature:
- Desired, if a particular method has been annotated by the user.
- Required, if the method is an implementation of a well known type or pattern that is expected to have a certain signature.
Common polymorphic methods like “GetHashCode” come with an implied promise of performance which can be translated into a constraint on the signature, facilitating analysis. Many required signatures emerge naturally in the context of particular frameworks.
Helpfully, these implied promises tend to occur in important areas because they are often central members of a framework. For instance, in the .NET framework, all implementations of primitive interfaces like IComparable, IEquatable, IEnumerable, as well as overrides of GetHashCode should be “low” cost. Other readily-identifiable methods such as UI event handlers, or portions of HTML page renderers should be “medium”.
The upshot is that we can use a simple set of name pattern matching rules to infer many required signatures and facilitate a cross-check even with little or no explicit annotation by a developer. Naturally, the developer can further improve the quality of the analysis by annotating his/her code with specific desired signatures.
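A minimal sketch of that cross-check follows. The rule table is a hypothetical stand-in for a vendor-supplied set of name-pattern rules; only the *.GetHashCode example comes from the text above.

```python
from fnmatch import fnmatch

# Hypothetical rule set: name patterns mapped to required signatures.
RULES = [("*.GetHashCode", "low"),
         ("*.CompareTo", "low"),
         ("*_Click", "medium")]

ORDER = {"low": 0, "medium": 1, "high": 2}

def required_signature(name):
    """Infer a required signature from the name patterns, if any."""
    for pattern, sig in RULES:
        if fnmatch(name, pattern):
            return sig
    return None   # no rule applies; rely on explicit annotation

def cross_check(name, computed):
    """Warn when the computed signature exceeds the required one."""
    req = required_signature(name)
    if req is not None and ORDER[computed] > ORDER[req]:
        return f"{name}: computed '{computed}' exceeds required '{req}'"
    return None

# cross_check("Point.GetHashCode", "medium") flags a violation;
# cross_check("Point.GetHashCode", "low") returns None.
```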
The presence of polymorphic call sites is often troublesome for algorithms that are attempting to perform static code analysis. However Performance Signatures offer some solace in the analysis phase:
- The analysis is only intended to be approximately correct, so a worst-, average-, or even best-case approach can be used when computing transitive closures: whatever yields a reasonable partitioning.
- To avoid creating potential confusion for users it is desirable that polymorphic versions of the same method have similar performance characteristics, and therefore the same Performance Signature. Where the Performance Signature varies between implementations that fact is worthy of reporting in its own right.
- In many cases the above statement is so strong that it actually constitutes a requirement.
- When visiting any potential child node (callee) whose signature is constrained, that constrained cost can be used in the parent (caller) rather than computing the cost of the child. This avoids the problem of having one poorly implemented polymorphic implementation causing a cascade of warnings further up the call tree.
Overall, these factors seem to indicate that the presence of polymorphism will not make the computation of signatures intractable.
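The constrained-cost rule for polymorphic call sites can be sketched as follows; the interface names and the constraint table are illustrative assumptions.

```python
# Hypothetical table of signatures constrained by interface contracts.
CONSTRAINED = {"IComparable.CompareTo": "low",
               "Object.GetHashCode": "low"}

def callee_signature(method, compute):
    """When a callee's signature is constrained by its interface,
    charge the caller the constrained cost instead of analyzing the
    implementation.  A bad override then yields a single warning at
    its own definition rather than a cascade through every caller."""
    constrained = CONSTRAINED.get(method)
    return constrained if constrained is not None else compute(method)

# Even if some override of CompareTo is expensive, callers are
# charged at the promised "low" cost:
callee_signature("IComparable.CompareTo", lambda m: "high")  # -> "low"
```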
Of course any algorithm requiring static analysis of call graphs faces the reality that call graphs have cycles. However, fairly straightforward techniques can still be used to compute a cost function even in the presence of cycles.
To begin with, many cycles are caused by polymorphic paths where, for instance, formatting methods call successively lower-level formatting methods, all of which conform to the same type signature. In those cases the rules for managing polymorphism and using constrained costs (described above) are invaluable for computing the signature of the cycle.
But, more generally, upon discovering a cycle, the cost computation can end and suitable pointers can be left in the graph nodes so that the final cost of all the nodes in the cycle becomes the greatest computed cost of any of the nodes in the cycle – which will perforce be the cost of the root of the spanning tree. Signature assignment can then be done on the basis of that cost, with all the nodes in the cycle gaining the same signature.
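One way to realize this rule, assuming a numeric cost per method, is to let Tarjan's strongly-connected-components algorithm close each cycle and assign every member the cycle's greatest inclusive cost. This is a sketch under those assumptions; the method names and costs are illustrative.

```python
def inclusive_costs(cost, calls):
    """Inclusive cost of every method: the greatest cost over its call
    graph, with all members of a cycle sharing the same value.
    Tarjan's algorithm closes strongly connected components in reverse
    topological order, so every callee outside a component is already
    finished when that component is assigned its cost."""
    index, low, on_stack, stack = {}, {}, set(), []
    result, counter = {}, [0]

    def visit(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in calls.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:                 # v roots an SCC (cycle)
            scc = []
            while True:
                w = stack.pop(); on_stack.discard(w); scc.append(w)
                if w == v:
                    break
            peak = max(cost.get(w, 0) for w in scc)
            for w in scc:                      # fold in finished callees
                for c in calls.get(w, ()):
                    if c in result:            # callee outside the SCC
                        peak = max(peak, result[c])
            for w in scc:                      # whole cycle shares peak
                result[w] = peak

    for v in set(cost) | set(calls):
        if v not in index:
            visit(v)
    return result

# A and B call each other; A also calls C.  The A-B cycle shares B's
# cost of 3, while C keeps its own cost of 2:
inclusive_costs({"A": 1, "B": 3, "C": 2}, {"A": ["B", "C"], "B": ["A"]})
# -> {"A": 3, "B": 3, "C": 2}
```

Signature assignment then proceeds by bucketing these inclusive costs exactly as in the cycle-free case.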
Enriching Signatures With Other Data
The creation of good partitioning strategies is the primary area for future investigation in this space. For instance, the signature computation described earlier used only memory consumption for partitioning. This approach yields useful results because memory usage is often a good indicator of algorithmic complexity, but it is not the only important cost function to evaluate.
To achieve more differentiation in the types of signatures that can be created, other metrics may be mixed into the partitioning strategy, for instance:
- Cyclomatic Complexity [McCabe 94], or Maintainability Index [VanDoren 2006] can be used to help forecast algorithmic cost – an approximately correct cost is still useful for partitioning.
- Live execution time in a set of benchmarks could be blended into the cost.
- Data or code volume (cache lines, pages, etc.) as measured in a set of benchmarks could be considered in the cost.
- Manual “override” of poorly computed signatures may be needed – methods with highly expensive, but rarely executed code blocks, for instance might get the wrong signature.
In all cases the idea is to introduce readily computable results, plus overrides, to get reasonable signatures. Managed code environments are particularly friendly to these operations because both Java bytecodes and .NET IL instructions can be analyzed to compute the static costs described above. Source code is not required.
Why Does This Work?
It is perhaps surprising that a small number of signatures, computed very simply, can be used to give effective guidance; however, that conclusion does resonate with experience.
Many sorts of applications, both client and server, have the bulk of their functionality placed in middle layers of logic. In typical managed applications on the .NET platform, for instance, there are event handlers dedicated to responding to various user stimuli in the client or to creating various HTML page fragments on the server. These middle-level pieces of code have abundant freedom to do temporary allocations and modest computation without causing any kind of performance disaster. The disasters tend to occur when middle layers reach beyond their scope and invoke processing that is too intensive for their medium-level context. Until and unless that happens, performance tends to degrade in a fairly natural way that corresponds roughly to intuition – twice as much processing takes twice as long.
In similar fashion, the very lowest level computations – comparison functions, hashing functions, and the like – must stay within their own family or an observable disaster is likely to occur.
This basic analysis is certainly not any kind of complete solution to every performance problem, but in terms of disaster prevention it does seem as though a small number of performance signatures could be helpful. And the solution is all the more embraceable for its simplicity.
Deliverables For an Ecosystem
In order to create an ecosystem that facilitates adoption of Performance Signatures, library vendors should provide some additional supporting files with their libraries:
- A list of methods in the libraries, each annotated with its pre-computed cost and resultant signature.
- A set of name-pattern rules from which signatures can be implied (e.g. *.GetHashCode → “low”).
The pre-computed costs are necessary so that developers can use tools to compute the costs of their own methods without having to perform lengthy analysis on large framework libraries. Development tools may terminate cost and signature computations upon reaching any framework entry point with a known cost/signature.
The rules for implication are perhaps of even greater importance – they characterize some simple patterns that appear within the library. The knowledge that, for instance, methods named “*.GetHashCode” should be of “low” cost cannot be inferred by analysis in any easy way. A good set of default rules will save users from either building their own set of patterns or adding many repetitive per-method annotations to get good results.
As discussed previously, many forms in which libraries are delivered (e.g. Microsoft .NET assemblies) are amenable to analysis. A tool wishing to apply Performance Signature-based techniques could compute the needed signatures on the developer’s computer even if the library vendor either does not provide data for their libraries, or uses a different signature formulation.
Finally, Performance Signatures are programmer-friendly so they can appear in documentation as well as supporting various tools-based uses detailed below.
Perhaps the most compelling application of Performance Signatures is during actual coding. Systems like Microsoft® Visual Studio.NET™ offer “Intellisense”: as a programmer enters language statements, possible completions, notably lexically valid method calls, are offered in a drop-down list. Given an existing or desired Performance Signature for the method being authored, it is a small matter to color-code the suggested method calls so that those with appropriate Signatures appear in, for instance, green while more dubious choices appear in red. With Performance Signatures we could deliver as-you-type guidance for the cost of only a few bits per method.
Such a feature on its own would likely prevent some of the most egregious performance problems very early in the programming cycle, with no need to run a profiler or understand its results.
A user desiring more information could inquire why a particular method call was deemed inappropriate and receive advice based on their context, the rules for their context, the Signature of the target, and the factors which affected the target’s Signature.
The approach described above could also be used in a batch context. A user’s entire application could be scrubbed to ensure that the computed signature of every method cross-checks with the desired or required signature. As described above the desired/required signatures can be implicit (e.g. all implementations of *.GetHashCode should be “low” cost) or explicit (e.g. user places a suitable annotation in an external XML file, an inline comment, or custom attribute).
When combined with the ability to ignore warnings that are deemed spurious, a batch validation of signatures could be an invaluable tool to help keep performance issues in check during development or maintenance cycles.
In the context of troubleshooting, Performance Signatures could enhance the value of measurements. For instance, observed memory patterns that do not agree with a desired or required signature can be highlighted if they have a cost that significantly exceeds their target. These costs can be described in terms of Performance Signatures to help isolate problems. E.g., “Method M1 was to be of ‘medium’ cost; however, its inclusive cost is ‘high’, this seems to be because method M2 is of ‘high’ cost,” and, drilling down deeper to the next level of detail, “Method ‘M2’ is of high cost because it allocated 100 System.String objects.”
The Performance Signatures would help to identify trouble-spots and to provide simple, actionable guidance in understandable terms.
Work on Performance Signatures is still in the early stages. Currently there is a working system that does basic analysis and transitive closure over Microsoft .NET assemblies, and at this time we are experimenting with partitioning choices and expect to be able to create meaningful reports – cross-checking computed vs. constrained-by-interface signatures – very soon.
While the idea of providing this type of information has so far been received with universal enthusiasm, the coming experiments will tell us if the reality lives up to the expectation.
It is always desirable to prevent large performance problems as soon as possible. A tool that helps us to do this, even in only the “easy” cases, at the very least provides us with more time to invest in more difficult cases requiring more complex planning.
Programmers have few resources to turn to when it comes to characterizations of third-party libraries that they intend to use or practices they should follow; Performance Signatures can provide both valuable data and vocabulary in these areas. A signature formulation that is readily computed and is easily digested by users can be a useful tool both to teach good usage and to describe the services offered.
The ability to provide assistance to programmers in real time, in a natural and familiar context is invaluable – if only to build awareness.
Even simple signature computations show promise of having real utility; further experimentation is likely to lead to a more robust system of partitioning which can serve developers in many contexts.
I would like to thank my colleagues Hazim Shafi, Mark Friedman, and Gerardo Bermudez for their invaluable assistance in preparing this paper.
[Smith and Williams 2002] C. U. Smith and L. G. Williams, Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software, Boston, MA, Addison-Wesley, 2002.
[McCabe 94] McCabe, Thomas J. & Watson, Arthur H. "Software Complexity." Crosstalk, Journal of Defense Software Engineering 7, 12 (December 1994): 5-9.
[VanDoren 2006] Edmond VanDoren, “Maintainability Index Technique for Measuring Program Maintainability”, http://www.sei.cmu.edu/str/descriptions/mitmpm.html, Carnegie Mellon Software Engineering Institute.