One thing that continues to amaze me is how powerful the tools available to developers and QA have become. Application performance can be improved with profiling and optimization tools that operate statically and/or dynamically on the binary (PGO, for example). Testing metrics become more accurate with instrumentation and code coverage tools. Compatibility and conformance issues can be detected with applications like AppVerifier (targeting Windows XP) and FxCop (targeting .NET Framework). And then there is the security space, which is growing in importance and where Microsoft has invested much effort and research. In this space we have runtime validation techniques and verifiers (/GS, heap verifier), static analysis of object code (FxCop again), and static analysis of source code (prefast, which is the codename for the /analyze compiler switch, together with SAL annotations). No single tool can do everything, and their purposes overlap somewhat, but used in combination they help ensure high-quality product development.
In this post, I will focus on static source code analysis using prefast. I believe the justification for using it is very strong. First, it can find the very serious defects that everyone is afraid of (crashes, blue screens, hangs, exploitable security holes) and that may be quite hard to detect with code inspection and debugging. Prefast is able to detect most cases of: memory management defects (leaks), pointer management defects (double free, freeing a pointer to already freed or non-allocated memory such as the stack or a global variable, freeing in the middle of a block, returning a pointer to a local), initialization defects (using uninitialized variables, freeing/dereferencing an uninitialized/null pointer, writing to constants), and boundary violations (buffer overruns and underruns). Second, because static analysis means scanning source code, the defects are signaled at compile time. The earlier in the product cycle the better, and because we’re operating directly on source code, the error messages are more explanatory and precisely located. No debugging is needed to find the offending line of code and the variables involved. After a short period of getting used to them, the error messages suggest fairly obvious solutions. Static analysis tools are not meant to replace code reviews and test plans, but they are an excellent extra validation and may offer even more coverage than test cases.
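To make a few of these defect classes concrete, here is a minimal sketch. The function names and the SHOW_DEFECTS switch are hypothetical, invented for this illustration; the defective variant is compiled out so the file still builds and runs cleanly. The comments describe the kind of problem a source analyzer such as prefast is meant to flag; they are not verbatim tool output.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

#ifdef SHOW_DEFECTS   // hypothetical switch: define it only to feed the analyzer
// Defective variant, kept out of normal builds.
char* duplicate_prefix_bad(const char* text, std::size_t count)
{
    char* copy;                                          // not initialized when count == 0
    if (count > 0)
        copy = static_cast<char*>(std::malloc(count));   // result never checked for NULL
    std::memcpy(copy, text, count);                      // uninitialized or NULL pointer use
    copy[count] = '\0';                                  // writes one byte past the block
    return copy;
}
#endif

// Corrected variant: every path initializes, checks, and stays in bounds.
char* duplicate_prefix(const char* text, std::size_t count)
{
    if (text == nullptr)
        return nullptr;

    const std::size_t length = std::strlen(text);
    if (count > length)
        count = length;                                  // never read past the source

    char* copy = static_cast<char*>(std::malloc(count + 1));  // room for the terminator
    if (copy == nullptr)
        return nullptr;

    std::memcpy(copy, text, count);
    copy[count] = '\0';
    return copy;                                         // caller owns the block and must free() it
}
```

Each problem in the defective variant sits on a single line, which is why a compile-time diagnostic that points at that line is so much cheaper than chasing the same bug in a debugger.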
Nobody is saying that automated tools can find all the defects, nor that they don’t generate noise. There is always a tradeoff to consider between diagnostic accuracy, completeness and speed. But overall, experience shows that these tools can find plenty of serious bugs at the expense of a reasonable amount of noise. Static code analysis tools evaluate the possible paths from the beginning of the program to every exit point. In practice, depending on the tradeoff required between speed and thoroughness, the number of paths processed at one time can be capped at a configurable value. Also, when there is enough information in the code, some paths may be deduced to be unreachable and skipped. This is where the noise comes from: infeasible paths that are not skipped because the code does not give the tool enough information to rule them out. Likewise, the cap on the number of paths may cause some defects to go unreported. However, these scenarios are quite rare and don’t overshadow the benefits of using static analysis tools.
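The difference between a path that can be ruled out and one that cannot is easier to see in code. The sketch below is only an illustration: the function names are made up, and whether a particular analyzer version actually reports the second function depends on its heuristics. It assumes an analyzer that looks at one function at a time and sees only the signature of what it calls.

```cpp
#include <cstddef>

// Hypothetical predicate. An analyzer that works one function at a time sees
// only this signature at the call sites below, so it cannot know that two
// calls with the same argument return the same answer.
bool wants_terminator(unsigned flags) { return (flags & 1u) != 0; }

// Enough information: `terminate` cannot change between the two tests, so the
// path "first test false, second test true" can be ruled out and skipped.
void finish_with_local(char* buffer, std::size_t used, unsigned flags)
{
    const bool terminate = wants_terminator(flags);
    char* end = nullptr;
    if (terminate)
        end = buffer + used;
    if (terminate)
        *end = '\0';
}

// Not enough information: the two calls look independent, so the infeasible
// path "first call false, second call true" is still explored and may be
// reported as a possible NULL dereference. That report would be noise.
void finish_with_calls(char* buffer, std::size_t used, unsigned flags)
{
    char* end = nullptr;
    if (wants_terminator(flags))
        end = buffer + used;
    if (wants_terminator(flags))
        *end = '\0';
}
```

Both functions behave identically at run time; caching the result in a local simply hands the analyzer the fact it was missing.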
Prefast is meant to be run by both devs and QA before checking into the depot, as a quality requirement. It’s designed to execute quickly, and this performance requirement comes at a price: less thorough code scanning and some limitations. Otherwise, the benefits would not make up for the pain and slowness of running it. It doesn’t scale well to large code bases, has no information about global state, doesn’t understand some constructs, and cannot look recursively into called code. While analyzing a function, the only thing prefast can do with the functions it calls is check the annotations on their signatures; it does not analyze their implementation at that point (the implementation is not skipped altogether, though: every function is eventually analyzed as a unit in its own right). Also, it doesn’t detect defects in non-instantiated templates.
Here are a few examples of the non-recursiveness limitation. If you are about to call a function that requires a parameter to be non-null, prefast still emits a warning when the null check is performed inside a separate helper function. If you express the check explicitly in the current function body, or with a macro, then prefast understands that the requirement is met. In other cases, you may get a NULL pointer dereference warning even though the NULL case is checked a few lines above, because the check calls a no-return function (one that throws, say, or an invalid parameter handler) instead of returning explicitly. Fortunately, there is a workaround for this scenario: annotate the no-return function with __declspec(noreturn), a construct that prefast understands well.
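The no-return workaround can be sketched as follows. The handler and the NO_RETURN macro are hypothetical names chosen for this example; on Visual C++ the macro expands to the __declspec(noreturn) spelling mentioned above, and on other compilers it falls back to the standard C++11 [[noreturn]] attribute so the sample builds anywhere.

```cpp
#include <cstddef>
#include <stdexcept>

#if defined(_MSC_VER)
#  define NO_RETURN __declspec(noreturn)   // the spelling named in the text
#else
#  define NO_RETURN [[noreturn]]           // standard attribute for other compilers
#endif

// Hypothetical invalid-parameter handler: it always throws and never returns.
NO_RETURN void report_invalid_parameter(const char* what)
{
    throw std::invalid_argument(what);
}

std::size_t count_spaces(const char* text)
{
    if (text == nullptr)
        report_invalid_parameter("text must not be null");   // no explicit return here

    // Because the handler is marked no-return, an analyzer can conclude that
    // `text` is non-null on every path that reaches this loop.
    std::size_t spaces = 0;
    for (; *text != '\0'; ++text)
        if (*text == ' ')
            ++spaces;
    return spaces;
}
```

Without the marker, an analyzer that only reads the handler's signature has to assume the call returns, and the dereference in the loop then looks reachable with a null pointer.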
Prefast works very well with macros because analysis happens after the preprocessing step of compilation. No check is lost, unless the wrong macros are chosen to perform variable checks. For example, if a required check is performed with an assert, the prefast warning is not emitted for a debug build, but you’ll still see it in a retail build (unless the code involved is also hidden from retail). Hence an ensure macro is more suitable if the current function may throw (otherwise use a non-throwing macro that is still seen by retail builds). Also avoid using assume/restrict (or macros using these keywords) when trying to fix a prefast warning: they drive compiler optimizations, and the only effect may be to silence prefast falsely, allowing defects to remain undetected.
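Here is a minimal sketch of the difference between the two kinds of macro. ENSURE_THROW is a hypothetical macro written for this example, not one taken from the CRT, ATL or MFC headers; it only stands in for the idea of an "ensure" check that survives in retail builds.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical "ensure" macro: unlike assert, it is not compiled out when
// NDEBUG is defined, so the check exists in debug and retail builds alike.
#define ENSURE_THROW(condition)                                   \
    do {                                                          \
        if (!(condition))                                         \
            throw std::logic_error("ensure failed: " #condition); \
    } while (0)

// Checked in debug builds only: with NDEBUG defined the assert expands to
// nothing, so the retail build dereferences `text` with no check at all.
std::size_t length_checked_in_debug_only(const char* text)
{
    assert(text != nullptr);
    std::size_t n = 0;
    while (text[n] != '\0')
        ++n;
    return n;
}

// Checked in every build: the test and the throw are always present, so the
// code after the macro is only reached with a non-null pointer.
std::size_t length_checked_always(const char* text)
{
    ENSURE_THROW(text != nullptr);
    std::size_t n = 0;
    while (text[n] != '\0')
        ++n;
    return n;
}
```

Since the analysis runs on preprocessed code, it sees exactly what each build configuration sees, which is why the choice of macro matters.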
SAL annotations (Standard Annotation Language) can significantly increase the power of static analysis tools and the accuracy of their diagnostics. Requirements and restrictions on parameters, function signatures and execution flow, which are otherwise hard to deduce from the source code itself, can be expressed in a formal way that prefast understands. The effect is not only to reduce the noise, but also to intensify the analysis according to the specified behavior. For example, to correct a possible dereference of a NULL pointer inside a function, you either check for null first and exit in case of failure or, if the function is designed never to receive NULL for that parameter, add a simple annotation to the function signature. The annotation silences the warning inside the function, but the analysis then starts checking the same condition at every call site, something that was not checked before the annotation was set. Prefast warnings may be caused not only by code defects, but also by incorrect use of annotations. Annotations must always reflect the specification of the function, and must not be added merely to silence prefast warnings.
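The NULL pointer example might look like the sketch below. The two functions are hypothetical. The annotations use the current sal.h spellings (_In_z_ and _In_opt_z_); headers from older Visual C++ releases spell them in the __in style instead. The empty fallback definitions are an assumption added only so the sample also builds with compilers that have no sal.h.

```cpp
#include <cstddef>

#if defined(_MSC_VER)
#  include <sal.h>          // real SAL definitions on Visual C++
#endif
#ifndef _In_z_
#  define _In_z_            // assumption: empty fallback for other compilers
#endif
#ifndef _In_opt_z_
#  define _In_opt_z_        // assumption: empty fallback for other compilers
#endif

// Contract: `text` is a null-terminated string and is never NULL. No check is
// needed in the body; passing a possibly-null pointer is the caller's defect.
std::size_t name_length(_In_z_ const char* text)
{
    std::size_t n = 0;
    while (text[n] != '\0')
        ++n;
    return n;
}

// Contract: `text` may be NULL, so this function has to handle that case itself.
std::size_t name_length_or_zero(_In_opt_z_ const char* text)
{
    if (text == nullptr)
        return 0;
    return name_length(text);   // non-null here, so the callee's contract is met
}
```

The annotation on name_length states the specification of the function; it moves the obligation to the code that calls it, which is where the analysis then looks for violations.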
There are many categories of annotations: applicable to function parameters, buffers, code behavior and execution flow. They can describe very complex requirements, but I won’t focus on enumerating them in this post. See a comprehensive list here. Our CRT/ATL/MFC headers are full of them…check them out. After understanding what SAL keywords describe, reading annotated code is quite easy.
SAL annotations can be applied to native code only. They help catch defects mostly in C code, but C++ benefits greatly too, even though its object-oriented nature offers higher levels of security and fewer holes. Even high-level abstractions like the STL have some annotations here and there.
Annotating large code bases may be painful for the developer, I admit, but the benefit is great. Consider that for the effort of reviewing and annotating a function signature once (definition plus declaration), all calls to the function benefit automatically. Legacy code is still an important factor nowadays. Not many software companies can afford to rewrite all their products in safer high-level languages, although nothing prevents them from doing so for new features/products (since interop and compatibility have always been a VC++ focus). Hence, annotating old C/C++ sources (in combination with running static analysis tools like prefast) to increase security and remove defects is the best approach to take.
I promise more in-depth information in a future vcblog entry. I will compile a useful list of specific prefast warning scenarios (from those I have recently encountered while cleaning up our libraries with prefast’s help), developer actions for fixing a defect or silencing a particular warning, SAL coding guidelines, prefast limitations and workarounds, and advice on integrating annotated with non-annotated code bases.
SDE Visual C++ Libraries