Implementation Ideas for the CAT.NET 2.0 Tainted Variable Analysis Algorithm

Article
07/08/2009

Andreas Fuchsberger here…..

Within the Information Security Tools Group we are now really getting into a redesign of our popular Code Analysis Tool for .NET (CAT.NET). One of the biggest challenges we have is to redesign the engine so that it no longer suffers from an out of memory condition when analyzing large binaries. To this end we have been looking at new ways of implementing the current tainted variable analysis algorithm, which is based on directed graphs as well as extending the implementation to possibly cover static analysis algorithms.

As is the case in CAT.NET data sets in large applications are often too massive to fit completely inside the computer’s internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. This is why we have started by surveying algorithms that designed for using External Memory (EM) that are applicable to us. On such algorithm was published by Buchsbaum, Goldwasser et al. In 2000, in a paper called “On External Memory Graph Traversal”. The paper describes a new external memory data structure, the buffered repository tree, and uses it to provide a non-trivial external memory algorithm for directed breadth-first search and an improved external algorithm for directed depth-first search. The directed depth-first search are applicable to CAT.NET.

The algorithm is based on a new data structure called the buffered repository tree (BRT) , which improves on an auxiliary data structure used by Kumar and Schwabe in “Improved Algorithms and Data Structures for Solving Graph Problems in External Memory”, called a Tournament Tree. A Tournament Tree is similar to a heap, but holds additional information, and typically looks like a draw when matching opponents and the winner of that round in game played 2 players or parties. Kumar and Schwade extend the concept of Tournament Tree to include I/O relevant information, such as the cost of accessing external memory.

A Buffered Repository Tree (BRT) extends the concept of a Tournament Tree in the following way. A BRT store a multi-set of items from an ordered universe. Each item is given a key. The operation of the BRT is described by the authors as follows: “A BRT is implemented as (2, 4)-tree, each leaf holds B items. Each internal node has the standard search keys and a buffer holding B items. ... The basic idea is to simulate the internal directed depth first search algorithm: examine the top vertex on the stack, examine the next unexplored arc out of that vertex, and if the arc points to an undiscovered vertex, push the new vertex on the stack and iterate. The key difficulty is determining whether the next unexplored arc points to an undiscovered vertex, without doing I/Os per arc. To do this, when a vertex v is first discovered, incoming arcs are given a premark, putting them keyed by x in a BRT. A priority queue, initialized to contain all outgoing arcs, keyed by their rank in an adjacency list , i.e. the order that the DFS scan the adjacency list.” The paper then goes on to explain how to perform DFS and BFS showing that their runtime performance. The interested reader is referred to the original paper.

Initial discussions about implementing BRTs for the new engine of CAT.NET have been encouraging.

More on CAT.NET 2.0 as it evolves ……

Implementation Ideas for the CAT.NET 2.0 Tainted Variable Analysis Algorithm

Additional resources