FORCED EXECUTION FOR SECURITY ANALYSIS OF SOFTWARE WITHOUT SOURCE CODE
Binary code analysis is widely used in many applications, including reverse engineering, software forensics and security. It is very critical in these applications, since the analysis of binary code does not require source code to be available. For example, in one of the security applications, given a potentially malicious executable file, binary analysis can help building human inspectable representations such as control flow graph and call graph.
Existing binary analysis can be roughly classified into two categories, that are static analysis, and dynamic analysis. Both types of analysis have their own strengths and limitations. Static binary analysis is based on the result of scanning the binary code without executing it. It usually has good code coverage, but the analysis results are sometimes not quite accurate due to the lack of dynamic execution information. Dynamic binary analysis, on the other hand, is based on executing the binary on a set of inputs. On the contrast, the results are usually accurate but heavily rely on the coverage of the test inputs, which sometimes do not exist.
In this thesis, we first present a novel systematic binary analysis framework called X-Force. Basically, X-Force can force the binary to execute without using any inputs or proper environment setup. As part of the design of our framework, we have proposed a number of techniques, that includes (1) path exploration module which can drive the program to execute different paths; (2) a crash-free execution model that could detect and recover from execution exceptions properly; (3) overcoming a large number of technical challenges in making the technique work on real world binaries.
Although X-Force is a highly effective method to penetrate malware self-protection and expose hidden behavior, it is very heavy-weight. The reason is that it requires tracing individual instructions, reasoning about pointer alias relations on-the-fly, and repairing invalid pointers by on-demand memory allocation. To further solve this problem, we develop a light-weight and practical forced execution technique. Without losing analysis precision, it avoids tracking individual instructions and on-demand allocation. Under our scheme, a forced execution is very similar to a native one. It features a novel memory pre-planning phase that pre-allocates a large memory buffer, and then initializes the buffer, and variables in the subject binary, with carefully crafted values in a random fashion before the real execution. The pre-planning is designed in such a way that dereferencing an invalid pointer has a very large chance to fall into the pre-allocated region and hence does not cause any exception, and semantically unrelated invalid pointer dereferences highly likely access disjoint (pre-allocated) memory regions, avoiding state corruptions with probabilistic guarantees.