Testing Intermediate Representations for Binary Analysis
Binary lifting, which is to translate a binary executable to a high-level intermediate representation, is a primary step in binary analysis. Despite its importance, there are only few existing approaches to testing the correctness of binary lifters.
The MeanDiff project is the first systematic approach to testing binary lifters.
The paper, presented at ASE 2017, can be found here, while the source is open source.
MeanDiff is a tool implementing N-version IR testing technique. Currently, it supports 3 lifters for testing their semantics. If you give an instruction set architecture (ISA) to MeanDiff, it can generate a finite subset of all instructions which covers as large a portion possible of the ISA's syntactic structure. Whenever it generates a set of instructions, it uses them to lift such instructions to the target IRs, whose semantics are implemented by each lifter. Then, MeanDiff translates the translated IRs into UIR (Unified IR) to compare their semantics. Once the IR is translated, MeanDiff calculates the symbolic summary containing the meaning of the instruction, which are then compared. Any semantic discrepancy represents a bug in at least one of the lifters.
We have evaluated the proposed system on 3 state-of-the-art binary lifters, and found 24 previously unknown semantic bugs. Our result demonstrates that writing a precise binary lifter is extremely difficult even for those heavily tested projects.
Below is the list of bugs found by MeanDiff for each of the tested binary lifters. Each bug is linked to an issue, found on the main Github repository, where the bug is described in detail and discussed with the respective tool's developers.
Version 6.7.4.12
Tag | Description |
---|---|
PyVEX1 | Missing arithmetic operation |
PyVEX2 | Useless memory |
PyVEX3 | Invalid stack operation |
PyVEX4 | Not storing segment register [x64] |
Version 1.2.0
Tag | Description |
---|---|
BAP1 | Confusion of source and destination operands |
BAP2 | Invalid OF calculation |
BAP3 | Destination address changed |
BAP4 | Not taking mod size |
BAP5 | Missing signed extension |
BAP6 | Invalid CF calculation |
BAP7 | No memory alignment |
Version 20170301 0.1
Tag | Description |
---|---|
BINSEC1 | Invalid CF calculation |
BINSEC2 | Invalid OF calculation |
BINSEC3 | Invalid stack operation |
BINSEC4 | Invalid padding |
BINSEC5 | Missing signed extension |
BINSEC6 | Destination address changed |
BINSEC7 | No memory alignment |
BINSEC8 | Invalid store/load AF behaviour |
BINSEC9 | Invalid order of calculation |
BINSEC10 | Not taking part of value |
BINSEC11 | Invalid AF calculation |
BINSEC12 | Confusion of source and destination operands |
BINSEC13 | Unable to recognize prefix |
BINSEC14 | Invalid pushad behaviour |
The MeanDiff research project has been done by SoftSec Lab, under Gratuate School of Information Security, KAIST, Republic of Korea.
The work was supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT).
The following researchers are involved in the project: