Binary lifting, which is to translate a binary executable to a high-level intermediate representation, is a primary step in binary analysis. Despite its importance, there are only few existing approaches to testing the correctness of binary lifters.

The MeanDiff project is the first systematic approach to testing binary lifters.

The paper, presented at ASE 2017, can be found here, while the source is open source.

The System

The MeanDiff architecture

MeanDiff is a tool implementing N-version IR testing technique. Currently, it supports 3 lifters for testing their semantics. If you give an instruction set architecture (ISA) to MeanDiff, it can generate a finite subset of all instructions which covers as large a portion possible of the ISA's syntactic structure. Whenever it generates a set of instructions, it uses them to lift such instructions to the target IRs, whose semantics are implemented by each lifter. Then, MeanDiff translates the translated IRs into UIR (Unified IR) to compare their semantics. Once the IR is translated, MeanDiff calculates the symbolic summary containing the meaning of the instruction, which are then compared. Any semantic discrepancy represents a bug in at least one of the lifters.

Bugs Found

We have evaluated the proposed system on 3 state-of-the-art binary lifters, and found 24 previously unknown semantic bugs. Our result demonstrates that writing a precise binary lifter is extremely difficult even for those heavily tested projects.

Below is the list of bugs found by MeanDiff for each of the tested binary lifters. Each bug is linked to an issue, found on the main Github repository, where the bug is described in detail and discussed with the respective tool's developers.

PyVEX

Version 6.7.4.12

Tag Description
PyVEX1 Missing arithmetic operation
PyVEX2 Useless memory
PyVEX3 Invalid stack operation
PyVEX4 Not storing segment register [x64]

BAP

Version 1.2.0

Tag Description
BAP1 Confusion of source and destination operands
BAP2 Invalid OF calculation
BAP3 Destination address changed
BAP4 Not taking mod size
BAP5 Missing signed extension
BAP6 Invalid CF calculation
BAP7 No memory alignment

BINSEC

Version 20170301 0.1

Tag Description
BINSEC1 Invalid CF calculation
BINSEC2 Invalid OF calculation
BINSEC3 Invalid stack operation
BINSEC4 Invalid padding
BINSEC5 Missing signed extension
BINSEC6 Destination address changed
BINSEC7 No memory alignment
BINSEC8 Invalid store/load AF behaviour
BINSEC9 Invalid order of calculation
BINSEC10 Not taking part of value
BINSEC11 Invalid AF calculation
BINSEC12 Confusion of source and destination operands
BINSEC13 Unable to recognize prefix
BINSEC14 Invalid pushad behaviour

About Us

The MeanDiff research project has been done by SoftSec Lab, under Gratuate School of Information Security, KAIST, Republic of Korea.

The work was supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT).

The following researchers are involved in the project:

SoftSec Lab