MeanDiff

Binary lifting, which is to translate a binary executable to a high-level intermediate representation, is a primary step in binary analysis. Despite its importance, there are only few existing approaches to testing the correctness of binary lifters.

The MeanDiff project is the first systematic approach to testing binary lifters.

The paper, presented at ASE 2017, can be found here, while the source is open source.

The System

MeanDiff is a tool implementing N-version IR testing technique. Currently, it supports 3 lifters for testing their semantics. If you give an instruction set architecture (ISA) to MeanDiff, it can generate a finite subset of all instructions which covers as large a portion possible of the ISA's syntactic structure. Whenever it generates a set of instructions, it uses them to lift such instructions to the target IRs, whose semantics are implemented by each lifter. Then, MeanDiff translates the translated IRs into UIR (Unified IR) to compare their semantics. Once the IR is translated, MeanDiff calculates the symbolic summary containing the meaning of the instruction, which are then compared. Any semantic discrepancy represents a bug in at least one of the lifters.

Bugs Found

We have evaluated the proposed system on 3 state-of-the-art binary lifters, and found 24 previously unknown semantic bugs. Our result demonstrates that writing a precise binary lifter is extremely difficult even for those heavily tested projects.

Below is the list of bugs found by MeanDiff for each of the tested binary lifters. Each bug is linked to an issue, found on the main Github repository, where the bug is described in detail and discussed with the respective tool's developers.

PyVEX

Version 6.7.4.12

Tag	Description
PyVEX1	Missing arithmetic operation
PyVEX2	Useless memory
PyVEX3	Invalid stack operation
PyVEX4	Not storing segment register [x64]

BAP

Version 1.2.0

Tag	Description
BAP1	Confusion of source and destination operands
BAP2	Invalid OF calculation
BAP3	Destination address changed
BAP4	Not taking mod size
BAP5	Missing signed extension
BAP6	Invalid CF calculation
BAP7	No memory alignment

BINSEC

Version 20170301 0.1

Tag	Description
BINSEC1	Invalid CF calculation
BINSEC2	Invalid OF calculation
BINSEC3	Invalid stack operation
BINSEC4	Invalid padding
BINSEC5	Missing signed extension
BINSEC6	Destination address changed
BINSEC7	No memory alignment
BINSEC8	Invalid store/load AF behaviour
BINSEC9	Invalid order of calculation
BINSEC10	Not taking part of value
BINSEC11	Invalid AF calculation
BINSEC12	Confusion of source and destination operands
BINSEC13	Unable to recognize prefix
BINSEC14	Invalid pushad behaviour

About Us

The MeanDiff research project has been done by SoftSec Lab, under Gratuate School of Information Security, KAIST, Republic of Korea.

The work was supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT).

The following researchers are involved in the project:

Soomin Kim <soomink (at) kaist.ac.kr>
Graduate student, KAIST
Markus Faerevaag <mfaerevaag (at) kaist.ac.kr>
Graduate student, KAIST
Minkyu Jung <hestati (at) kaist.ac.kr>
Undergraduate student, KAIST
SeungIl Jung <sijung (at) kaist.ac.kr>
Developer, SoftSec Lab
DongYeop Oh <oh51dy (at) kaist.ac.kr>
Developer, SoftSec Lab
JongHyup Lee <jonghyup (at) gachon.ac.kr>
Assistant professor, Gachon University
Sang Kil Cha <sangkilc (at) kaist.ac.kr>
Assistant professor, KAIST