REMaQE: Reverse Engineering Math Equations from Executables

Meet Udeshi, Prashanth Krishnamurthy, Hammond Pearce, Ramesh Karri, Farshad Khorrami

TL;DR

The REMaQE automated framework for reverse engineering of math equations from binary executables handles equation parameters accessed via registers, the stack, global memory, or pointers, and can reverse engineer equations from object-oriented implementations such as C++ classes.

Abstract

Cybersecurity attacks on embedded devices for industrial control systems and cyber-physical systems may cause catastrophic physical damage as well as economic loss. This could be achieved by infecting device binaries with malware that modifies the physical characteristics of the system operation. Mitigating such attacks benefits from reverse engineering tools that recover sufficient semantic knowledge in terms of mathematical equations of the implemented algorithm. Conventional reverse engineering tools can decompile binaries to low-level code, but offer little semantic insight. This paper proposes the REMaQE automated framework for reverse engineering of math equations from binary executables. Improving over the state of the art, REMaQE handles equation parameters accessed via registers, the stack, global memory, or pointers, and can reverse engineer object-oriented implementations such as C++ classes. Using REMaQE, we discovered a bug in the Linux kernel thermal monitoring tool "tmon". To evaluate REMaQE, we generate a dataset of 25,096 binaries with math equations implemented in C and Simulink. REMaQE successfully recovers a semantically matching equation for all 25,096 binaries. REMaQE executes in 0.48 seconds on average and in up to 2 seconds for complex equations. Real-time execution enables integration in an interactive math-oriented reverse engineering workflow.

Paper Structure (22 sections, 18 equations, 8 figures, 7 tables, 2 algorithms)

Figures (8)

  • Figure 1: Reverse engineering the Linux "tmon" thermal controller with different tools: (a) C source code, (b) decompilation with Ghidra, (c) symbolic execution with Angr, (d) math equations recovered with REMaQE.
  • Figure 2: The REMaQE framework. Average execution time of the pipeline is 0.48 seconds, from when the user provides which function to reverse engineer, to when REMaQE returns the math equations.
  • Figure 3: REMaQE reverse engineering pipeline. The controller_handler function from Fig. 1 is used as an example, and intermediate outputs are shown for each stage through the pipeline: (a) Parameter analysis automatically identifies the input, output and constant parameters of the function along with their kind and storage location, (b) Symbolic execution runs the function with properly initialized symbolic inputs and gathers the output symbolic ETs, and (c) Algebraic simplification converts the output ET to a human-friendly math equation.
  • Figure 4: Histogram of number of binaries vs. max error during evaluation match. The tolerance $\epsilon = 10^{-5}$ is marked.
  • Figure 5: Average ratio of equation complexity for each parameter binned with the number of nodes. The box indicates $\mu \pm \sigma$ ($\mu$ is mean, $\sigma$ is standard deviation), while the line and whiskers indicate range.
  • ...and 3 more figures
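
The Figure 4 caption refers to a maximum error compared against a tolerance of $\epsilon = 10^{-5}$ when matching a recovered equation to its reference. The sketch below illustrates one way such a check could work; it is not the paper's evaluation code. The use of absolute error, the sampling range, the sample count, and all function names are assumptions made for illustration.

```python
import random

EPSILON = 1e-5  # tolerance taken from the Figure 4 caption


def max_error(recovered, reference, n_params, samples=1000, seed=0):
    """Largest absolute difference between two equations over random inputs.

    Assumption: inputs are drawn uniformly from [-10, 10]; the paper's
    actual sampling scheme is not given in this excerpt.
    """
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        args = [rng.uniform(-10.0, 10.0) for _ in range(n_params)]
        worst = max(worst, abs(recovered(*args) - reference(*args)))
    return worst


def semantically_matches(recovered, reference, n_params, eps=EPSILON):
    """Treat two equations as matching if their max error stays within eps."""
    return max_error(recovered, reference, n_params) <= eps


# Hypothetical equations: the same weighted sum written two equivalent ways,
# plus a variant with a flipped sign that should fail the check.
def reference_eq(e, i, d):
    return 2.0 * e + 0.5 * i + 0.1 * d


def recovered_eq(e, i, d):
    return 0.1 * (20.0 * e + 5.0 * i + d)


def wrong_eq(e, i, d):
    return 2.0 * e + 0.5 * i - 0.1 * d


if __name__ == "__main__":
    print(semantically_matches(recovered_eq, reference_eq, 3))
    print(semantically_matches(wrong_eq, reference_eq, 3))
```

A numerical comparison like this accepts equations that differ only in algebraic form or floating-point rounding, which is why a small nonzero tolerance is used instead of exact equality.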