Simplifying MBA Expression Using E-Graphs

Seoksu Lee, Hyeongchang Jeon, Eun-Sun Cho

TL;DR

MBA obfuscation blends Boolean and arithmetic forms and is a major challenge in malware reverse engineering. The authors propose an e-graph based approach, implemented with the Rust egg library, to represent multiple semantically equivalent MBA expressions in e-classes and apply rewrite rules to obtain simpler forms. They formalize MBA expressions as linear $\sum_{i \in I} a_i e_i$ and polynomial $\sum_{i \in I} a_i (\prod_{j \in J} e_{i,j})$, and build a rule-based rewriting pipeline with preprocessing to distinguish subtraction from negation. Experiments show sub-second simplification times and competitive success rates across multiple datasets, with faster performance than prior tools like SSPAM and GAMBA, indicating the approach is scalable and practical for deobfuscation workflows.
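
To make the linear form $\sum_{i \in I} a_i e_i$ concrete, the following minimal sketch evaluates two linear MBA expressions and checks exhaustively that they agree. It is an illustration, not the authors' code: the identity $x + y = (x \oplus y) + 2(x \wedge y)$ is a textbook MBA rewrite chosen here as an example, the 8-bit word size is an assumption that keeps the check small, and the names `linear_mba` and `equivalent` are hypothetical.

```python
from itertools import product

MASK = 0xFF  # assumption: 8-bit words, so all 65536 input pairs can be checked


def linear_mba(terms, x, y):
    """Evaluate sum(a_i * e_i(x, y)) modulo 2**8 for a list of (a_i, e_i) pairs."""
    return sum(a * e(x, y) for a, e in terms) & MASK


# Obfuscated form of x + y: 1*(x ^ y) + 2*(x & y)
obfuscated = [
    (1, lambda x, y: x ^ y),
    (2, lambda x, y: x & y),
]

# Simplified target: 1*x + 1*y
simplified = [
    (1, lambda x, y: x),
    (1, lambda x, y: y),
]


def equivalent(lhs, rhs):
    """Return True if both linear MBA expressions agree on every 8-bit input pair."""
    return all(
        linear_mba(lhs, x, y) == linear_mba(rhs, x, y)
        for x, y in product(range(MASK + 1), repeat=2)
    )
```

Calling `equivalent(obfuscated, simplified)` returns `True`. A simplifier aims to recover the second form from the first; the exhaustive check only confirms that such a rewrite preserves semantics at this word size.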

Abstract

Code obfuscation adds meaningless code or complicates existing code to make a program difficult to reverse engineer. In recent years, Mixed Boolean Arithmetic (MBA) obfuscation has been applied to viruses and other malware to impede expert analysis. Among obfuscation techniques, MBA obfuscation is considered the most challenging to undo with existing deobfuscation techniques. In this paper, we attempt to simplify MBA expressions. We use an e-graph data structure to efficiently hold multiple expressions with the same semantics, systematically rewrite terms, and find simpler expressions. Preliminary experimental results show that our e-graph based MBA deobfuscation approach runs faster than other approaches while achieving reasonable performance.

Paper Structure (8 sections, 4 equations, 1 figure, 1 table)

This paper contains 8 sections, 4 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: Left: the e-graph in its initial state, representing the expression "(a * 2) / (a << 1)". Right: the e-graph after the "a << 1" rewriting rule is applied. Dotted outlines mark e-classes, whose members share the same semantics, and solid circles mark nodes. This makes it easy to find the shared e-class of parts that the rewriting rule shows to have the same semantics.
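
The e-class idea in the caption can be sketched with a toy e-graph. This is a simplified illustration, not the egg API and not the authors' implementation: the `EGraph` class, its methods, and `apply_mul2_to_shift` are hypothetical names, and the rule `x * 2 -> x << 1` is assumed here as the rewrite that relates the two operands of the caption's expression.

```python
class EGraph:
    """Toy e-graph: a union-find over e-class ids plus a hashcons of e-nodes."""

    def __init__(self):
        self.parent = []    # union-find parent pointers, indexed by e-class id
        self.hashcons = {}  # e-node (op, child ids...) -> e-class id

    def find(self, cid):
        while self.parent[cid] != cid:
            cid = self.parent[cid]
        return cid

    def add(self, op, *children):
        node = (op, *(self.find(c) for c in children))
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        cid = len(self.parent)
        self.parent.append(cid)
        self.hashcons[node] = cid
        return cid

    def merge(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
        return a

    def rebuild(self):
        # Re-canonicalize e-nodes; merge classes whose nodes became identical.
        changed = True
        while changed:
            changed = False
            fresh = {}
            for (op, *kids), cid in self.hashcons.items():
                node = (op, *(self.find(k) for k in kids))
                cid = self.find(cid)
                if node in fresh and self.find(fresh[node]) != cid:
                    cid = self.merge(fresh[node], cid)
                    changed = True
                fresh[node] = cid
            self.hashcons = fresh

    def nodes(self, cid):
        cid = self.find(cid)
        return {n for n, c in self.hashcons.items() if self.find(c) == cid}


def apply_mul2_to_shift(g):
    """Assumed rule x * 2 -> x << 1: add the shifted form to the same e-class."""
    two, one = g.add("2"), g.add("1")
    matches = [
        (cid, kids[0])
        for (op, *kids), cid in list(g.hashcons.items())
        if op == "*" and len(kids) == 2 and g.find(kids[1]) == g.find(two)
    ]
    for cid, x in matches:
        g.merge(cid, g.add("<<", x, one))
    g.rebuild()


# Build (a * 2) / (a << 1), then apply the rule.
g = EGraph()
a = g.add("a")
mul = g.add("*", a, g.add("2"))
shl = g.add("<<", a, g.add("1"))
root = g.add("/", mul, shl)
same_before = g.find(mul) == g.find(shl)
apply_mul2_to_shift(g)
same_after = g.find(mul) == g.find(shl)
```

Before the rule runs, `a * 2` and `a << 1` sit in different e-classes (`same_before` is `False`); afterwards they share one (`same_after` is `True`), so both operands of the division point at the same e-class. A real e-graph library additionally handles pattern matching, saturation limits, and cost-based extraction, none of which this sketch attempts.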