Simplifying MBA Expression Using E-Graphs
Seoksu Lee, Hyeongchang Jeon, Eun-Sun Cho
TL;DR
MBA obfuscation blends Boolean and arithmetic forms and is a major challenge in malware reverse engineering. The authors propose an e-graph-based approach, implemented with the Rust egg library, to represent multiple semantically equivalent MBA expressions in e-classes and apply rewrite rules to obtain simpler forms. They formalize MBA expressions as linear $\sum_{i \in I} a_i e_i$ and polynomial $\sum_{i \in I} a_i (\prod_{j \in J} e_{i,j})$, and build a rule-based rewriting pipeline with preprocessing to distinguish subtraction from negation. Experiments show sub-second simplification times and competitive success rates across multiple datasets, with faster performance than prior tools like SSPAM and GAMBA, indicating the approach is scalable and practical for deobfuscation workflows.
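To make the linear form $\sum_{i \in I} a_i e_i$ concrete, here is a minimal Python sketch, not taken from the paper. It assumes 8-bit wrap-around arithmetic and uses the well-known identity $x + y = (x \oplus y) + 2(x \wedge y)$ as the example. The names `linear_mba`, `obfuscated_add`, and `equivalent_to_add` are hypothetical, chosen for illustration only.

```python
from itertools import product

BITS = 8                      # assumed word size for this illustration
MASK = (1 << BITS) - 1


def linear_mba(terms, x, y):
    """Evaluate sum_i a_i * e_i(x, y) modulo 2**BITS.

    `terms` is a list of (coefficient, bitwise expression) pairs.
    """
    return sum(a * e(x, y) for a, e in terms) & MASK


# x + y written as a linear MBA: 1*(x ^ y) + 2*(x & y)
obfuscated_add = [
    (1, lambda x, y: x ^ y),
    (2, lambda x, y: x & y),
]


def equivalent_to_add(terms):
    """Exhaustively check that `terms` computes x + y on all BITS-bit inputs."""
    return all(
        linear_mba(terms, x, y) == ((x + y) & MASK)
        for x, y in product(range(1 << BITS), repeat=2)
    )


print(equivalent_to_add(obfuscated_add))
```

Exhaustive checking only scales to small word sizes. It is shown here to clarify what "semantically equivalent" means for an MBA expression, not as a simplification method.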
Abstract
Code obfuscation adds meaningless code or complicates existing code to make a program difficult to reverse engineer. In recent years, Mixed Boolean Arithmetic (MBA) obfuscation has been applied to virus and malware code to impede expert analysis, and among the various obfuscation techniques it is considered the most challenging to undo with existing deobfuscation techniques. In this paper, we attempt to simplify MBA expressions. We use an e-graph data structure to efficiently hold multiple expressions with the same semantics, systematically rewrite terms, and find simpler expressions. Preliminary experimental results show that our e-graph-based MBA deobfuscation approach runs faster than other approaches while maintaining reasonable simplification performance.
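The idea of collecting equivalent expressions and then picking a simpler one can be sketched in a few lines of Python. This is a naive stand-in, not the egg API and not the authors' rule set: it enumerates one equivalence class explicitly instead of sharing subterms as a real e-graph does, and its two rewrite rules (commutativity and one MBA identity) are illustrative assumptions. Terms are nested tuples `(op, left, right)`; the helper names are hypothetical.

```python
def size(t):
    """Number of nodes in a term; variables and constants count as 1."""
    return 1 if not isinstance(t, tuple) else 1 + sum(size(c) for c in t[1:])


def rules(t):
    """Yield terms equal to t via one rewrite at the root (illustrative rules)."""
    if not isinstance(t, tuple):
        return
    op, a, b = t
    if op in ('+', '^', '&', '|'):
        yield (op, b, a)                          # commutativity
    # (x ^ y) + 2*(x & y)  ==>  x + y
    if op == '+' and isinstance(a, tuple) and a[0] == '^':
        if b == ('*', 2, ('&', a[1], a[2])):
            yield ('+', a[1], a[2])


def rewrites(t):
    """Yield every term reachable from t by one rewrite at any position."""
    yield from rules(t)
    if isinstance(t, tuple):
        op, a, b = t
        for a2 in rewrites(a):
            yield (op, a2, b)
        for b2 in rewrites(b):
            yield (op, a, b2)


def simplify(t, limit=10_000):
    """Grow the set of terms equivalent to t, then return the smallest one."""
    seen, frontier = {t}, [t]
    while frontier and len(seen) < limit:
        next_frontier = []
        for u in frontier:
            for v in rewrites(u):
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    # Smallest term wins; repr() breaks ties deterministically.
    return min(seen, key=lambda u: (size(u), repr(u)))


obfuscated = ('+', ('*', 2, ('&', 'y', 'x')), ('^', 'x', 'y'))
print(simplify(obfuscated))
```

An e-graph avoids the blow-up of this explicit enumeration by storing each distinct subterm once and grouping equal subterms into e-classes, so that a rewrite applied to a shared subterm benefits every expression that contains it.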
