OBsmith: Testing JavaScript Obfuscator using LLM-powered sketching

Shan Jiang, Chenguang Zhu, Sarfraz Khurshid

TL;DR

OBsmith addresses the lack of semantic-preservation testing for JavaScript obfuscators with an LLM-powered, sketch-based framework. It combines LLM-generated sketches and sketches extracted from real programs with a program generator and instrumentation, then applies differential testing against ground truth and metamorphic testing. Because the original, unobfuscated program serves as the oracle and multiple obfuscators and configurations can be exercised, OBsmith uncovers correctness bugs that general-purpose fuzzers miss, including silent miscompilations and failure-inducing transformations. OBsmith exposed 11 previously unknown bugs across two popular obfuscators. An ablation study indicates that every component except the generic metamorphic relations contributes to at least one bug class, which suggests that metamorphic testing needs obfuscator-specific relations. The work points to a practical path toward automated testing and quality assurance for obfuscators and similar semantic-preserving pipelines, and it opens a discussion on how to balance obfuscation presets against performance overhead.

Abstract

JavaScript obfuscators are widely deployed to protect intellectual property and resist reverse engineering, yet their correctness has been largely overlooked compared to performance and resilience. Existing evaluations typically measure resistance to deobfuscation, leaving unanswered the critical question of whether obfuscators preserve program semantics. Incorrect transformations can silently alter functionality, compromise reliability, and erode security, undermining the very purpose of obfuscation. To address this gap, we present OBsmith, a novel framework to systematically test JavaScript obfuscators using large language models (LLMs). OBsmith leverages LLMs to generate program sketches (abstract templates capturing diverse language constructs, idioms, and corner cases), which are instantiated into executable programs and subjected to obfuscation under different configurations. Besides LLM-powered sketching, OBsmith also employs a second source: automatic extraction of sketches from real programs. This extraction path enables more focused testing of project-specific features and lets developers inject domain knowledge into the resulting test cases. OBsmith uncovers 11 previously unknown correctness bugs. Under an equal program budget, five general-purpose state-of-the-art JavaScript fuzzers (FuzzJIT, Jsfunfuzz, Superion, DIE, Fuzzilli) failed to detect these issues, highlighting OBsmith's complementary focus on obfuscation-induced misbehavior. An ablation shows that all components except our generic metamorphic relations (MRs) contribute to at least one bug class; the negative MR result suggests the need for obfuscator-specific metamorphic relations. Our results also seed discussion on how to balance obfuscation presets and performance cost. We envision OBsmith as an important step towards automated testing and quality assurance of obfuscators and other semantic-preserving toolchains.
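
The pipeline the abstract describes (sketch, instantiation, obfuscation, comparison against the original program) can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not OBsmith's implementation: the `<<KIND>>` hole syntax, the `fillers` table, and the helpers `instantiate`, `run`, and `differentialTest` are all hypothetical, and the two "obfuscators" are stand-in string transforms rather than real tools.

```javascript
// Hypothetical sketch with holes written as <<KIND>> (assumed syntax, not OBsmith's format).
const sketch = `
var x = <<INT>>;
var s = <<STR>>;
function f(a) { return a + x; }
console.log(f(<<INT>>), s.length);
`;

// Illustrative candidate values for each hole kind.
const fillers = {
  INT: ["0", "-1", "42"],
  STR: ['""', '"abc"'],
};

// Instantiate a sketch: each hole takes the next filler, starting from a seed index.
function instantiate(template, seed) {
  let i = seed;
  return template.replace(/<<(\w+)>>/g, (_, kind) => {
    const options = fillers[kind];
    return options[i++ % options.length];
  });
}

// Run a program with a captured console; record its printed output or the error name.
function run(source) {
  const lines = [];
  const fakeConsole = { log: (...args) => lines.push(args.join(" ")) };
  try {
    new Function("console", source)(fakeConsole);
    return { ok: true, output: lines.join("\n") };
  } catch (e) {
    return { ok: false, output: String(e && e.name) };
  }
}

// Differential test: the original, untransformed program is the ground-truth oracle.
function differentialTest(program, transform) {
  const expected = run(program);
  const actual = run(transform(program));
  const pass = expected.ok === actual.ok && expected.output === actual.output;
  return { pass, expected, actual };
}

// Stand-in transforms (toy examples, NOT real obfuscators).
const renameOnly = (src) => src.replace(/\bx\b/g, "_0x1a"); // preserves behavior here
const buggy = (src) => src.replace(/a \+ x/, "a - x");      // deliberately changes semantics

// Try several instantiations of the same sketch against both transforms.
for (let seed = 0; seed < 3; seed++) {
  const program = instantiate(sketch, seed);
  console.log(seed, differentialTest(program, renameOnly).pass, differentialTest(program, buggy).pass);
}
```

In this toy setup the seed-0 instantiation sets `x` to `0`, so the deliberately wrong transform produces the same output as the original and the bug is masked; the other instantiations expose it. This is one reason to instantiate each sketch into many concrete programs.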

Paper Structure

This paper contains 74 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: A simplified sketch with holes (left), the corresponding concrete program that is input to obfuscators (middle), and the obfuscated output produced by JS-Confuser, which exhibits faulty behavior (right).
  • Figure 2: OBsmith overall workflow
  • Figure 3: LLM-powered sketch generation with feedback loop (left) and its corresponding location in the OBsmith framework
  • Figure 4: Program generation workflow (left) and differential testing with ground truth workflow (right)
  • Figure 5: OBsmith prompt for LLM-based sketch generation