Table of Contents
Fetching ...

CASCADE: LLM-Powered JavaScript Deobfuscator at Google

Shan Jiang, Pranoy Kovuri, David Tao, Zhixun Tan

TL;DR

CASCADE is a novel hybrid approach that integrates the advanced coding capabilities of Gemini with the deterministic transformation capabilities of a compiler Intermediate Representation (IR), specifically JavaScript IR (JSIR).

Abstract

Software obfuscation, particularly prevalent in JavaScript, hinders code comprehension and analysis, posing significant challenges to software testing, static analysis, and malware detection. This paper introduces CASCADE, a novel hybrid approach that integrates the advanced coding capabilities of Gemini with the deterministic transformation capabilities of a compiler Intermediate Representation (IR), specifically JavaScript IR (JSIR). By employing Gemini to identify critical prelude functions, the foundational components underlying the most prevalent obfuscation techniques, and leveraging JSIR for subsequent code transformations, CASCADE effectively recovers semantic elements like original strings and API names, and reveals original program behaviors. This method overcomes limitations of existing static and dynamic deobfuscation techniques, eliminating hundreds to thousands of hardcoded rules while achieving reliability and flexibility. CASCADE is already deployed in Google's production environment, demonstrating substantial improvements in JavaScript deobfuscation efficiency and reducing reverse engineering efforts.

CASCADE: LLM-Powered JavaScript Deobfuscator at Google

TL;DR

CASCADE is a novel hybrid approach that integrates the advanced coding capabilities of Gemini with the deterministic transformation capabilities of a compiler Intermediate Representation (IR), specifically JavaScript IR (JSIR).

Abstract

Software obfuscation, particularly prevalent in JavaScript, hinders code comprehension and analysis, posing significant challenges to software testing, static analysis, and malware detection. This paper introduces CASCADE, a novel hybrid approach that integrates the advanced coding capabilities of Gemini with the deterministic transformation capabilities of a compiler Intermediate Representation (IR), specifically JavaScript IR (JSIR). By employing Gemini to identify critical prelude functions, the foundational components underlying the most prevalent obfuscation techniques, and leveraging JSIR for subsequent code transformations, CASCADE effectively recovers semantic elements like original strings and API names, and reveals original program behaviors. This method overcomes limitations of existing static and dynamic deobfuscation techniques, eliminating hundreds to thousands of hardcoded rules while achieving reliability and flexibility. CASCADE is already deployed in Google's production environment, demonstrating substantial improvements in JavaScript deobfuscation efficiency and reducing reverse engineering efforts.

Paper Structure

This paper contains 21 sections, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Hello World obfuscated by Obfuscator.IO with the default (the lowest level) configuration
  • Figure 2: CASCADE Deobfuscator
  • Figure 3: Simplified Illustration of String Obfuscation
  • Figure 4: Slight changes cause rule-based pattern match to fail, left means Webcrack deobfuscation is correct, right means Webcrack cannot deobfuscate
  • Figure 5: Prompt Template For Prelude Detection
  • ...and 7 more figures