Implicit Patterns in LLM-Based Binary Analysis

Qiang Li, XiangRui Zhang, Haining Wang

Abstract

Binary vulnerability analysis is increasingly performed by LLM-based agents in an iterative, multi-pass manner, with the model as the core decision-maker. However, how such systems organize exploration over hundreds of reasoning steps remains poorly understood, because context windows are limited and the relevant behaviors are implicit at the token level. We present the first large-scale, trace-level study showing that multi-pass LLM reasoning gives rise to structured, token-level implicit patterns. Analyzing 521 binaries with 99,563 reasoning steps, we identify four dominant patterns that emerge implicitly from reasoning traces: early pruning, path-dependent lock-in, targeted backtracking, and knowledge-guided prioritization. These token-level implicit patterns serve as an abstraction of LLM reasoning: instead of explicit control flow or predefined heuristics, exploration is organized through implicit decisions regulating path selection, commitment, and revision. Our analysis shows that these patterns form a stable, structured system with distinct temporal roles and measurable characteristics. Our results provide the first systematic characterization of LLM-driven binary analysis and a foundation for more reliable analysis systems.

Paper Structure

This paper contains 42 sections, 2 equations, 12 figures, 11 tables, and 4 algorithms.

Figures (12)

  • Figure 1: (Left) One-pass paradigm: static analysis constructs a global program representation and vulnerability reasoning operates over this fixed view. (Right) Iterative paradigm: reasoning interleaves with repeated static analysis operations, proceeding incrementally through multiple tool invocations.
  • Figure 2: An example of step-by-step LLM-driven binary analysis, illustrating how reasoning states evolve through repeated interactions with external analysis tools.
  • Figure 3: Dataset overview: the relationship between the binary corpus, analysis sessions, reasoning traces, and output metadata.
  • Figure 4: Pattern 1: pruning behavior in traces. Candidate paths are discarded early and rarely revisited.
  • Figure 5: Pattern 2: lock-in behavior in traces. Reasoning is sustained within the same context, with limited exploration of alternatives.
  • ...and 7 more figures