LLMs as Firmware Experts: A Runtime-Grown Tree-of-Agents Framework

Xiangrui Zhang, Zeyu Chen, Haining Wang, Qiang Li

TL;DR

This paper addresses the challenge of applying LLM-based agents to firmware security, where large, heterogeneous firmware images with implicit dependencies demand long-horizon reasoning. It introduces FirmHive, a runtime Tree of Agents built via recursive delegation and coordinated by a Persistent Knowledge Hub, enabling autonomous, cross-file vulnerability analysis without handcrafted pipelines. Empirical results on real firmware show that FirmHive produces substantially more alerts and reasons more deeply than both LLM baselines and state-of-the-art static tools, with a precision of 71% on a representative sample. The approach offers a scalable, adaptable framework for embedded firmware analysis and opens avenues for further tool integration and practical deployment.

Abstract

Large Language Models (LLMs) and their agent systems have recently demonstrated strong potential in automating code reasoning and vulnerability detection. However, when applied to large-scale firmware, their performance degrades due to the binary nature of firmware, complex dependency structures, and heterogeneous components. To address this challenge, this paper presents FIRMHIVE, a recursive agent hive that enables LLMs to act as autonomous firmware security analysts. FIRMHIVE introduces two key mechanisms: (1) transforming delegation into a per-agent, executable primitive and (2) constructing a runtime Tree of Agents (ToA) for decentralized coordination. We evaluate FIRMHIVE using real-world firmware images obtained from publicly available datasets, covering five representative security analysis tasks. Compared with existing LLM-agent baselines, FIRMHIVE performs deeper (about 16x more reasoning steps) and broader (about 2.3x more files inspected) cross-file exploration, resulting in about 5.6x more alerts per firmware. Compared to state-of-the-art (SOTA) security tools, FIRMHIVE identifies about 1.5x more vulnerabilities (1,802 total) and achieves 71% precision, representing significant improvements in both yield and fidelity.

Paper Structure

This paper contains 30 sections, 5 equations, 11 figures, 13 tables.

Figures (11)

  • Figure 1: LLM as expert firmware analyst workflow. Red circles denote the LLM's reasoning turns; orange boxes denote tool invocations and their outputs.
  • Figure 2: Delegation is viewed as a black box: the parent agent focuses on what needs to be delegated rather than how delegation is executed.
  • Figure 3: A runtime-grown ToA: the root agent recursively spawns sub-agents via delegation, forming a tree structure.
  • Figure 4: FirmHive's architecture: the purple region represents the RDE and PKH components, the yellow region represents the firmware environment, and the green region represents the runtime-grown ToA.
  • Figure 5: Recursive delegation constructs a tree: the red arrow represents the delegation, and the blue arrow indicates the return of results.
  • ...and 6 more figures
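
The captions for Figures 3 and 5 describe a tree that grows at runtime: a parent agent delegates a subtask, a sub-agent is spawned, and results flow back up. The following is a minimal sketch of that general pattern only, not the paper's implementation. Every name here (`Agent`, `KnowledgeHub`, `delegate`, `run`, the toy firmware dictionary) is hypothetical, and a simple string check stands in for what would be an LLM reasoning step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Hypothetical toy "firmware": directories are dicts, files are strings.
FsNode = Union[str, Dict[str, "FsNode"]]


@dataclass
class KnowledgeHub:
    """Assumed stand-in for a shared store that every agent can write to."""
    findings: List[str] = field(default_factory=list)

    def record(self, note: str) -> None:
        self.findings.append(note)


@dataclass
class Agent:
    task: str
    depth: int = 0
    children: List["Agent"] = field(default_factory=list)

    def delegate(self, subtask: str) -> "Agent":
        # Delegation as a per-agent primitive: any agent may spawn a child.
        child = Agent(task=subtask, depth=self.depth + 1)
        self.children.append(child)
        return child

    def run(self, node: FsNode, hub: KnowledgeHub, max_depth: int = 8) -> int:
        """Analyse `node` and return the number of alerts in its subtree."""
        if isinstance(node, str):
            # Leaf file: placeholder check where an LLM call would go.
            if "strcpy(" in node:
                hub.record(f"{self.task}: unbounded strcpy")
                return 1
            return 0
        if self.depth >= max_depth:
            return 0  # guard against unbounded recursion
        alerts = 0
        for name, child_node in node.items():
            child = self.delegate(f"{self.task}/{name}")
            alerts += child.run(child_node, hub, max_depth)  # result returns to parent
        return alerts


def tree_size(agent: Agent) -> int:
    """Count the agents in the tree rooted at `agent`."""
    return 1 + sum(tree_size(c) for c in agent.children)


firmware = {
    "bin": {"httpd.c": "strcpy(buf, q);", "init.sh": "echo ok"},
    "etc": {"passwd": "root:x:0:0"},
}
root = Agent("root")
hub = KnowledgeHub()
alerts = root.run(firmware, hub)
print(alerts, tree_size(root), hub.findings)
```

In this sketch the shape of the tree is not fixed in advance; it follows whatever the agents encounter, which is the property the "runtime-grown" wording refers to. A real system would replace the string check with model-driven decisions about whether and what to delegate.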