Hound: Relation-First Knowledge Graphs for Complex-System Reasoning in Security Audits

Bernhard Mueller

TL;DR

Hound is a relation-first knowledge-graph engine for security audits. It combines flexible, multi-scale graphs with a persistent hypothesis-belief system to support precise cross-component reasoning. Retrieval is anchored on graph-backed evidence, and a two-phase planning workflow (Coverage followed by Intuition) guides analysis across large codebases, supported by a formal data model and a reference-driven retrieval loop. On a five-project subset of ScaBench, Hound improves micro recall (31.2% vs. 8.3%) and F1 (14.2% vs. 9.8%) over a baseline LLM analyzer, at a modest cost in precision. The work also releases tooling and data for reproduction, and identifies reducing false positives and strengthening verification as future directions.

Abstract

Hound introduces a relation-first graph engine that improves system-level reasoning across interrelated components in complex codebases. The agent designs flexible, analyst-defined views with compact annotations (e.g., monetary/value flows, authentication/authorization roles, call graphs, protocol invariants) and uses them to anchor exact retrieval: for any question, it loads precisely the code that matters (often across components) so it can zoom out to system structure and zoom in to the decisive lines. A second contribution is a persistent belief system: long-lived vulnerability hypotheses whose confidence is updated as evidence accrues. The agent employs coverage-versus-intuition planning and a QA finalizer to confirm or reject hypotheses. On a five-project subset of ScaBench[1], Hound improves recall and F1 over a baseline LLM analyzer (micro recall 31.2% vs. 8.3%; F1 14.2% vs. 9.8%) with a modest precision trade-off. We attribute these gains to flexible, relation-first graphs that extend model understanding beyond call/dataflow to abstract aspects, plus the hypothesis-centric loop; code and artifacts are released to support reproduction.
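The abstract reports recall and F1 but describes the precision change only as "modest". As a rough check, precision can be back-computed from the identity F1 = 2PR / (P + R). The sketch below does this under two assumptions that the abstract does not confirm: that the reported F1 is micro-averaged over the same findings as the reported recall, and that rounding in the published figures is negligible. The function name `implied_precision` is hypothetical, and the resulting values are estimates, not numbers quoted from the paper.

```python
def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given recall R and F1.

    Rearranging gives P = F1 * R / (2R - F1), which is only
    defined when F1 < 2R.
    """
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("F1 must be smaller than 2 * recall")
    return f1 * recall / denominator


# Recall and F1 as reported in the abstract (rounded to one decimal place).
hound = implied_precision(recall=0.312, f1=0.142)
baseline = implied_precision(recall=0.083, f1=0.098)

print(f"Hound implied precision:    {hound:.1%}")
print(f"Baseline implied precision: {baseline:.1%}")
```

Under these assumptions the implied precision is roughly 9% for Hound and 12% for the baseline, which is consistent with the abstract's description of a modest precision trade-off in exchange for a large recall gain.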
