Argus: A Multi-Agent Sensitive Information Leakage Detection Framework Based on Hierarchical Reference Relationships
Bin Wang, Hui Li, Liyang Zhang, Qijia Zhuang, Ao Yang, Dong Zhang, Xijun Luo, Bing Lin
TL;DR
This paper tackles the high false-positive rates that plague the detection of sensitive-information leaks in code repositories by introducing Argus, a multi-agent framework that performs three-level contextual semantic analysis (intrinsic semantics, immediate context, and project-wide references). It combines role specialization (Initial Screening, Basic Check, and Advanced Check Agents) with a three-tier shared memory pool to coordinate evidence gathering and decision making, achieving state-of-the-art performance on two new benchmarks. On CommonLeak, Argus reports $94.86\%$ accuracy, $96.36\%$ precision, and $94.64\%$ recall (F1 $=0.955$), while the total cost remains modest at \$2.21 across 97 real repositories; ablation shows substantial gains from all three levels. The work also provides TrustedFalseSecrets to assess false-positive filtering, demonstrates robustness across languages and secret types, and offers practical deployment guidance and open-source resources for broader adoption.
Abstract
Sensitive information leakage in code repositories has emerged as a critical security challenge. Traditional detection methods that rely on regular expressions, fingerprint features, and high-entropy calculations often suffer from high false-positive rates. This not only reduces detection efficiency but also significantly increases the manual screening burden on developers. Recent advances in large language models (LLMs) and multi-agent collaborative architectures have demonstrated remarkable potential for tackling complex tasks, offering a new technical perspective on sensitive information detection. To address these challenges, we propose Argus, a multi-agent collaborative framework for detecting sensitive information. Argus employs a three-tier detection mechanism that integrates key content, file context, and project reference relationships to reduce false positives and improve overall detection accuracy. To evaluate Argus comprehensively in real-world repository environments, we developed two new benchmarks: one to assess genuine leak detection capabilities and another to evaluate false-positive filtering performance. Experimental results show that Argus achieves up to 94.86% accuracy in leak detection, with a precision of 96.36%, a recall of 94.64%, and an F1 score of 0.955. Moreover, the analysis of 97 real repositories incurred a total cost of only $2.2. All code implementations and related datasets are publicly available at https://github.com/TheBinKing/Argus-Guard for further research and application.
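As a quick consistency check on the reported metrics, the F1 score can be recomputed from the stated precision and recall using the standard harmonic-mean definition. The short sketch below is illustrative only; the function name `f1_score` is our own and is not taken from the Argus codebase.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, expressed as fractions.
precision = 0.9636
recall = 0.9464

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.955"
```

The recomputed value agrees with the reported F1 score of 0.955 to three decimal places.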
