Multi-role Consensus through LLMs Discussions for Vulnerability Detection
Zhenyu Mao, Jialong Li, Dongming Jin, Munan Li, Kenji Tei
TL;DR
The paper tackles vulnerability detection by moving beyond single-role LLM prompts to a multi-role framework that simulates a real-life code review between a tester and a developer. The workflow has three stages. First, each role makes an independent initial judgment. Next, the roles discuss iteratively, in a loop of posing a query, deducing a response, and relaying insight, until they reach consensus. Finally, a verdict is issued that reflects the tester's primary responsibility for finding vulnerabilities. In a preliminary evaluation on a C/C++ vulnerability dataset covering four categories, function call (FC), arithmetic expression (AE), array usage (AU), and pointer usage (PU), the multi-role approach improves precision by 13.48%, recall by 18.25%, and F1 by 16.13%, at the cost of a 484% increase in token usage. These results suggest that role diversity and dialogic prompting add value for vulnerability detection and could benefit automated QA pipelines. Future work will explore in-context learning to further improve collaboration between the roles.
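The three-stage workflow can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The role functions are hypothetical stand-ins for LLM calls (deterministic stubs here, so the example runs offline). Consensus is assumed to mean agreement on both the vulnerable/not-vulnerable decision and the category. The query/response/relay exchange is collapsed into passing the other role's latest judgment. The tester's judgment is assumed to be the final verdict. The names `Judgment`, `agree`, `multi_role_detect`, `max_rounds`, `stub_tester`, and `stub_developer` are all illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Judgment:
    vulnerable: bool
    category: Optional[str]  # e.g. "FC", "AE", "AU", "PU"; None when not vulnerable
    reason: str


# Hypothetical role interface: a real system would prompt an LLM with the code
# and the other role's latest judgment; here it is any callable.
RoleFn = Callable[[str, Optional[Judgment]], Judgment]


def agree(a: Judgment, b: Judgment) -> bool:
    # Assumption: consensus means the same decision and the same category.
    return (a.vulnerable, a.category) == (b.vulnerable, b.category)


def multi_role_detect(code: str, tester: RoleFn, developer: RoleFn,
                      max_rounds: int = 3) -> Tuple[Judgment, List[Tuple[str, Judgment]]]:
    # Stage 1: independent initial judgments (neither role sees the other's view).
    t = tester(code, None)
    d = developer(code, None)
    log = [("tester", t), ("developer", d)]

    # Stage 2: discussion. The tester's judgment is relayed to the developer, whose
    # response is relayed back, until the roles agree or max_rounds is exhausted.
    for _ in range(max_rounds):
        if agree(t, d):
            break
        d = developer(code, t)
        log.append(("developer", d))
        t = tester(code, d)
        log.append(("tester", t))

    # Stage 3: final verdict. Assumption: the tester, as the role primarily
    # responsible for finding vulnerabilities, has the final say.
    return t, log


# Deterministic stubs standing in for LLM-backed roles (illustrative only).
def stub_tester(code: str, other: Optional[Judgment]) -> Judgment:
    if "strcpy" in code:
        return Judgment(True, "FC", "unbounded strcpy into a fixed-size buffer")
    return Judgment(False, None, "no risky call found")


def stub_developer(code: str, other: Optional[Judgment]) -> Judgment:
    if other is None:
        return Judgment(False, None, "input is assumed to be short")
    # Once the tester's reasoning is relayed, this stub developer adopts it.
    return Judgment(other.vulnerable, other.category, "agree: " + other.reason)


verdict, transcript = multi_role_detect("char buf[8]; strcpy(buf, input);",
                                        stub_tester, stub_developer)
print(verdict.vulnerable, verdict.category, len(transcript))  # True FC 4
```

With these stubs the roles disagree at first and converge after one discussion round, so the transcript holds four judgments. Each unresolved round adds two more role calls on top of the two initial ones, which gives some intuition for why a multi-role approach consumes more tokens than a single prompt.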
Abstract
Recent advancements in large language models (LLMs) have highlighted their potential for vulnerability detection, a crucial component of software quality assurance. Despite this progress, most studies have been limited to the perspective of a single role, usually the tester, and lack the diverse viewpoints of the different roles in a typical software development life-cycle, such as developers and testers. To this end, this paper introduces a multi-role approach in which LLMs act as different roles, simulating a real-life code review process and engaging in discussion to reach a consensus on the existence and classification of vulnerabilities in the code. Preliminary evaluation of this approach indicates a 13.48% increase in precision, an 18.25% increase in recall, and a 16.13% increase in F1 score.
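The reported gains are in the standard precision, recall, and F1 metrics. As a reminder of how these are computed for a binary "vulnerable or not" decision, here is a small sketch. The function name `prf1` is hypothetical, the labels are toy data rather than results from the paper, and the abstract does not specify whether its evaluation also scores category correctness.

```python
from typing import Sequence, Tuple


def prf1(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for binary labels (True = vulnerable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # vulnerable, flagged
    fp = sum(1 for t, p in pairs if not t and p)    # safe, wrongly flagged
    fn = sum(1 for t, p in pairs if t and not p)    # vulnerable, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: 2 true positives, 1 false positive, 1 false negative.
print(prf1([True, True, False, False, True], [True, False, True, False, True]))
```

In the toy example all three metrics equal 2/3. Precision penalizes false alarms and recall penalizes missed vulnerabilities, so a method can raise both only by flagging more real vulnerabilities without adding spurious ones.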
