Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making

Yihan Wang, Qiao Yan, Zhenghao Xing, Lihao Liu, Junjun He, Chi-Wing Fu, Xiaowei Hu, Pheng-Ann Heng

TL;DR

The paper identifies Silent Agreement as a key bottleneck in medical multi-agent LLM frameworks and introduces the Catfish Agent, a role-based dissent mechanism designed to inject structured challenges. It implements two core interventions—complexity-aware engagement and tone-calibrated dissent—to adapt to case difficulty and consensus strength, respectively. Through extensive experiments on nine medical Q&A and three medical VQA benchmarks, the Catfish framework achieves substantial gains over single- and multi-agent baselines, including GPT-4o and DeepSeek-R1, with notable reductions in premature consensus and improved diagnostic reasoning. The work demonstrates the practical impact of deliberate disagreement in high-stakes medical decision making and outlines future directions for efficient coordination in multi-agent reasoning systems.

Abstract

Large language models (LLMs) have demonstrated strong potential in clinical question answering, with recent multi-agent frameworks further improving diagnostic accuracy via collaborative reasoning. However, we identify a recurring issue of Silent Agreement, where agents prematurely converge on diagnoses without sufficient critical analysis, particularly in complex or ambiguous cases. We present a new concept called Catfish Agent, a role-specialized LLM designed to inject structured dissent and counter silent agreement. Inspired by the "catfish effect" in organizational psychology, the Catfish Agent is designed to challenge emerging consensus to stimulate deeper reasoning. We formulate two mechanisms to encourage effective and context-aware interventions: (i) a complexity-aware intervention that modulates agent engagement based on case difficulty, and (ii) a tone-calibrated intervention articulated to balance critique and collaboration. Evaluations on nine medical Q&A and three medical VQA benchmarks show that our approach consistently outperforms both single- and multi-agent LLM frameworks, including leading commercial models such as GPT-4o and DeepSeek-R1.
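As a rough illustration of the tone-calibrated intervention, the sketch below measures how strongly a group of agents agrees and maps that value to a dissent tone. The summary above says only that tone adapts to consensus strength. The function names, the thresholds, the tone labels, and the choice that stronger consensus draws stronger dissent are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def consensus_strength(answers):
    """Fraction of agents that chose the most common answer (hypothetical metric)."""
    if not answers:
        raise ValueError("answers must be non-empty")
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)


def dissent_tone(strength, moderate_from=0.6, strong_from=0.9):
    """Map consensus strength to a tone label.

    The thresholds and labels are illustrative assumptions, not values
    reported by the authors.
    """
    if strength >= strong_from:
        return "strong"    # near-unanimous agreement: push back firmly
    if strength >= moderate_from:
        return "moderate"  # clear majority: raise targeted objections
    return "mild"          # agents already disagree: nudge gently


answers = ["A", "A", "B", "A"]
strength = consensus_strength(answers)
tone = dissent_tone(strength)
```

In a real system the tone label would presumably be folded into the Catfish Agent's prompt; that step is omitted here because the summary does not describe it.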

Paper Structure

This paper contains 28 sections, 9 figures, 5 tables.

Figures (9)

  • Figure 1: An example clinical misdiagnosis resulting from Silent Agreement. Although the agents initially select different options, they remain silent in the subsequent discussion, which leads to the misdiagnosis. Our method actively disrupts such silent agreement with a designated Catfish Agent in multi-agent collaborative reasoning and successfully produces the correct outcome.
  • Figure 2: Overview of the reasoning process for an advanced case. (i) the system routes the clinical question through a complexity-aware Moderator, which classifies it as advanced and activates three expert teams, each consisting of a leader and two members; (ii) within each team, the leader assigns specific subtasks, and members respond independently based on their expertise; (iii) a Catfish Agent monitors the discussion and selectively intervenes by critiquing flawed assumptions or incomplete reasoning. All team members are required to respond to these challenges; (iv) after internal discussion, each team leader finalizes the team's answer and forwards it to the next team for iterative refinement; and (v) once all teams have contributed, the Moderator synthesizes the collective reasoning and, if needed, introduces an additional Catfish Agent for final diagnosis.
  • Figure 3: Advanced case example. Interventions from the Catfish Agent lead to a correct decision. Upon detecting premature consensus and inaccurate analysis, the Catfish Agent (as a nephrologist) raises specific concerns, prompting the Teams and the Moderator to re-evaluate and ultimately select the correct option.
  • Figure 4: Intermediate case example illustrating interventions from the Catfish Agent during a multi-round debate. Assigned a fixed domain role, the Catfish Agent monitors team dynamics and raises structured dissent to prevent Silent Agreement, enhancing diagnostic robustness.
  • Figure 5: A basic-level case where the Catfish Agent identifies an oversight in the initial diagnosis and successfully prompts a correction, leading to the correct final decision.
  • ...and 4 more figures
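
The advanced-case flow described in the Figure 2 caption can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: each agent is modeled as a plain function from a prompt string to a reply string (a stand-in for an LLM call), the Catfish Agent returns an empty string when it chooses not to intervene, and all names (`Team`, `run_team`, `run_advanced_case`) and prompt wordings are hypothetical. The Moderator's complexity classification and the optional extra Catfish Agent at the final step are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an LLM-backed agent: prompt in, reply out.
Agent = Callable[[str], str]


@dataclass
class Team:
    leader: Agent
    members: List[Agent]


def run_team(team: Team, question: str, prior: str, catfish: Agent) -> str:
    """One team's round: assign subtasks, collect replies, handle a challenge."""
    # (ii) The leader assigns subtasks; members respond independently.
    subtask = team.leader(f"Assign subtasks for: {question}\nPrior answer: {prior}")
    replies = [member(subtask) for member in team.members]

    # (iii) The Catfish Agent reviews the discussion and may raise a challenge.
    challenge = catfish("\n".join(replies))
    if challenge:
        # Every member is required to respond to the challenge.
        replies = [member(f"{subtask}\nChallenge: {challenge}") for member in team.members]

    # (iv) The leader finalizes the team's answer.
    return team.leader("Finalize:\n" + "\n".join(replies))


def run_advanced_case(question: str, teams: List[Team], catfish: Agent, moderator: Agent) -> str:
    """Chain the teams so each refines the previous answer, then synthesize."""
    answer = ""
    trace = []
    for team in teams:
        answer = run_team(team, question, answer, catfish)
        trace.append(answer)
    # (v) The Moderator synthesizes the collective reasoning.
    return moderator("Synthesize:\n" + "\n".join(trace))
```

Passing each team's answer forward as `prior` mirrors the iterative refinement in step (iv); a parallel design in which teams answer independently would be a different scheme from the one the caption describes.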