ComplLLM: Fine-tuning LLMs to Discover Complementary Signals for Decision-making

Ziyang Guo, Yifan Wu, Jason Hartline, Kenneth Holstein, Jessica Hullman

TL;DR

ComplLLM tackles the challenge of identifying complementary information in multi-agent decision workflows by framing a decision-theoretic objective: surface signals from the supervisor's information that improve decision quality beyond an upstream model's recommendation. It introduces a three-stage approach (estimating the data-generating process, supervised fine-tuning with generated complementary signals, and reinforcement learning via Group Relative Policy Optimization) to elicit actionable, interpretable signals. The framework is validated on synthetic data and three real-world domains (radiology, content moderation, and scientific paper reviewing), demonstrating reliable recovery of complementary signals and improved downstream decision performance, supported by qualitative expert feedback in medicine. The work advances explainable AI for collaboration by shifting explanations from justification toward surfacing decision-relevant, complementary cues that decision-makers should consider in conjunction with existing recommendations.
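The third stage relies on Group Relative Policy Optimization. As a rough illustration of the "group relative" part only, the sketch below normalizes each sampled output's reward against the mean and standard deviation of its own sampling group, which is how GRPO advantages are commonly described. The function name, the `eps` constant, and the example rewards are hypothetical; the paper's actual reward and training setup are not specified in this summary.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled output relative to its own group.

    Assumption: the common GRPO formulation, (r - mean) / (std + eps),
    computed over all outputs sampled for the same prompt.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for four candidate signals sampled for one case.
rewards = [0.0, 0.1, 0.3, 0.0]
advantages = group_relative_advantages(rewards)
```

Outputs that earn more reward than their group average get a positive advantage and are reinforced; a group in which every output earns the same reward yields zero advantage for all of them.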

Abstract

Multi-agent decision pipelines can outperform single-agent workflows when complementarity holds, i.e., when different agents bring unique information to inform a final decision. We propose ComplLLM, a post-training framework based on decision theory that fine-tunes a decision-assistant LLM, using complementary information as the reward, to output signals that complement existing agent decisions. We validate ComplLLM on synthetic and real-world tasks involving domain experts, demonstrating how the approach recovers known complementary information and produces plausible explanations of complementary signals to support downstream decision-makers.
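To make the idea of "complementary information as reward" concrete, here is a minimal sketch on a toy discrete distribution. It assumes a 0/1 accuracy payoff and measures how much a signal raises the best achievable expected accuracy beyond what the existing recommendation already allows. The names `best_accuracy`, `complementary_value`, and `toy_joint`, and the toy probabilities, are illustrative assumptions and not the paper's definitions.

```python
from collections import defaultdict

def best_accuracy(joint, observe):
    """Expected accuracy of the best guess of state y given observe(z, s).

    joint maps (y, z, s) -> probability, where y is the true state,
    z the existing recommendation, and s a candidate signal.
    """
    mass = defaultdict(lambda: defaultdict(float))
    for (y, z, s), p in joint.items():
        mass[observe(z, s)][y] += p
    # For each observation, the best guess is the most probable state.
    return sum(max(by_state.values()) for by_state in mass.values())

def complementary_value(joint):
    """Accuracy gain from seeing the signal on top of the recommendation."""
    with_signal = best_accuracy(joint, lambda z, s: (z, s))
    recommendation_only = best_accuracy(joint, lambda z, s: z)
    return with_signal - recommendation_only

def toy_joint(signal_accuracy):
    """Hypothetical setup: uniform binary state, recommendation correct 80%
    of the time, signal correct with the given rate, independent given y."""
    joint = {}
    for y in (0, 1):
        for z in (0, 1):
            for s in (0, 1):
                p_z = 0.8 if z == y else 0.2
                p_s = signal_accuracy if s == y else 1 - signal_accuracy
                joint[(y, z, s)] = 0.5 * p_z * p_s
    return joint

informative_gain = complementary_value(toy_joint(0.9))
noise_gain = complementary_value(toy_joint(0.5))
```

Under these assumptions, a signal that carries information the recommendation lacks has positive value, while a pure-noise signal has none, so only the former would be rewarded.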

Paper Structure

This paper contains 53 sections, 5 equations, 28 figures, 8 tables, 1 algorithm.

Figures (28)

  • Figure 1: A graphical representation of how our work goes beyond simply identifying concepts that appear in text or signals that are predictive of a target state, by also ensuring that signals complement the existing recommendation. The diagonal stripes (twill pattern) represent the target of the corresponding methods.
  • Figure 2: The "two agents" setting in our framework. The recommending agent makes a recommendation $Z$ based on their own features $X$, and the supervisor agent aggregates their own information $T$ with $Z$ to make a final decision $D$.
  • Figure 3: Expected accuracy given the extracted signals and the agent's recommendation by each method. Dashed lines represent agent decision accuracy and the accuracy of the benchmark method. Error bars depict bootstrapped 95% confidence intervals (N=5000).
  • Figure 4: Accuracy of LLM paper review decisions on the Review5K dataset. Error bars depict bootstrapped 95% confidence intervals (N=5000).
  • Figure 9: Experimental results for ComplLLM with only SFT. Dashed lines represent agent decision accuracy and the accuracy of the benchmark method. Error bars depict bootstrapped 95% confidence intervals (N=5000).
  • ...and 23 more figures
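Several captions above report bootstrapped 95% confidence intervals. A minimal percentile-bootstrap sketch for an accuracy estimate is shown below. It assumes that N=5000 refers to the number of bootstrap resamples, which the captions do not state explicitly; the function name and the example data are hypothetical.

```python
import random

def bootstrap_accuracy_ci(correct, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean accuracy.

    correct: list of 0/1 flags, one per evaluated decision.
    Assumption: n_resamples plays the role of N in the figure captions.
    """
    rng = random.Random(seed)
    n = len(correct)
    # Resample the decisions with replacement and record each mean.
    means = sorted(
        sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical evaluation: 80 correct decisions out of 100.
outcomes = [1] * 80 + [0] * 20
ci_low, ci_high = bootstrap_accuracy_ci(outcomes)
```

The interval's endpoints are the values that would be drawn as error bars around the observed accuracy.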