Sample, Align, Synthesize: Graph-Based Response Synthesis with ConGrs

Sayan Ghosh; Shahzaib Saqib Warraich; Dhruv Tarsadiya; Gregory Yauney; Swabha Swayamdipta

TL;DR

This work introduces Consensus Graphs (ConGrs), a DAG-based representation that captures shared and divergent information across multiple LM responses to a single prompt. ConGrs are built in two steps: lexical alignment (Needleman–Wunsch) identifies anchor spans shared across responses (consensus nodes), and a secondary LM groups semantically equivalent divergences (disagreement nodes). The authors present two decoding strategies, consensus decoding (aggregation) and guided self-verification (intervention), and demonstrate improvements in factuality for long-form generation, abstention on refusal-based tasks, and reasoning performance on MATH and AIME, often with substantial cost savings from reduced reliance on LM judges. By treating response variation as an epistemic signal, ConGrs offer a flexible, metadata-free approach to synthesizing more reliable LM outputs across diverse tasks.

Abstract

Language models can be sampled multiple times to access the distribution underlying their responses, but existing methods cannot efficiently synthesize rich epistemic signals across different long-form responses. We introduce Consensus Graphs (ConGrs), a flexible DAG-based data structure that represents both the shared information and the semantic variation in a set of sampled LM responses to the same prompt. We construct ConGrs using a lightweight lexical sequence alignment algorithm from bioinformatics, supplemented by the targeted use of a secondary LM judge. Further, we design task-dependent decoding methods to synthesize a single, final response from our ConGr data structure. Our experiments show that synthesizing responses from ConGrs improves factual precision on two biography generation tasks by up to 31% over an average response and reduces reliance on LM judges by more than 80% compared to other methods. We also use ConGrs for three refusal-based tasks requiring abstention on unanswerable queries and find that the abstention rate increases by up to 56%. We apply our approach to the MATH and AIME reasoning tasks and find an improvement of up to 6 accuracy points over self-verification and majority vote baselines. We show that ConGrs provide a flexible method for capturing variation in LM responses and for using the epistemic signals provided by response variation to synthesize more effective responses.
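The lexical alignment step named in the TL;DR (Needleman–Wunsch) can be illustrated with a minimal sketch. This is not the authors' implementation: it aligns only two whitespace-tokenized responses, whereas ConGrs are built from a set of responses and also use a secondary LM judge. The function names `needleman_wunsch` and `anchor_spans`, the scoring values (match +1, mismatch -1, gap -1), and the example sentences are assumptions chosen for illustration.

```python
from typing import List, Tuple


def needleman_wunsch(a: List[str], b: List[str],
                     match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> List[Tuple[int, int]]:
    """Globally align token lists a and b; return (i, j) pairs of equal tokens
    that are aligned to each other on one optimal alignment.
    Scoring values are illustrative assumptions, not taken from the paper."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, recording exact token matches.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            if a[i - 1] == b[j - 1]:
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


def anchor_spans(a: List[str], b: List[str]) -> List[str]:
    """Group consecutively aligned matching tokens into shared spans
    (a hypothetical stand-in for consensus nodes over two responses)."""
    spans: List[List[str]] = []
    prev = None
    for i, j in needleman_wunsch(a, b):
        if prev is not None and (i, j) == (prev[0] + 1, prev[1] + 1):
            spans[-1].append(a[i])   # extends the current contiguous span
        else:
            spans.append([a[i]])     # a divergence or gap starts a new span
        prev = (i, j)
    return [" ".join(s) for s in spans]


r1 = "Marie Curie was born in Warsaw in 1867".split()
r2 = "Marie Curie was born in Paris in 1867".split()
print(anchor_spans(r1, r2))  # ['Marie Curie was born in', 'in 1867']
```

In this toy case the two responses agree everywhere except on the birthplace, so the shared spans would play the role of consensus nodes and the "Warsaw" / "Paris" divergence would be a candidate disagreement node. Deciding whether divergent spans are semantically equivalent is the part the paper delegates to a secondary LM, which this sketch does not attempt.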
