MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning
Inderjeet Nair, Lu Wang
TL;DR
This work tackles structured commonsense reasoning via graphs generated by LLMs, addressing error propagation and single-sample limitations. It introduces MIDGARD, an MDL-guided aggregation framework that merges multiple graph samples into a single DAG by minimizing the expected description length of samples relative to a hypothesized graph, effectively promoting consistently observed edges/nodes. The approach demonstrates robust improvements across argument structure extraction, explanation graph generation, script planning, and semantic graph generation on eight benchmarks, using both GPT-3.5-turbo and Code-Llama, with DAG constraints playing a crucial role in maintaining valid graph structures. However, MIDGARD incurs higher computational cost due to multiple samples and ILP-based DAG enforcement, and its performance can depend on hyperparameters and the variability of the underlying LLM outputs; ethical considerations regarding hallucination and biased content are acknowledged.
Abstract
We study the task of conducting structured reasoning as generating a reasoning graph from natural language input using large language models (LLMs). Previous approaches have explored various prompting schemes, yet they suffer from error propagation because autoregressive, single-pass decoding has no capability for error correction. Additionally, relying solely on a single sample may result in the omission of true nodes and edges. To counter this, we draw inspiration from self-consistency (SC), which involves sampling a diverse set of reasoning chains and taking the majority vote as the final answer. To tackle the substantial challenge of applying SC to generated graphs, we propose MIDGARD (MInimum Description length Guided Aggregation of Reasoning in Directed acyclic graph), which leverages a Minimum Description Length (MDL)-based formulation to identify consistent properties among the different graph samples generated by an LLM. This formulation helps reject properties that appear in only a few samples, which are likely to be erroneous, while enabling the inclusion of missing elements without compromising precision. Our method outperforms the compared approaches across various structured reasoning tasks, including argument structure extraction, explanation graph generation, inferring dependency relations among actions for everyday tasks, and semantic graph generation from natural texts.
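
To make the aggregation idea concrete, the following is a minimal sketch of merging several sampled graphs into one DAG. It is not the authors' MDL objective or their ILP-based solver: it assumes a plain frequency threshold as a stand-in for the MDL criterion and a greedy cycle check as a stand-in for the ILP constraint. The names `aggregate_graphs`, `reaches`, and `threshold` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def reaches(adj, src, dst):
    """Return True if dst is reachable from src in the adjacency map adj."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj.get(node, ()))
    return False


def aggregate_graphs(samples, threshold=0.5):
    """Merge sampled edge lists into a single acyclic edge list.

    Assumption: an edge is kept when the fraction of samples containing it
    exceeds `threshold`. This is a simplified proxy, not the paper's MDL
    formulation. Edges are added greedily from most to least frequent, and
    an edge is skipped if it would close a cycle.
    """
    n = len(samples)
    # Count each edge at most once per sample.
    counts = Counter(edge for graph in samples for edge in set(graph))
    # Most frequent first; ties broken by edge name for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    adj, kept = {}, []
    for (u, v), count in ranked:
        if count / n <= threshold:
            break  # all remaining edges are at or below the threshold
        if u == v or reaches(adj, v, u):
            continue  # adding u -> v would create a cycle
        adj.setdefault(u, set()).add(v)
        kept.append((u, v))
    return kept


samples = [
    [("a", "b"), ("b", "c")],
    [("a", "b"), ("b", "c"), ("c", "a")],
    [("a", "b"), ("c", "a"), ("a", "d")],
]
merged = aggregate_graphs(samples)
```

In this toy input, the edge `("a", "d")` occurs in only one of three samples and is rejected, while `("c", "a")` is frequent enough to pass the threshold but is skipped because it would close a cycle with the two more consistent edges.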
