Natural Language Decompositions of Implicit Content Enable Better Text Representations
Alexander Hoyle, Rupak Sarkar, Pranav Goel, Philip Resnik
TL;DR
This work introduces inferential decompositions: natural-language representations of the propositions, both explicit and implicit, that are inferentially related to an utterance, generated by exemplar-guided prompting of large language models. Human judgments validate the plausibility of the generated propositions, and experiments across domains (public opinion clustering and legislator co-voting) show that incorporating implicit content can improve interpretability and some downstream tasks, particularly theme discovery and socially relevant analyses. The method yields gains on argument- and Twitter-based semantic textual similarity (STS) tasks and enables richer narrative discovery, but results on standard STS benchmarks are mixed, which suggests that the benefits are task-dependent. Overall, treating implicit content as a first-class citizen in NLP enables more nuanced analyses of text as data and holds promise for social science applications requiring deeper interpretation of utterances.
Abstract
When people interpret text, they rely on inferences that go beyond the observed language itself. Inspired by this observation, we introduce a method for the analysis of text that takes implicitly communicated content explicitly into account. We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed, then validate the plausibility of the generated content via human judgments. Incorporating these explicit representations of implicit content proves useful in multiple problem settings that involve the human interpretation of utterances: assessing the similarity of arguments, making sense of a body of opinion data, and modeling legislative behavior. Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP and particularly its applications to social science.
