Promptly Predicting Structures: The Return of Inference

Maitrey Mehta; Valentina Pyatkin; Vivek Srikumar

TL;DR

This work addresses structured prediction in NLP in zero- and few-shot regimes by pairing prompt-based local decisions with global inference that enforces structural constraints. The authors obtain unary potentials from prompts, factorizing the output distribution as $P(Y|X,Q) = \prod_i P(y_i|X,q_i)$, and frame structure prediction as the constrained optimization problem $Y^* = \arg\max_Y \prod_i P(y_i|X,q_i)$ subject to $Y$ being structurally valid, which can be solved with methods such as integer linear programming (ILP) or shortest-path search. They instantiate the framework on Semantic Role Labeling and Coreference Resolution across five datasets, showing that enforcing consistency not only yields valid outputs but also improves task performance over unconstrained prompting. The results indicate that constraint-driven inference eliminates structurally invalid outputs, improves robustness across tasks, and can compensate for smaller model sizes, with notable gains from iterative prompting and few-shot setups. The approach offers a practical route to reliable structured prediction with minimal labeled data, making LLMs more applicable to linguistically structured tasks in diverse domains.
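As a rough illustration of the constrained objective described above, the sketch below picks the highest-scoring joint assignment of answer spans in which no two spans overlap. It is not the authors' implementation: the exhaustive search is a simple stand-in for the ILP or shortest-path solvers mentioned in the summary, the function names (`overlaps`, `is_valid`, `constrained_argmax`) are hypothetical, and the candidate spans and probabilities are made up for the sentence "Elrond gave Aragorn the sword".

```python
from itertools import product
from math import log


def overlaps(a, b):
    """Return True if two inclusive (start, end) token spans share a token."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_valid(spans):
    """A structure is valid when no two chosen spans overlap or repeat."""
    return all(
        not overlaps(spans[i], spans[j])
        for i in range(len(spans))
        for j in range(i + 1, len(spans))
    )


def constrained_argmax(candidates):
    """Exhaustively search for the best valid joint assignment.

    candidates: one list per question, each holding (span, probability) pairs.
    Returns (spans, log_score), or (None, -inf) if no valid assignment exists.
    Brute force is exponential in the number of questions; it is used here
    only because the toy problem is tiny.
    """
    best, best_score = None, float("-inf")
    for combo in product(*candidates):
        spans = [span for span, _ in combo]
        if not is_valid(spans):
            continue
        # Sum of log-probabilities equals the log of the product of unary scores.
        score = sum(log(p) for _, p in combo)
        if score > best_score:
            best, best_score = spans, score
    return best, best_score


# Tokens: 0=Elrond 1=gave 2=Aragorn 3=the 4=sword
# Hypothetical top-2 candidates per question (illustrative numbers only).
candidates = [
    [((0, 0), 0.50), ((2, 2), 0.20)],  # "Who gave something?"
    [((0, 0), 0.60), ((2, 2), 0.35)],  # "Who was given something?"
    [((3, 4), 0.70), ((2, 4), 0.20)],  # "What was given?"
]

# Independent (unconstrained) decoding takes the top span for each question.
greedy = [max(options, key=lambda sp: sp[1])[0] for options in candidates]

best, score = constrained_argmax(candidates)
print("unconstrained:", greedy, "valid:", is_valid(greedy))
print("constrained:  ", best)
```

In this toy case the independent decisions assign "Elrond" to two different questions, which violates the non-overlap constraint, while the joint search falls back to the second-ranked answer for one question to recover a valid structure.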

Abstract

Prompt-based methods have been used extensively across NLP to build zero- and few-shot label predictors. Many NLP tasks are naturally structured: that is, their outputs consist of multiple labels which constrain each other. Annotating data for such tasks can be cumbersome. Can the promise of the prompt-based paradigm be extended to such structured outputs? In this paper, we present a framework for constructing zero- and few-shot linguistic structure predictors. Our key insight is that we can use structural constraints -- and combinatorial inference derived from them -- to filter out inconsistent structures predicted by large language models. We instantiate this framework on two structured prediction tasks and five datasets. Across all cases, our results show that enforcing consistency not only constructs structurally valid outputs, but also improves performance over the unconstrained variants.

Paper Structure (39 sections, 4 equations, 13 figures, 12 tables)

Introduction
Related Work
Prompts for Predictions.
Structures & Prompts.
"Promptly" Predicting Structures
Problem Statement and Notation
Unary Potentials from Prompts.
The Return of Inference.
The Framework.
Experiments
Semantic Role Labeling
Datasets.
Constraints.
Prompts and Inference.
Evaluation.
...and 24 more sections

Figures (13)

Figure 1: Example of Question Answer driven Semantic Role Labeling (QA-SRL) (a) without, and (b) with structured inference. Sans inference, prediction for each question may contain overlapping/repeated answers, which is prohibited per the task definition. Structured inference avoids such structurally invalid outputs.
Figure 2: An example graph for the statement "Elrond gave Aragorn the sword". This is a toy example with the top-2 candidate spans ($n=2$) for semantic roles $\{a,b,c\}$. Each edge represents a candidate span. For instance, the span ranked second for semantic role 'a' is the phrase "Elrond gave" with an edge score of $s_{2,a}$.
Figure 3: An example from ECB+ dataset where inference helps correct inconsistency of predictions. An incorrect link is predicted between 'Blackberry Curve 8900' and 'T-Mobile' by the unconstrained model, which is removed post-inference. Irrelevant mentions are hidden for clarity.
Figure 4: Head$_s$ performance and inconsistency percent over model sizes for the QA-SRL 2.0 dataset. The circle-marked (-$\bullet$-) plots map to the left axis, and the cross-marked (-x-) plot to the right axis. Inconsistency is shown for unconstrained models since constrained models are always consistent (i.e., $\rho$=0).
Figure 5: F1 performance and inconsistency percent over model sizes for the OntoNotes dataset. The visual encodings follow the same convention as in Figure 4.
...and 8 more figures
