Regression-aware Inference with LLMs

Michal Lukasik, Harikrishna Narasimhan, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

TL;DR

This work builds on prior work on Minimum Bayes Risk decoding and proposes alternative inference strategies that estimate the Bayes-optimal solution for regression and scoring metrics in closed form from sampled responses.

Abstract

Large language models (LLMs) have shown strong results on a range of applications, including regression and scoring tasks. Typically, one obtains outputs from an LLM via autoregressive sampling from the model's output distribution. We show that this inference strategy can be sub-optimal for common regression and scoring evaluation metrics. As a remedy, we build on prior work on Minimum Bayes Risk decoding and propose alternative inference strategies that estimate the Bayes-optimal solution for regression and scoring metrics in closed form from sampled responses. We show that our proposal significantly improves over baselines across datasets and models.

Paper Structure (21 sections, 9 equations, 2 figures, 11 tables)

Figures (2)

  • Figure 1: Illustration of metric-aware LLM inference for regression and scoring tasks. An input $x$ is passed to the LLM, and samples are drawn from the distribution over targets $y$ conditioned on $x$. These samples are then used to find the target that optimizes a metric $m$ through a closed-form decision rule $\Phi$ (e.g., mean or median); Table \ref{tbl:decision_rules} presents specific solutions across metrics.
  • Figure 2: Examples from the Amazon dataset with the corresponding human annotations and samples from the model. In many cases, taking the model distribution into account (i.e., using the mean of the distribution) yields a prediction closer to the annotation than simply taking the mode of the distribution.
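
The decision rule described in the Figure 1 caption can be illustrated with a small sketch. This is not the authors' implementation. The function names `parse_samples` and `decide` are hypothetical, and the sampled responses are hard-coded strings standing in for LLM outputs. The sketch relies on a standard result: the mean of a distribution minimizes expected squared error and the median minimizes expected absolute error. The mode is included only as a stand-in for picking the single most frequent sample.

```python
from collections import Counter
from statistics import mean, median


def parse_samples(responses):
    """Convert sampled text responses to floats, skipping unparseable ones."""
    values = []
    for text in responses:
        try:
            values.append(float(text.strip()))
        except ValueError:
            continue  # e.g. the model answered with non-numeric text
    return values


def decide(values, rule):
    """Apply a closed-form decision rule to the parsed samples.

    Assumed pairing (standard result, not taken from the paper's table):
      "mean"   -> minimizes expected squared error
      "median" -> minimizes expected absolute error
      "mode"   -> most frequent sample, shown for comparison
    """
    if not values:
        raise ValueError("no numeric samples to aggregate")
    if rule == "mean":
        return mean(values)
    if rule == "median":
        return median(values)
    if rule == "mode":
        return Counter(values).most_common(1)[0][0]
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical samples for one input, e.g. a 1-5 rating task.
responses = ["5", "5", "3", "2", "n/a", "4"]
values = parse_samples(responses)

for rule in ("mean", "median", "mode"):
    print(rule, decide(values, rule))
```

With these made-up samples the three rules disagree: the most frequent sample is 5, while the mean and median are pulled toward the rest of the distribution. This is the kind of gap the Figure 2 caption points to.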