Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models

Thomas P. Zollo; Todd Morrill; Zhun Deng; Jake C. Snell; Toniann Pitassi; Richard Zemel

TL;DR

Prompt Risk Control is a lightweight framework that selects a prompt using rigorous upper bounds on families of informative risk measures. Experiments on open-ended chat, medical question summarization, and code generation show that it can foster responsible deployment by reducing the risk of the worst outcomes.

Abstract

The recent explosion in the capabilities of large language models has led to a wave of interest in how best to prompt a model to perform a given task. While it may be tempting to simply choose a prompt based on average performance on a validation set, this can lead to a deployment where unexpectedly poor responses are generated, especially for the worst-off users. To mitigate this prospect, we propose Prompt Risk Control, a lightweight framework for selecting a prompt based on rigorous upper bounds on families of informative risk measures. We offer methods for producing bounds on a diverse set of metrics, including quantities that measure worst-case responses and disparities in generation quality across the population of users. In addition, we extend the underlying statistical bounding techniques to accommodate the possibility of distribution shifts in deployment. Experiments on applications such as open-ended chat, medical question summarization, and code generation highlight how such a framework can foster responsible deployment by reducing the risk of the worst outcomes.
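The selection step described in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure: it swaps in a plain Hoeffding bound on the mean loss with a Bonferroni correction across prompts, and it assumes per-example losses lie in [0, 1]. The names `hoeffding_upper_bound` and `select_prompts`, and the prompt labels in the usage lines, are hypothetical.

```python
import math


def hoeffding_upper_bound(losses, delta):
    """Upper bound on the mean loss that holds with probability >= 1 - delta.

    Assumes i.i.d. losses in [0, 1] (an assumption of this sketch).
    """
    n = len(losses)
    if n == 0:
        raise ValueError("need at least one validation loss")
    if any(l < 0.0 or l > 1.0 for l in losses):
        raise ValueError("losses must lie in [0, 1] for this bound")
    empirical_mean = sum(losses) / n
    return empirical_mean + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def select_prompts(losses_by_prompt, alpha, delta):
    """Return {prompt: bound} for prompts whose bound is at most alpha.

    Bonferroni correction: each prompt gets delta / k, so all k bounds
    hold simultaneously with probability >= 1 - delta.
    """
    k = len(losses_by_prompt)
    per_prompt_delta = delta / k
    bounds = {
        prompt: hoeffding_upper_bound(losses, per_prompt_delta)
        for prompt, losses in losses_by_prompt.items()
    }
    return {prompt: b for prompt, b in bounds.items() if b <= alpha}


# Hypothetical validation losses for two candidate prompts.
validation_losses = {"p_a": [0.1] * 400, "p_b": [0.5] * 400}
selected = select_prompts(validation_losses, alpha=0.2, delta=0.05)
print(sorted(selected))
```

Any prompt in the returned set can then be compared on a secondary metric, since each one already satisfies the risk threshold under the stated assumptions.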

Paper Structure

This paper contains 32 sections, 1 theorem, 20 equations, 10 figures, and 3 tables.

Introduction
Background
Prompt Risk Control
Bounding the Mean: Learn Then Test (LTT)
Controlling Quantile Risk
Quantile-Based Risk Measures
Quantile Risk Control
Controlling Measures of Societal Dispersion
Extending Bounds for Distribution Shifts
Setup
Algorithm Outline
Experiments
Bounding Expected Loss in Code Generation
Bounding Worst-Case Toxicity in Chatbot Applications
Addressing Adversarial Distributions via Red Teaming
...and 17 more sections

Key Result

lemma 1

Suppose $w^*(\cdot)\in[\underline{w}(\cdot),\bar{w}(\cdot)]$ and for $\epsilon = \max_{i}|\bar{w}(x_i)-\underline{w}(x_i)|$, we have $\epsilon<1$; if we further have an increasing lower bound function $F^L_{\tilde{S}}$ such that where $F_{\tilde{\mathcal{D}}}$ is the CDF of the distribution induced by the loss over samples drawn from $\tilde{D}$, then is an increasing lower bound function for $F

Figures (10)

Figure 1: Prompt Risk Control (PRC) assists in choosing a prompt (or set of prompts) that will, with high likelihood, not incur too high a loss according to some chosen risk measure and threshold. Here we illustrate PRC being used to select a system prompt to be appended to input queries to a chatbot, a popular setup in modern LLM deployments (algorithm inputs are in grey). The goal is to ensure that the responses will not be too toxic for the highest-loss (most toxic) portion of the data distribution (e.g., under the CVaR risk measure). The algorithm returns a set of prompts that bound the risk at an acceptable level, from which a user can select a safe prompt for deployment.
Figure 2: Examples of the risk function $R$. Left: Value at risk (VaR) measures the loss value at a specified quantile $\beta$ of the loss distribution. Middle: Conditional value at risk (CVaR) measures the average loss for the worst-off portion of the population, starting at a specified quantile $\beta$ of the loss distribution. Right: The Lorenz Curve shows the cumulative share of the population loss incurred by the $\beta$ proportion of the population with lowest loss. Under perfect equality, the first $\beta$ proportion of the population would incur $\beta$ proportion of the loss for all $\beta$. The Gini coefficient is calculated as $\frac{A}{A+B}$ for the areas $A$ (between the line of equality and Lorenz Curve) and $B$ (below the Lorenz Curve).
Figure 3: For a set of candidate prompts $P$, Prompt Risk Control returns a set of prompts $\hat{P} \subset P$ that, when combined with large language model $G$, will not exceed a given risk threshold $\alpha$ with probability at least $1-\delta$. The risk $R$ is a measure such as mean, VaR, or Gini coefficient, which gives some aggregate notion of the instance-wise loss $l$ (for example toxicity score or ROUGE), and it is upper bounded by $\hat{R}(G_p, l)$. Here, the prompts in $\hat{P} = \{p_6, p_8, p_9\}$ yield acceptable upper bounds on $R$. From these, one could choose to deploy the prompt with the best bound, or else the best prompt in $\hat{P}$ according to some other performance metric.
Figure 4: Each candidate prompt is applied to produce LLM output on the validation set. This output is scored according to some user-chosen loss function. The loss values for each prompt are fed to Prompt Risk Control, along with a user-chosen risk measure and threshold, in order to return the set of prompts that control the risk at an acceptable level.
Figure 5: The quantile function ($Q$) of the loss distribution induced by a prompt is upper bounded by $B^U_Q$, which can be post-processed to control a rich family of risk measures such as value at risk (VaR) and conditional value at risk (CVaR). VaR (middle) considers the loss for one example at a specific quantile. CVaR (right) considers the average loss value in the interval starting at a specific quantile and ending at 1, for example the average loss for the worst-off 15% of the population.
...and 5 more figures
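The risk measures named in the Figure 2 caption can be made concrete with a small sketch that computes them on a sample of losses. These are plug-in estimates only, not the high-probability upper bounds that Prompt Risk Control produces. The function names `var`, `cvar`, and `gini` are hypothetical, and the sketch assumes nonnegative losses and $0 < \beta < 1$.

```python
import math


def var(losses, beta):
    """Empirical value at risk: smallest loss x with at least a beta
    fraction of the sample less than or equal to x."""
    xs = sorted(losses)
    index = max(0, math.ceil(beta * len(xs)) - 1)
    return xs[index]


def cvar(losses, beta):
    """Empirical conditional value at risk: average of the empirical
    quantile function over quantile levels in [beta, 1]."""
    xs = sorted(losses)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs):
        lo, hi = i / n, (i + 1) / n          # quantile levels covered by x
        overlap = max(0.0, hi - max(lo, beta))
        total += x * overlap
    return total / (1.0 - beta)


def gini(losses):
    """Gini coefficient A / (A + B), with B the area under the
    piecewise-linear empirical Lorenz curve and A + B = 1/2."""
    xs = sorted(losses)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0                            # all losses zero: no dispersion
    area_b = 0.0
    cumulative = 0.0
    prev_share = 0.0
    for x in xs:
        cumulative += x
        share = cumulative / total
        area_b += (prev_share + share) / (2.0 * n)   # trapezoid rule
        prev_share = share
    area_a = 0.5 - area_b
    return area_a / (area_a + area_b)


# Hypothetical sample of ten losses.
sample = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(var(sample, 0.8), cvar(sample, 0.8), gini(sample))
```

With $\beta = 0.8$, `cvar` averages the losses of the worst-off 20% of the sample, which matches the caption's description of CVaR as the average loss above a chosen quantile.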

Theorems & Definitions (4)

definition 1: Risk-Controlling Prompt Set
definition 2: Quantile-based Risk Measure
lemma 1
proof
