Omnipredictors for Regression and the Approximate Rank of Convex Functions

Parikshit Gopalan; Princewill Okoroafor; Prasad Raghavendra; Abhishek Shetty; Mihir Singhal

TL;DR

The paper advances omniprediction by extending it to regression with labels in $[0,1]$. It introduces sufficient statistics that enable loss minimization across a family of losses, and links them to the $\varepsilon$-approximate rank of the loss family. It proves a near-tight bound of $O(1/\varepsilon^{2/3})$ on the $\varepsilon$-approximate dimension of convex Lipschitz losses on $[0,1]$, which enables substantially faster omnipredictors than prior CDF-based approaches. By generalizing loss outcome indistinguishability to regression and developing calibrated multiaccuracy algorithms, the authors obtain omnipredictors for convex Lipschitz losses, low-degree polynomial losses, and GLM losses under weak learnability assumptions. Through approximate rank, the work connects sufficient statistics to computational complexity, giving a framework for loss-agnostic prediction with guarantees across a broad range of loss families and foundational tools for robust forecasting in regression.

Abstract

Consider the supervised learning setting where the goal is to learn to predict labels $\mathbf y$ given points $\mathbf x$ from a distribution. An \textit{omnipredictor} for a class $\mathcal L$ of loss functions and a class $\mathcal C$ of hypotheses is a predictor whose predictions incur less expected loss than the best hypothesis in $\mathcal C$ for every loss in $\mathcal L$. Since the work of [GKR+21], which introduced the notion, there has been a large body of work in the setting of binary labels where $\mathbf y \in \{0, 1\}$, but much less is known about the regression setting where $\mathbf y \in [0,1]$ can be continuous. Our main conceptual contribution is the notion of \textit{sufficient statistics} for loss minimization over a family of loss functions: these are a set of statistics about a distribution such that knowing them allows one to take actions that minimize the expected loss for any loss in the family. The notion of sufficient statistics relates directly to the approximate rank of the family of loss functions. Our key technical contribution is a bound of $O(1/\varepsilon^{2/3})$ on the $\varepsilon$-approximate rank of convex, Lipschitz functions on the interval $[0,1]$, which we show is tight up to a factor of $\mathrm{polylog}(1/\varepsilon)$. This yields improved runtimes for learning omnipredictors for the class of all convex, Lipschitz loss functions under weak learnability assumptions about the class $\mathcal C$. We also give efficient omnipredictors when the loss families have low-degree polynomial approximations, or arise from generalized linear models (GLMs). This translation from sufficient statistics to faster omnipredictors is made possible by lifting the technique of loss outcome indistinguishability introduced by [GKH+23] for Boolean labels to the regression setting.
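The polynomial case mentioned in the abstract has an especially simple set of sufficient statistics. If every loss in the family is a polynomial of degree at most $d$ in the label, $\ell(y, t) = \sum_{j \le d} a_j(t)\, y^j$, then by linearity of expectation the moments $\mathbb{E}[y^j]$ for $j \le d$ determine the expected loss of every action $t$, so they are sufficient statistics in the sense above. The Python sketch below assumes exactly polynomial losses (the paper also treats losses that are only approximately low-degree); the helper names and the Beta(2, 5) label distribution are hypothetical choices for illustration.

```python
import numpy as np

def moments(y, d):
    """Empirical moments E[y^j], j = 0..d: the sufficient statistics for degree-d losses."""
    y = np.asarray(y, dtype=float)
    return np.array([np.mean(y ** j) for j in range(d + 1)])

def expected_poly_loss(coeffs, mom):
    """E[sum_j a_j y^j] computed from the moments alone, without touching the labels."""
    coeffs = np.asarray(coeffs, dtype=float)
    return float(coeffs @ mom[: len(coeffs)])

def squared_loss(t):
    # (y - t)^2 = t^2 - 2 t y + y^2, as coefficients of 1, y, y^2
    return [t * t, -2.0 * t, 1.0]

def quartic_loss(t):
    # (y - t)^4 = t^4 - 4 t^3 y + 6 t^2 y^2 - 4 t y^3 + y^4
    return [t**4, -4.0 * t**3, 6.0 * t**2, -4.0 * t, 1.0]

def best_action(loss, mom, actions):
    """Choose the action with the smallest expected loss, using only the moments."""
    risks = [expected_poly_loss(loss(t), mom) for t in actions]
    return float(actions[int(np.argmin(risks))])

rng = np.random.default_rng(0)
y = rng.beta(2.0, 5.0, size=10_000)   # hypothetical labels in [0, 1]
mom = moments(y, 4)                   # 5 numbers summarize y for every loss of degree <= 4
actions = np.linspace(0.0, 1.0, 101)  # candidate actions on a grid
print(best_action(squared_loss, mom, actions), best_action(quartic_loss, mom, actions))
```

Five numbers suffice here to optimize both the squared and the quartic loss; for squared loss the chosen action is the grid point nearest the sample mean, since the expected loss is the variance plus the squared distance from the mean.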

Paper Structure (47 sections, 36 theorems, 118 equations, 4 algorithms)

Introduction
Omniprediction
Omniprediction in the Boolean setting
Omnipredictors for regression
Sufficient statistics
Omniprediction from indistinguishability
Approximate rank & sufficient statistics
Approximate rank of convex Lipschitz functions
Omnipredictors for loss families
Overview of technical contributions
Approximating univariate convex functions
Reduction to discrete functions
Reduction to the $\mathrm{ReLU}$ functions
From $\mathrm{ReLU}$ to intervals
Approximating intervals
...and 32 more sections
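The technical overview listed above approximates univariate convex functions by reducing them to discrete functions and then to ReLU functions. A baseline version of that idea already yields sufficient statistics: on a uniform grid of width $1/m$, a convex 1-Lipschitz function on $[0,1]$ is within $1/(2m)$ of its piecewise-linear interpolant (on each cell the gap between a convex function and its chord is at most the cell width times the slope change divided by 4, and the slope change is at most 2). That interpolant is exactly a combination of $1$, $y$ and $\mathrm{ReLU}(y - k/m)$ for $k = 1, \dots, m-1$, with nonnegative ReLU weights equal to the discrete second differences. This gives an $\varepsilon$-approximate basis of size $O(1/\varepsilon)$; the paper's $O(1/\varepsilon^{2/3})$ construction is finer and is not reproduced in this Python sketch, whose names, label distribution, and grid size $m = 32$ are hypothetical.

```python
import numpy as np

def relu_basis(y, m):
    """Evaluate the m + 1 grid functions 1, y, ReLU(y - k/m) for k = 1..m-1."""
    y = np.asarray(y, dtype=float)
    cols = [np.ones_like(y), y] + [np.maximum(y - k / m, 0.0) for k in range(1, m)]
    return np.stack(cols, axis=-1)

def interpolation_coeffs(f, m):
    """Coefficients of the piecewise-linear interpolant of f on {0, 1/m, ..., 1}
    in the grid basis: intercept, initial slope, then the slope change at each
    interior knot. For convex f those slope changes are nonnegative."""
    grid = np.linspace(0.0, 1.0, m + 1)
    vals = np.array([f(g) for g in grid], dtype=float)
    slopes = np.diff(vals) * m  # slope on each cell of width 1/m
    return np.concatenate(([vals[0], slopes[0]], np.diff(slopes)))

def approx_expected_loss(f, stats, m):
    """Average of the interpolant of f, computed from the statistics alone.
    For convex 1-Lipschitz f on [0, 1] it exceeds the sample average of f(y)
    by at most 1/(2m), because the interpolant lies between f and f + 1/(2m)."""
    return float(interpolation_coeffs(f, m) @ stats)

rng = np.random.default_rng(0)
y = rng.beta(2.0, 5.0, size=20_000)    # hypothetical labels in [0, 1]
m = 32                                 # m + 1 statistics, approximation error 1/(2m)
stats = relu_basis(y, m).mean(axis=0)  # computed once, reused for every loss

# Absolute losses |y - t| form a family of convex 1-Lipschitz losses indexed by the action t.
for t in (0.1, 0.3, 0.5):
    approx = approx_expected_loss(lambda v: abs(v - t), stats, m)
    print(t, approx, float(np.mean(np.abs(y - t))))
```

The same `stats` vector serves every loss in the family, which is what makes it a set of sufficient statistics; only the coefficient vector depends on the loss.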

Key Result

Theorem 2.2

For every $\delta > 0$, we have

Theorems & Definitions (71)

Definition 1.1
Definition 2.1: $\varepsilon$-approximate basis, $\varepsilon$-approximate dimension
Theorem 2.2
Lemma 2.3: Discretization
proof
Lemma 2.4
Lemma 2.5: Discrete Taylor series expansion
proof
Corollary 2.6
proof
...and 61 more
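Definition 2.1 is listed above only by name. Assuming it is the usual sup-norm notion (a set of functions is an $\varepsilon$-approximate basis for a class if every function in the class lies within $\varepsilon$ of their linear span in sup norm on $[0,1]$), a candidate basis can be tested numerically, because best uniform approximation on a finite grid is a linear program. The Python sketch below uses `scipy.optimize.linprog` with the HiGHS solver; checking on a grid only approximates the true sup norm, and the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def sup_norm_fit(basis_fns, f, num_points=401):
    """Smallest max-error of sum_i a_i b_i(y) against f(y) over a uniform grid on [0, 1].
    LP variables are (a_1, ..., a_k, e); minimize e subject to |B a - f| <= e."""
    ys = np.linspace(0.0, 1.0, num_points)
    B = np.column_stack([b(ys) for b in basis_fns])  # shape (num_points, k)
    fy = f(ys)
    k = B.shape[1]
    c = np.zeros(k + 1)
    c[-1] = 1.0                                      # objective: minimize e
    neg_ones = -np.ones((num_points, 1))
    A_ub = np.vstack([np.hstack([B, neg_ones]),      #  B a - e <= f
                      np.hstack([-B, neg_ones])])    # -B a - e <= -f
    b_ub = np.concatenate([fy, -fy])
    bounds = [(None, None)] * k + [(0.0, None)]      # coefficients free, error >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.x[-1]), res.x[:k]

def grid_relu_basis(m):
    """Hypothetical candidate basis: 1, y, ReLU(y - k/m) for k = 1..m-1."""
    fns = [np.ones_like, lambda y: y]
    fns += [lambda y, s=k / m: np.maximum(y - s, 0.0) for k in range(1, m)]
    return fns

m = 8
err, _ = sup_norm_fit(grid_relu_basis(m), lambda y: np.abs(y - 0.3))
print(err)  # bounded by 1/(2m) for this convex 1-Lipschitz target
```

To test a whole family, one would take the maximum of `err` over a finite sample of its members; the LP gives the optimal coefficients for each member separately, which matches the definition's requirement that each function be individually close to the span.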
