OmniPred: Language Models as Universal Regressors

Xingyou Song, Oscar Li, Chansoo Lee, Bangding Yang, Daiyi Peng, Sagi Perel, Yutian Chen

TL;DR

OmniPred reframes regression as a universal, text-token-based task solved by a single 200M-parameter T5 model trained on large-scale, heterogeneous Vizier data. By representing inputs $x$ and outputs $y$ as free-form text tokens and training across multiple tasks, it achieves high-precision numeric predictions and meaningful uncertainty estimates, with transfer benefits evident both on unseen tasks and during online finetuning. The work demonstrates that multi-task training can outperform traditional, task-specific regressors and identifies key factors (sampling, tokenization, and data regime) that influence performance. This approach offers a scalable path toward end-to-end regression across diverse domains, enabling faster surrogate modeling and experimental design without heavy feature engineering.

Abstract

Regression is a powerful tool to accurately predict the outcome metric of a system given a set of parameters, but has traditionally been restricted to methods which are only applicable to a specific task. In this paper, we propose OmniPred, a framework for training language models as universal end-to-end regressors over $(x,y)$ data from arbitrary formats. Using data sourced from Google Vizier, one of the largest proprietary blackbox optimization databases in the world, our extensive experiments demonstrate that language models are capable of very precise numerical regression using only textual representations of mathematical parameters and values, and if given the opportunity to train at scale over multiple tasks, can significantly outperform traditional regression models.
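To make "textual representations of mathematical parameters and values" concrete, here is a minimal sketch of one way an $(x,y)$ pair could be turned into text. It is an illustration under stated assumptions, not the paper's actual serialization: the `key:value` layout, the sign/mantissa/exponent token scheme, and the helper names `serialize_x`, `encode_y`, and `decode_y` are all hypothetical.

```python
import math


def serialize_x(params: dict) -> str:
    """Render a parameter dict as text (hypothetical key:value layout)."""
    return ",".join(f"{k}:{v}" for k, v in sorted(params.items()))


def encode_y(y: float, digits: int = 4) -> list[str]:
    """Encode a float as sign, mantissa-digit, and exponent tokens.

    Assumed scheme: y ~= sign * mantissa * 10**exp, where the mantissa
    is an integer with exactly `digits` decimal digits.
    """
    if y == 0:
        return ["<+>"] + ["<0>"] * digits + ["<E0>"]
    sign = "<+>" if y > 0 else "<->"
    exp = math.floor(math.log10(abs(y))) - (digits - 1)
    mantissa = round(abs(y) / 10 ** exp)
    if mantissa == 10 ** digits:  # rounding spilled into an extra digit
        mantissa //= 10
        exp += 1
    return [sign] + [f"<{d}>" for d in str(mantissa)] + [f"<E{exp}>"]


def decode_y(tokens: list[str]) -> float:
    """Invert encode_y, up to the precision kept in the mantissa."""
    sign = 1.0 if tokens[0] == "<+>" else -1.0
    mantissa = int("".join(t[1:-1] for t in tokens[1:-1]))
    exp = int(tokens[-1][2:-1])
    return sign * mantissa * 10.0 ** exp


x_text = serialize_x({"lr": 0.01, "layers": 3})
y_tokens = encode_y(1.23)
print(x_text, "->", "".join(y_tokens))
```

Under a scheme like this, the regressor only ever sees and emits token sequences, so the same model can in principle consume tasks with different parameter names, types, and counts. Repeatedly sampling the output tokens and decoding them gives a set of predictions from which a spread can be read off.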

Paper Structure (35 sections, 2 equations, 10 figures, 9 tables)

Figures (10)

  • Figure 1: Overview of our method. Using heterogeneous offline $(x,y)$ evaluation data collected from a variety of sources, we train an LM-based regressor.
  • Figure 2: Common example of a (possibly nested) space and suggestions $x$ in OSS Vizier.
  • Figure 3: Model prediction samples over selected 4D BBOB functions with unseen shifts. Empirical mode (bolded) and min/max are shown from 10 samples. Over all BBOB functions, we vary the coordinate value $x_{i}$ while keeping others $x_{j\neq i}$ fixed.
  • Figure 4: Left: Diagonal fit (/) is better. Model's $y$-prediction vs. ground truth over varying studies. Corporate-specific objective names are redacted. Right: Corresponding input spaces. "#-H, $-T" is shorthand for a conditional hybrid input space with # root parameters and $ total possible parameters.
  • Figure 5: Lower ($\downarrow$) is better. Mean study prediction error of the model when varying the number of different studies used in training (log scale). Colored horizontal lines display single-task baseline errors.
  • ...and 5 more figures