Engression: Extrapolation through the Lens of Distributional Regression

Xinwei Shen, Nicolai Meinshausen

TL;DR

Engression introduces a neural-network-based distributional regression framework that directly models the full conditional distribution $Y|X=x$ via a generative transform, enabling sampling and high-dimensional outcome handling. By combining distributional regression with pre-additive noise models (pre-ANMs), engression provides a new approach to extrapolation for nonlinear relationships, with theory showing distributional extrapolability under mild monotonicity and noise assumptions. Finite-sample analyses establish consistency and error rates outside the training support in well-specified settings, while simulations and extensive real-data experiments demonstrate robust extrapolation advantages over traditional L1/L2 regression and quantile-based methods. The method yields accurate prediction intervals and distributional predictions beyond the training support, offering a practical, scalable tool for tasks requiring reliable extrapolation and uncertainty quantification in nonlinear regimes.

Abstract

Distributional regression aims to estimate the full conditional distribution of a target variable, given covariates. Popular methods include linear and tree-ensemble based quantile regression. We propose a neural network-based distributional regression methodology called 'engression'. An engression model is generative in the sense that we can sample from the fitted conditional distribution and is also suitable for high-dimensional outcomes. Furthermore, we find that modelling the conditional distribution on training data can constrain the fitted function outside of the training support, which offers a new perspective to the challenging extrapolation problem in nonlinear regression. In particular, for 'pre-additive noise' models, where noise is added to the covariates before applying a nonlinear transformation, we show that engression can successfully perform extrapolation under some assumptions such as monotonicity, whereas traditional regression approaches such as least-squares or quantile regression fall short under the same assumptions. Our empirical results, from both simulated and real data, validate the effectiveness of the engression method and indicate that the pre-additive noise model is typically suitable for many real-world scenarios. The software implementations of engression are available in both R and Python.
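
To make the pre-additive noise idea concrete, the following is a minimal simulation sketch, not the authors' implementation. It contrasts a pre-additive noise model, $Y = g(X + \eta)$, with the more familiar post-additive form, $Y = g(X) + \varepsilon$. The function names, the cubic choice of $g$, the uniform noise, and the value `eta_max = 0.5` are all illustrative assumptions. The point it shows is that under pre-additive noise, responses observed inside the training support $[-2, 2]$ already carry information about $g$ slightly beyond the boundary.

```python
import numpy as np

def simulate_pre_anm(g, x, eta_max, rng):
    """Pre-additive noise (illustrative): noise enters before g, Y = g(X + eta)."""
    eta = rng.uniform(-eta_max, eta_max, size=x.shape)
    return g(x + eta)

def simulate_post_anm(g, x, eps_max, rng):
    """Post-additive noise (illustrative): noise is added after g, Y = g(X) + eps."""
    eps = rng.uniform(-eps_max, eps_max, size=x.shape)
    return g(x) + eps

def g(t):
    # Hypothetical monotone nonlinearity chosen for illustration.
    return t ** 3

rng = np.random.default_rng(0)
eta_max = 0.5                                  # assumed noise bound
x = rng.uniform(-2.0, 2.0, size=10_000)        # training covariates on [-2, 2]

y_pre = simulate_pre_anm(g, x, eta_max, rng)
y_post = simulate_post_anm(g, x, eta_max, rng)

# Under the pre-ANM, some responses equal g evaluated at points beyond x = 2,
# so the largest response exceeds g(2) but cannot exceed g(2 + eta_max).
print("g(2) =", g(2.0), " max pre-ANM response =", y_pre.max())
```

In this sketch, the pre-ANM responses are values of `g` at perturbed inputs, which is why fitting their full conditional distribution can constrain `g` outside the training support, while the post-ANM responses only ever evaluate `g` inside it.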

Paper Structure

This paper contains 55 sections, 16 theorems, 161 equations, 18 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

For any distribution $P'$, we have $\mathbb{E}_{Z\sim P}[\mathrm{ES}(P,Z)] \ge \mathbb{E}_{Z\sim P}[\mathrm{ES}(P',Z)]$, where the equality holds if and only if $P$ and $P'$ are identical.
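
Lemma 1 can be checked numerically with a small Monte Carlo sketch. This excerpt does not define $\mathrm{ES}$, so the code assumes the commonly used positively oriented energy score, $\mathrm{ES}(P, z) = \tfrac{1}{2}\,\mathbb{E}\|X - X'\| - \mathbb{E}\|X - z\|$ with $X, X'$ drawn independently from $P$. The function name, the Gaussian choices for $P$ and $P'$, and the sample sizes are illustrative assumptions.

```python
import numpy as np

def expected_energy_score(samples, observations):
    """Monte Carlo estimate of E_Z[ES(P, Z)] under the assumed definition
    ES(P, z) = 0.5 * E||X - X'|| - E||X - z||.

    samples:      (m, d) array of draws from the candidate distribution
    observations: (n, d) array of draws of Z from the true distribution
    """
    samples = np.asarray(samples, dtype=float)
    observations = np.asarray(observations, dtype=float)
    m = samples.shape[0]
    # Pairwise distances between candidate draws; the diagonal is zero,
    # so dividing by m * (m - 1) averages over distinct pairs only.
    pairwise = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
    spread = pairwise.sum() / (m * (m - 1))
    # Average distance between candidate draws and observed values.
    fit = np.linalg.norm(observations[:, None, :] - samples[None, :, :], axis=-1).mean()
    return float(0.5 * spread - fit)

rng = np.random.default_rng(0)
obs = rng.normal(0.0, 1.0, size=(300, 1))        # Z ~ P = N(0, 1)
p_samples = rng.normal(0.0, 1.0, size=(200, 1))  # draws from P itself
q_samples = rng.normal(1.0, 1.0, size=(200, 1))  # draws from a shifted P' = N(1, 1)

score_true = expected_energy_score(p_samples, obs)
score_shifted = expected_energy_score(q_samples, obs)
print("expected score with P :", score_true)
print("expected score with P':", score_shifted)
```

Up to Monte Carlo error, the estimate for the true distribution should be the larger of the two, in line with the inequality in the lemma.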

Figures (18)

  • Figure 1: Examples of nonlinear extrapolation behaviour for different model classes. The true (conditional median) function is a cubic function. We use linear quantile regression, quantile regression forest, and quantile regression with a two-layer neural network and 100 hidden neurons to fit the training data supported on $[-2,2]$, and evaluate the models on a wider range of test data.
  • Figure 2: Out-of-support predictions by engression and $L_2$ regression on the air quality data. The models are trained on a support up to the first quartile of NMHC and evaluated in their predictions for larger NMHC values. To ensure a fair comparison, we apply both methods with the same neural network architecture whose number of layers is varying from 3 to 9; for each architecture, we repeatedly employ both methods for ten times (each with an independent random initialisation) and show the behaviour of all of them.
  • Figure 3: Extrapolation uncertainties and extrapolability gains. The gains for the mean and the distribution depend on the specific noise distribution and here we take $\eta\sim\mathrm{Unif}[-\eta_{\max},\eta_{\max}]$ as an example. Black lines represent extrapolation uncertainties; red lines are extrapolability gains; red arrows represent the maximum extrapolability gains.
  • Figure 4: The estimated conditional medians and means of different methods in all simulation settings. Each figure consists of the estimated functions for 20 random repetitions (whose 10% to 90% quantiles are plotted in graduated colors with the darkest curve in the middle representing the mean of the 20 estimated mean functions), the true function, and training data.
  • Figure 5: Average performance on out-of-support data. The top row shows the $L_1$ loss for conditional median estimation; the bottom row shows the $L_2$ loss for conditional mean estimation.
  • ...and 13 more figures

Theorems & Definitions (41)

  • Lemma 1
  • Proposition 1
  • Definition 1: Functional extrapolability
  • Example 1: Linear functions
  • Example 2: Lipschitz functions
  • Example 3: Monotone functions
  • Definition 2: Mean extrapolability
  • Definition 3: Distributional extrapolability
  • Definition 4: Linear and nonlinear pre-ANMs
  • Theorem 1
  • ...and 31 more