How Inverse Conditional Flows Can Serve as a Substitute for Distributional Regression

Lucas Kook, Chris Kolb, Philipp Schiele, Daniel Dold, Marcel Arpogaus, Cornelius Fritz, Philipp F. Baumann, Philipp Kopper, Tobias Pielok, Emilio Dorigatti, David Rügamer

TL;DR

This paper introduces DRIFT, a distributional regression framework based on inverse flow transformations, which map the outcome, conditional on the features, onto a simple base distribution. By using monotone neural networks to realize the inverse conditional flow and neural basis functions for interpretable predictors, DRIFT unifies parametric and nonparametric distributional regression within a single maximum-likelihood framework. The authors demonstrate that DRIFT can replicate and sometimes surpass classical methods across ordinal, time-series, survival, and multimodal tasks while remaining interpretable. The work suggests that neural representations based on normalizing flows can serve as competitive substitutes for traditional distributional regression models, with potential for broad applicability and for future inference methods tailored to DRIFT.
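
As a hedged illustration of what a "single maximum-likelihood framework" usually means for flow-based models, the standard change-of-variables argument for a continuous outcome can be written as follows. The symbols $F_Z$ and $f_Z$ (base distribution function and density) and the parameter vector $\theta$ are notation assumed here for illustration; $\phi^{-}(y, \mathbf{x})$ is the inverse conditional flow, assumed strictly increasing and differentiable in $y$. This is a generic sketch, not necessarily the exact formulation used in the paper.

$$
\begin{aligned}
F_{Y \mid \mathbf{X}}(y \mid \mathbf{x}) &= F_Z\big(\phi^{-}(y, \mathbf{x})\big), \\
f_{Y \mid \mathbf{X}}(y \mid \mathbf{x}) &= f_Z\big(\phi^{-}(y, \mathbf{x})\big)\, \frac{\partial \phi^{-}(y, \mathbf{x})}{\partial y}, \\
\ell(\theta) &= \sum_{i=1}^{n} \left[ \log f_Z\big(\phi^{-}(y_i, \mathbf{x}_i)\big) + \log \frac{\partial \phi^{-}(y_i, \mathbf{x}_i)}{\partial y} \right].
\end{aligned}
$$

The derivative term is well defined and positive only if $\phi^{-}$ is strictly increasing in $y$, which is why monotonicity of the network matters for this kind of likelihood.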

Abstract

Neural network representations of simple models, such as linear regression, are increasingly studied to better understand the underlying principles of deep learning algorithms. However, neural representations of distributional regression models, such as the Cox model, have received little attention so far. We close this gap by proposing a framework for distributional regression using inverse flow transformations (DRIFT), which includes neural representations of the aforementioned models. We empirically demonstrate that the neural representations of models in DRIFT can serve as a substitute for their classical statistical counterparts in several applications involving continuous, ordered, time-series, and survival outcomes. We confirm that models in DRIFT empirically match the performance of several statistical methods in terms of estimation of partial effects, prediction, and aleatoric uncertainty quantification. DRIFT covers both interpretable statistical models and flexible neural networks, opening up new avenues in both statistical modeling and deep learning.

Paper Structure (41 sections, 1 theorem, 14 equations, 7 figures, 5 tables)

Key Result

Proposition 1

Consider an inverse conditional flow $\phi^{-}(y, \mathbf{x})$ built from feed-forward neural networks $\phi^{-}_{y\mathbf{x}}$, $\phi^{-}_{y}$, and $\varphi$. For $\phi^{-}(y, \mathbf{x})$ to be strictly monotonically increasing in $y$, it is sufficient for $\phi^{-}_{y\mathbf{x}}$ and $\phi^{-}_{y}$ to have strictly positive weights and strictly monotonic activation functions ...
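
The sufficient condition in the proposition can be illustrated with a small NumPy sketch. It is a generic one-hidden-layer network, not the paper's architecture: the function names, the softplus reparameterization used to keep weights positive, and the way the features enter (as an unconstrained additive term inside the hidden layer) are all assumptions made for illustration.

```python
import numpy as np


def softplus(a):
    # Numerically stable log(1 + exp(a)); output is strictly positive.
    return np.logaddexp(0.0, a)


def init_params(rng, n_features, hidden=8):
    # Hypothetical parameter layout for this sketch.
    return {
        "raw_w1": rng.normal(scale=0.5, size=hidden),  # unconstrained, made positive below
        "raw_w2": rng.normal(scale=0.5, size=hidden),  # unconstrained, made positive below
        "V": rng.normal(scale=0.5, size=(n_features, hidden)),  # feature weights, any sign
        "b": rng.normal(scale=0.5, size=hidden),
    }


def inverse_flow(y, x, params):
    """Map outcomes y (shape (n,)) and features x (shape (n, p)) to the base scale.

    Every weight on a path from y to the output is strictly positive and the
    activation (tanh) is strictly increasing, so the output is strictly
    increasing in y for any fixed x.
    """
    w1 = softplus(params["raw_w1"])
    w2 = softplus(params["raw_w2"])
    feature_term = x @ params["V"] + params["b"]        # shape (n, hidden)
    hidden = np.tanh(np.outer(y, w1) + feature_term)    # shape (n, hidden)
    return hidden @ w2                                  # shape (n,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng, n_features=3)
    y_grid = np.linspace(-2.0, 2.0, 50)
    x_fixed = np.tile(rng.normal(size=(1, 3)), (50, 1))  # same features for every y
    z = inverse_flow(y_grid, x_fixed, params)
    print("increasing in y:", bool(np.all(np.diff(z) > 0)))
```

Only the weights that touch $y$ are constrained; the feature weights `V` stay unconstrained, so the features can shift the flow in either direction without breaking monotonicity in $y$.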

Figures (7)

  • Figure 1: Depiction of the location-scale DRIFT example. For three values of $X$ (dashed/solid/dotted), the standard logistic base distribution (top side) is transformed into the conditional outcome distribution (right side) via the conditional flows (middle). In this example, the distribution for $x=0$ is a normal mixture with equal weights (solid line).
  • Figure 2: Estimated partial effects for four features in a 20-fold cross-validation of the UCI wine quality dataset using a DRIFT and a proportional odds logistic regression (POLR) model.
  • Figure 3: Left: Estimated effects on the prevalence of Covid-19 using a GAM with neural or spline basis and DRIFT (colors). Right: Estimated spatial effects on the prevalence of Covid-19 from DRIFT and GAM (with spline basis).
  • Figure 4: Left and center: Estimated Gaussian densities over time (y-axis) for the two mixture components in the mixture model. Right: Estimated densities over time using DRIFT.
  • Figure 5: Left: Predictive performance in terms of integrated Brier score (lower is better, evaluated at the 25th, 50th, and 75th percentile) of a Kaplan-Meier estimator, a piece-wise exponential additive model (PAM) and a DRIFT. Middle: Estimated log cumulative hazards given daytime (in hours; colors). Right: Out-of-sample martingale residuals show comparable prediction errors for DRIFT and PAM.
  • ...and 2 more figures

Theorems & Definitions (3)

  • Example 1: Assumptions on the base distribution
  • Example 2: Structural assumptions
  • Proposition 1: Monotonicity of the conditional flow