Bayesian Semi-structured Subspace Inference

Daniel Dold, David Rügamer, Beate Sick, Oliver Dürr

TL;DR

This work targets uncertainty quantification in semi-structured regression (SSR) models, which combine interpretable structured effects with flexible unstructured neural network components. It introduces Bayesian semi-structured subspace inference, which samples the structured parameters in the full parameter space while constraining the DNN weights to a low-dimensional affine subspace spanned by the control points of a Bézier curve, enabling efficient exploration of multiple loss valleys and quantification of epistemic uncertainty. Across toy, simulated, UCI, and melanoma datasets, the method yields posterior distributions for the structured parameters that are close to full-space MCMC and achieves competitive predictive performance, with calibration and uncertainty estimates improving as the subspace dimension grows. The approach addresses optimization asymmetry in SSR and offers a practical, scalable framework for principled uncertainty quantification in hybrid models, with implications for medical decision support and beyond.

Abstract

Semi-structured regression models enable the joint modeling of interpretable structured and complex unstructured feature effects. The structured model part is inspired by statistical models and can be used to infer the input-output relationship for features of particular importance. The complex unstructured part defines an arbitrary deep neural network and thereby provides enough flexibility to achieve competitive prediction performance. While these models can also account for aleatoric uncertainty, there is still a lack of work on accounting for epistemic uncertainty. In this paper, we address this problem by presenting a Bayesian approximation for semi-structured regression models using subspace inference. To this end, we extend subspace inference for joint posterior sampling from a full parameter space for structured effects and a subspace for unstructured effects. Apart from this hybrid sampling scheme, our method allows for tunable complexity of the subspace and can capture multiple minima in the loss landscape. Numerical experiments validate our approach's efficacy in recovering structured effect parameter posteriors in semi-structured models and approaching the full-space posterior distribution of MCMC for increasing subspace dimension. Further, our approach exhibits competitive predictive performance across simulated and real-world datasets.
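The hybrid parameterization described above can be illustrated with a small sketch. It is not the authors' implementation: it only assumes that the network weights are restricted to the affine span of Bézier control points, using the raw difference vectors $\mathbf{p}_j - \mathbf{p}_0$ as a basis (the paper may use a different, e.g. orthonormalized, basis), while the structured parameters stay in their full space. All function names, the control points, and the values of `theta` and `z` are hypothetical.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial B_{i,n}(t)."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)


def bezier_point(control_points, t):
    """Evaluate the Bezier curve defined by the control points at t in [0, 1]."""
    n = len(control_points) - 1
    dim = len(control_points[0])
    return [
        sum(bernstein(n, i, t) * p[d] for i, p in enumerate(control_points))
        for d in range(dim)
    ]


def subspace_to_weights(control_points, z):
    """Map k subspace coordinates z to weights: w = p0 + sum_j z_j (p_j - p0)."""
    p0 = control_points[0]
    if len(z) != len(control_points) - 1:
        raise ValueError("z needs one coordinate per difference vector")
    return [
        p0[d] + sum(z_j * (p_j[d] - p0[d]) for z_j, p_j in zip(z, control_points[1:]))
        for d in range(len(p0))
    ]


def curve_coords(k, t):
    """Subspace coordinates of the curve point at t (the Bernstein weights sum to 1)."""
    return [bernstein(k, j, t) for j in range(1, k + 1)]


# Hypothetical control points in a 3-dimensional weight space (k = 2).
control_points = [[1.0, 0.0, -1.0], [2.0, 2.0, 0.0], [3.0, 0.0, 1.0]]

# Every point on the curve lies inside the affine subspace.
t = 0.5
w_curve = bezier_point(control_points, t)
w_subspace = subspace_to_weights(control_points, curve_coords(2, t))

# A sampler would move in (theta, z): theta in full space, z in the subspace.
theta = [0.3, -1.2]  # hypothetical structured parameters
z = [0.1, 0.9]       # hypothetical subspace coordinates
full_params = (theta, subspace_to_weights(control_points, z))
```

In this sketch the sampler's state has dimension `len(theta) + k` rather than `len(theta)` plus the number of network weights, which is the property that makes joint sampling tractable.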

Paper Structure (30 sections, 12 equations, 17 figures, 3 tables, 1 algorithm)

Figures (17)

  • Figure 1: Comparison of semi-structured subspace inference and HMC for an SSR model. The SSR model combines a linear shift induced by the categorical feature $x$ (color code) with a non-linear trend in $u$ (x-axis) modeled by a deep neural network (cf. the SSR model equation in the paper). Left/center: posterior predictive for dataset $\mathcal{D}$ and outcome $y$ with a 2-dimensional and a 12-dimensional subspace; right: posterior predictive of HMC without any approximation. Points represent the data, colored by their category of $x$; the solid line is the mean, and the shading depicts the 95% Highest Density Interval.
  • Figure 2: Bézier curve (magenta) in three-dimensional weight space, controlled by optimized points $\mathbf{p}_0^*, \mathbf{p}_1^*, \mathbf{p}_2^*$, which form a two-dimensional subspace $\text{AffSpan}(\mathbf{p}_0^*, \mathbf{p}_1^*, \mathbf{p}_2^*)$ indicated by the cyan triangle that includes the Bézier curve. The difference vectors $\mathbf{p}_1^* - \mathbf{p}_0^*$ and $\mathbf{p}_2^* - \mathbf{p}_0^*$ spanning the affine subspace are shown in green.
  • Figure 3: Posterior of the parameters in the structured model part using the naïve subspace approximation with $k=4$ (Naïve-Subspace), our approach with $k=2$ and $p=2$ (Semi-Subspace), and HMC running in the full parameter space. The top and right plots show the marginal posterior distributions, whereas the center plot visualizes the bivariate distribution using a kernel density estimator based on 4000 samples from 10 HMC chains.
  • Figure 4: Posterior mean (top) and standard deviation (bottom) of our approximation method compared to the gold standard HMC. The boxplots show, across the 50 simulation repetitions, the difference between the posterior mean/standard deviation obtained with our approach and the respective statistic obtained with HMC. The x-axis depicts the subspace dimensions $k$ used in our approach, and each color represents one of the three parameters in $\theta$.
  • Figure 5: Coverage comparison of credibility intervals derived from the posterior $p(\theta_1|\mathcal{D})$ using different subspace dimensions $k$ (colors). The theoretical coverage (x-axis) across different values in $(0,1)$ is plotted against the sample coverage (y-axis), i.e., the empirical fraction of credibility intervals that contain the true parameter. Whiskers represent the 95% Wilson confidence interval.
  • ...and 12 more figures
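
The coverage check described in the Figure 5 caption can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it counts how often a credibility interval contains the true parameter and attaches a Wilson score interval to that empirical proportion. The function names, the example intervals, and the true value are hypothetical.

```python
from math import sqrt


def empirical_coverage(intervals, true_value):
    """Return (hits, n): how many (lower, upper) intervals contain true_value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits, len(intervals)


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default z gives 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical credibility intervals for theta_1 from five simulation runs.
intervals = [(0.7, 1.4), (0.9, 1.6), (0.5, 1.1), (1.05, 1.8), (0.8, 1.3)]
true_theta_1 = 1.0  # hypothetical ground-truth value used to simulate the data

hits, n = empirical_coverage(intervals, true_theta_1)
lower, upper = wilson_interval(hits, n)
print(f"sample coverage = {hits / n:.2f}, 95% Wilson CI = ({lower:.2f}, {upper:.2f})")
```

Repeating this for several nominal levels in $(0,1)$ gives one point with whiskers per level; a well-calibrated posterior would place these points near the diagonal.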