Multivariate Uncertainty Quantification with Tomographic Quantile Forests

Takuya Kanazawa

TL;DR

Tomographic Quantile Forests (TQF) is a nonparametric, tree-based framework for multivariate uncertainty quantification. It learns conditional quantiles of directional projections of $\mathbf{y}$ and reconstructs the joint conditional distribution $p(\mathbf{y}|\mathbf{x})$ through a tomography-inspired, sliced-Wasserstein objective. The method uses QRF++ as a backbone to model many quantile levels across directions efficiently, and a Quantile-Matching Empirical Measure (QMEM) to build a weighted point cloud representing the conditional distribution. Across synthetic and real-world benchmarks, TQF demonstrates competitive distributional accuracy, flexibility in capturing multimodality and nonconvex supports, and favorable performance compared to parametric and other nonparametric baselines, including DRF, especially in low-data regimes. The work offers a practical, scalable alternative for uncertainty quantification on tabular data, with potential extensions to spatiotemporal settings and further robustness improvements through hyperparameter tuning. Overall, TQF combines directional quantiles, Radon-transform-inspired tomography, and ensemble refinement to deliver rich multivariate uncertainty estimates without neural networks.

Abstract

Quantifying predictive uncertainty is essential for safe and trustworthy real-world AI deployment. Yet, fully nonparametric estimation of conditional distributions remains challenging for multivariate targets. We propose Tomographic Quantile Forests (TQF), a nonparametric, uncertainty-aware, tree-based regression model for multivariate targets. TQF learns conditional quantiles of directional projections $\mathbf{n}^{\top}\mathbf{y}$ as functions of the input $\mathbf{x}$ and the unit direction $\mathbf{n}$. At inference, it aggregates quantiles across many directions and reconstructs the multivariate conditional distribution by minimizing the sliced Wasserstein distance via an efficient alternating scheme with convex subproblems. Unlike classical directional-quantile approaches that typically produce only convex quantile regions and require training separate models for different directions, TQF covers all directions with a single model without imposing convexity restrictions. We evaluate TQF on synthetic and real-world datasets, and release the source code on GitHub.
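
To make the two ingredients named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It computes empirical quantiles of the projections $\mathbf{n}^{\top}\mathbf{y}$ for random unit directions $\mathbf{n}$, and uses them to approximate a squared sliced 2-Wasserstein distance between two unconditional point clouds. The function names, the number of directions, and the quantile grid are all illustrative assumptions; TQF itself predicts these quantiles conditionally on $\mathbf{x}$ with a forest, which is not shown here.

```python
import numpy as np

def random_directions(num_dirs, dim, rng):
    """Draw unit vectors uniformly on the sphere (hypothetical helper)."""
    n = rng.standard_normal((num_dirs, dim))
    return n / np.linalg.norm(n, axis=1, keepdims=True)

def directional_quantiles(y, directions, levels):
    """Empirical quantiles of the projections n^T y, one row per direction."""
    proj = y @ directions.T                      # shape (num_points, num_dirs)
    return np.quantile(proj, levels, axis=0).T   # shape (num_dirs, num_levels)

def sliced_wasserstein2(y_a, y_b, directions, levels):
    """Approximate squared sliced 2-Wasserstein distance.

    In 1D the 2-Wasserstein distance is the L2 distance between quantile
    functions, so we average squared quantile gaps over levels and directions.
    """
    q_a = directional_quantiles(y_a, directions, levels)
    q_b = directional_quantiles(y_b, directions, levels)
    return float(np.mean((q_a - q_b) ** 2))

rng = np.random.default_rng(0)
dirs = random_directions(64, 2, rng)             # 64 directions: assumed value
levels = np.linspace(0.05, 0.95, 19)             # quantile grid: assumed value

a = rng.standard_normal((500, 2))
b = rng.standard_normal((500, 2)) + np.array([2.0, 0.0])

sw_same = sliced_wasserstein2(a, a, dirs, levels)
sw_shift = sliced_wasserstein2(a, b, dirs, levels)
print(sw_same, sw_shift)
```

A cloud compared with itself gives a distance of zero, while the shifted cloud gives a clearly positive value. Reconstruction in the tomographic sense runs this logic in reverse: given target directional quantiles, it searches for a point cloud whose projected quantiles match them.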

Paper Structure

This paper contains 44 sections, 20 equations, 24 figures, 5 tables, 3 algorithms.

Figures (24)

  • Figure 1: Toy datasets in $\mathbb{R}^2$ containing 300 points with identical marginal distributions.
  • Figure 2: Illustration of the predicted quantiles from QRF (left panels) and QRF++ (right panels) on four synthetic datasets. The plotted quantile levels are (a) 0.10 and 0.90, (b) 0.30 and 0.70, (c) 0.20 and 0.80, and (d) 0.25 and 0.65. Red dashed lines show the ground truth; solid lines (magenta and green) show the model predictions. Gray points indicate the observed samples.
  • Figure 3: Target importance of QRF++ on the synthetic datasets; error bars show one standard deviation across 100 trees. The importances are normalized to sum to 1.
  • Figure 4: Numerical experiment of the QMEM algorithm. (a) The "two moons" dataset. (b) Best fit of 9 points to the data. (c) 150 points randomly sampled from the KDE on the support points in (b), with optimized weights; marker color indicates weight. (d) After a few iterations, the distribution gradually converges toward the true distribution. (e) Final point cloud obtained by pooling 20 clouds. After pruning, the population size is reduced from 3,000 to 2,456. The score "ED" in (b)--(e) stands for the Energy Distance measuring the discrepancy between the true and estimated distributions.
  • Figure 5: Reconstruction accuracy (ED score) for varying $K$ and $M$. Each score is the average over 3 trials with different random seeds.
  • ...and 19 more figures
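
The Energy Distance (ED) score mentioned in the captions of Figures 4 and 5 can be illustrated with a short sketch. This assumes the standard definition $\mathrm{ED}^2 = 2\,\mathbb{E}\|X-Y\| - \mathbb{E}\|X-X'\| - \mathbb{E}\|Y-Y'\|$ estimated from two unweighted samples; the paper's exact normalization and its handling of point weights are not given in this excerpt, so treat the function below as a hypothetical stand-in.

```python
import numpy as np

def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (a_i, b_j)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.linalg.norm(diff, axis=-1).mean()

def energy_distance(x, y):
    """Plug-in (V-statistic) estimate of the energy distance between samples."""
    squared = (2.0 * mean_pairwise_distance(x, y)
               - mean_pairwise_distance(x, x)
               - mean_pairwise_distance(y, y))
    return float(np.sqrt(max(squared, 0.0)))  # clip tiny negative round-off

rng = np.random.default_rng(0)
true_sample = rng.standard_normal((300, 2))
good_fit = rng.standard_normal((300, 2))               # same distribution
poor_fit = rng.standard_normal((300, 2)) + [3.0, 0.0]  # shifted distribution

ed_good = energy_distance(true_sample, good_fit)
ed_poor = energy_distance(true_sample, poor_fit)
print(ed_good, ed_poor)
```

A lower score indicates a closer match, so the shifted sample scores worse than the sample drawn from the same distribution.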