Conformal Prediction for Compositional Data
Lucas P. Amaral, Luben M. C. Cabezas, Thiago R. Ramos, Gustavo H. G. A. Pereira
TL;DR
This work extends conformal prediction to compositional data by building on Dirichlet regression on the simplex and proposing two complementary predictive constructions: (i) a split conformal prediction (SCP) procedure based on quantile residuals for the marginal Beta components, and (ii) a density-based highest-density region (HDR) approach that is approximated by a coordinate-floor envelope and optionally refined with an internal grid. Both methods guarantee finite-sample marginal coverage under exchangeability while respecting the simplex geometry, and are model-agnostic at the conformal layer. In extensive simulations and an application to the BudgetItaly dataset, the HDR-floor-grid and quantile-residual methods achieve near-nominal coverage with substantially narrower predictive regions than naive HDR or unconstrained approaches, demonstrating calibrated and efficient uncertainty quantification for compositional prediction tasks. The results guide practitioners on when to choose the fast HDR-floor method versus the sharper HDR-floor-grid or quantile-residual methods, enabling practical uncertainty quantification for compositional outcomes in economics, ecology, and beyond.
Abstract
In this work, we propose a set of conformal prediction procedures tailored to compositional responses, where outcomes are proportions that must be positive and sum to one. Building on Dirichlet regression, we introduce a split conformal approach based on quantile residuals and a highest-density region strategy that combines a fast coordinate-floor approximation with an internal grid refinement to restore sharpness. Both constructions are model-agnostic at the conformal layer and guarantee finite-sample marginal coverage under exchangeability, while respecting the geometry of the simplex. A comprehensive Monte Carlo study spanning homoscedastic and heteroscedastic designs shows that the quantile residual and grid-refined HDR methods achieve empirical coverage close to the nominal 90% level and produce substantially narrower regions than the coordinate-floor approximation, which tends to be conservative. We further demonstrate the methods on household budget shares from the BudgetItaly dataset, using standardized socioeconomic and price covariates with a train, calibration, and test split. In this application, the grid-refined HDR attains coverage closest to the target with the smallest average widths, closely followed by the quantile residual approach, while the simple triangular HDR yields wider, less informative sets. Overall, the results indicate that conformal prediction on the simplex can be both calibrated and efficient, providing practical uncertainty quantification for compositional prediction tasks.
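To make the density-based idea concrete, the following is a minimal sketch of generic split conformal calibration with a Dirichlet density score. It is not the coordinate-floor envelope or grid refinement proposed in the paper. It assumes that a fitted Dirichlet regression already supplies a concentration vector for each observation; the function names, the score (negative log-density), and the toy calibration pairs are all illustrative assumptions.

```python
import math


def dirichlet_logpdf(y, alpha):
    """Log-density of Dirichlet(alpha) at a composition y (positive parts summing to one)."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, y))


def conformal_threshold(scores, miscoverage):
    """Split conformal quantile: the ceil((n + 1) * (1 - miscoverage))-th smallest score.

    Returns infinity when the calibration set is too small for the requested level,
    in which case the prediction region is the whole simplex.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - miscoverage))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def in_region(y, alpha, threshold):
    """True if y lies in the density-level region {y : -log f(y | alpha) <= threshold}."""
    return -dirichlet_logpdf(y, alpha) <= threshold


# Hypothetical calibration pairs: (observed composition, predicted concentration vector).
# In practice the concentration vectors would come from a model fitted on the training split.
calibration = [
    ([0.50, 0.30, 0.20], [5.0, 3.0, 2.0]),
    ([0.40, 0.40, 0.20], [4.0, 4.0, 2.0]),
    ([0.60, 0.25, 0.15], [5.0, 3.0, 2.0]),
    ([0.35, 0.35, 0.30], [3.0, 3.0, 3.0]),
]

# Nonconformity score: negative log-density, so low-density outcomes get large scores.
scores = [-dirichlet_logpdf(y, a) for y, a in calibration]

# With only four calibration points a 90% level is unattainable, so 50% is used here.
threshold = conformal_threshold(scores, miscoverage=0.5)

# Membership check for a candidate composition at a new covariate value.
candidate_is_covered = in_region([0.45, 0.35, 0.20], [5.0, 3.0, 2.0], threshold)
```

Under exchangeability of calibration and test points, the region defined by `in_region` has marginal coverage of at least one minus the miscoverage level, whatever model produced the concentration vectors. Describing the region explicitly on the simplex, rather than testing membership point by point, is the step that the coordinate-floor and grid constructions in the paper address.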
