On the Expected Size of Conformal Prediction Sets
Guneet S. Dhillon, George Deligiannidis, Tom Rainforth
TL;DR
This paper tackles the finite-sample estimation of the expected size of split conformal prediction sets, a key practical metric alongside error control. It derives an expression linking the expected set size to the distribution of calibration scores, $\mathbb{E}[|\hat{C}^{R}_{\alpha}(X_{n+1};Z_{1:n})|] = \int_{\mathcal{R}} \mathbb{P}\{\tau_{\alpha}(R_{1:n})\ge r\}\#_{R}(r)\,dr$, and introduces practical point and interval estimators that require only a single dataset. The methods handle both known and unknown multiplicative factors $\#_{R}$, using empirical calibration-score distributions and nested Monte Carlo to produce reliable estimates and high-probability bounds. Experiments on real-world regression and classification tasks show that these estimates closely track Monte Carlo baselines and yield informative intervals, enabling practitioners to assess expected set sizes without extensive data reuse or repeated conformal runs.
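To make the formula concrete, consider a special case chosen for illustration (our assumption, not a setting taken from the paper): regression with the absolute-residual score $R = |Y - \hat{f}(X)|$ for a fitted predictor $\hat{f}$ (our notation). The split conformal set is then the interval $\hat{f}(x) \pm \tau$, where we assume the standard threshold $\tau = \tau_{\alpha}(R_{1:n})$, the $k$-th smallest calibration score with $k = \lceil (n+1)(1-\alpha) \rceil$. Its length $2\tau$ does not depend on $x$, so the expected size becomes $\mathbb{E}[2\tau] = \int_0^\infty 2\,\mathbb{P}\{\tau \ge r\}\,dr$, which we read as the formula above with a constant factor $\#_{R} \equiv 2$. Because $\tau \ge r$ exactly when at most $k-1$ scores fall below $r$, $\mathbb{P}\{\tau \ge r\} = \mathbb{P}\{\mathrm{Bin}(n, F(r^-)) \le k-1\}$ for the score CDF $F$. The sketch below plugs in the empirical CDF of one observed set of scores; it is a simple plug-in estimator, not necessarily the authors' estimator, and all function names are hypothetical.

```python
import math
import random


def binom_cdf(m: int, n: int, p: float) -> float:
    """Return P(Binomial(n, p) <= m), summing the pmf in log space to avoid overflow."""
    if m < 0:
        return 0.0
    if m >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q, log_n_fact = math.log(p), math.log1p(-p), math.lgamma(n + 1)
    total = sum(
        math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * log_p + (n - i) * log_q)
        for i in range(m + 1)
    )
    return min(total, 1.0)


def conformal_rank(n: int, alpha: float) -> int:
    """Assumed rank k = ceil((n + 1)(1 - alpha)) of the split conformal threshold."""
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        raise ValueError("alpha too small for n: the threshold would be infinite")
    return rank


def plugin_expected_length(scores, alpha: float) -> float:
    """Hypothetical plug-in estimate of E[2 tau] = int_0^inf 2 P(tau >= r) dr for
    nonnegative absolute-residual scores, with F replaced by the empirical CDF."""
    s = sorted(scores)
    n = len(s)
    rank = conformal_rank(n, alpha)
    if s[0] < 0.0:
        raise ValueError("absolute-residual scores must be nonnegative")
    total, prev = 0.0, 0.0
    for j, cur in enumerate(s):
        # On (prev, cur], exactly j observed scores are strictly below r, so
        # P(tau >= r) = P(at most rank - 1 of n draws fall below r).
        total += (cur - prev) * binom_cdf(rank - 1, n, j / n)
        prev = cur
    # Beyond the largest score the empirical CDF is 1, so P(tau >= r) = 0 there.
    return 2.0 * total


def bootstrap_expected_length(scores, alpha: float, reps: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check: average 2 * tau over calibration sets resampled with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    rank = conformal_rank(n, alpha)
    acc = 0.0
    for _ in range(reps):
        resample = sorted(rng.choices(scores, k=n))
        acc += 2.0 * resample[rank - 1]  # rank-th smallest resampled score
    return acc / reps
```

The plug-in value is exactly the expectation that `bootstrap_expected_length` approximates by resampling, so the two should agree up to Monte Carlo error, while the plug-in needs no resampling at all. Scores whose factor is non-constant or unknown, as in classification, need the more general treatment described in the TL;DR and are not covered by this sketch.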
Abstract
While conformal predictors reap the benefits of rigorous statistical guarantees on their error frequency, the size of their corresponding prediction sets is critical to their practical utility. Unfortunately, there is currently a lack of finite-sample analysis and guarantees for their prediction set sizes. To address this shortfall, we theoretically quantify the expected size of the prediction sets under the split conformal prediction framework. As this precise formulation cannot usually be calculated directly, we further derive point estimates and high-probability interval bounds that can be empirically computed, providing a practical method for characterizing the expected set size. We corroborate the efficacy of our results with experiments on real-world datasets for both regression and classification problems.
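A high-probability interval can be illustrated in a deliberately simple special case, by a route that is our own assumption rather than the paper's construction: the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality with Massart's constant. Assume absolute-residual scores $R = |Y - \hat{f}(X)|$ (our notation) known to lie in $[0, B]$, so the split conformal interval has length $2\tau$, with $\tau$ the $k$-th smallest of $n$ calibration scores and $k = \lceil (n+1)(1-\alpha) \rceil$. Then $\mathbb{E}[2\tau] = 2\int_0^B \mathbb{P}\{\mathrm{Bin}(n, F(r^-)) \le k-1\}\,dr$. Since this binomial probability decreases in $F(r^-)$, shifting the empirical CDF by the DKW width $\varepsilon = \sqrt{\log(2/\delta)/(2n)}$ brackets the integral with probability at least $1-\delta$. The function names and the bound $B$ are hypothetical; without a known $B$ the upper end of this interval would be infinite.

```python
import math


def binom_cdf(m: int, n: int, p: float) -> float:
    """Return P(Binomial(n, p) <= m), summing the pmf in log space to avoid overflow."""
    if m < 0:
        return 0.0
    if m >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q, log_n_fact = math.log(p), math.log1p(-p), math.lgamma(n + 1)
    total = sum(
        math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * log_p + (n - i) * log_q)
        for i in range(m + 1)
    )
    return min(total, 1.0)


def dkw_length_interval(scores, alpha: float, delta: float, bound: float):
    """Hypothetical (1 - delta) interval for E[2 tau], assuming every score lies in [0, bound]."""
    s = sorted(scores)
    n = len(s)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        raise ValueError("alpha too small for n: the threshold would be infinite")
    if s[0] < 0.0 or s[-1] > bound:
        raise ValueError("scores must lie in [0, bound]")
    # DKW with Massart's constant: with probability >= 1 - delta,
    # |F_hat(r) - F(r)| <= eps simultaneously for all r.
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    knots = [0.0] + s + [bound]
    lo = hi = 0.0
    for j in range(n + 1):
        width = knots[j + 1] - knots[j]
        f_hat = j / n  # empirical P(R < r) for r in (knots[j], knots[j + 1]]
        # P(Bin(n, p) <= rank - 1) decreases in p, so F_hat + eps gives the
        # lower integrand and F_hat - eps gives the upper one.
        lo += width * binom_cdf(rank - 1, n, min(f_hat + eps, 1.0))
        hi += width * binom_cdf(rank - 1, n, max(f_hat - eps, 0.0))
    return 2.0 * lo, 2.0 * hi
```

For example, `dkw_length_interval(scores, alpha=0.1, delta=0.05, bound=B)` returns a pair `(lower, upper)`. The DKW band is uniform over $r$, which keeps the construction simple at the cost of a wider interval.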
