
Inductive Conformal Prediction under Data Scarcity: Exploring the Impacts of Nonconformity Measures

Yuko Kato, David M. J. Tax, Marco Loog

TL;DR

This work studies inductive conformal prediction (ICP) for regression under data scarcity. It focuses on how three nonconformity measures (NCMs), namely absolute error-based, normalized absolute error-based, and quantile-based, influence validity and efficiency on synthetic and real-world datasets with small sample sizes. It compares underlying models (neural networks, Gaussian processes, and quantile regression) and evaluates performance under different noise structures, including homoscedastic and heteroscedastic settings. The key finding is that no single NCM dominates. Performance is highly data-dependent: quantile-based NCMs (qNCM) excel in heteroscedastic non-Gaussian settings, while normalized NCMs (normNCM) occasionally produce unstable prediction interval (PI) widths at small sample sizes. The results underscore the need to tailor the choice of NCM to the characteristics of the data. They also suggest that empirical coverage may converge to the target validity only at fairly large sample sizes (around 800 for some targets), which informs the practical deployment of ICP in real-world, data-limited scenarios.
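The three NCMs can be made concrete with a minimal inductive (split) conformal sketch in Python. It assumes the standard formulations from the conformal prediction literature rather than the paper's exact definitions. The absolute-error score is |y - ŷ|. The normalized score is |y - ŷ| / (σ̂ + β), where σ̂ is a per-point difficulty estimate, such as a GP predictive standard deviation. The quantile-based score follows the CQR style, max(q̂_lo - y, y - q̂_hi). The function names, the β parameter, and the linear model in the demo are hypothetical. Underlying models are represented only by their predictions on the calibration and test points.

```python
import math

import numpy as np

# All array arguments are assumed to be NumPy arrays; epsilon is the
# significance level, so the target coverage is 1 - epsilon.


def conformal_quantile(scores, epsilon):
    """Return the ceil((n + 1)(1 - epsilon))-th smallest calibration score.

    This is the finite-sample corrected quantile used in inductive conformal
    prediction. It returns +inf when that index exceeds n, i.e. when there are
    too few calibration points for the requested coverage.
    """
    s = np.sort(np.asarray(scores, dtype=float))
    n = s.size
    k = math.ceil((n + 1) * (1 - epsilon))
    return math.inf if k > n else float(s[k - 1])


def icp_absolute(y_hat_cal, y_cal, y_hat_test, epsilon):
    """Absolute-error NCM: alpha = |y - y_hat|; interval y_hat +/- q."""
    q = conformal_quantile(np.abs(y_cal - y_hat_cal), epsilon)
    return y_hat_test - q, y_hat_test + q


def icp_normalized(y_hat_cal, sigma_cal, y_cal, y_hat_test, sigma_test, epsilon, beta=0.0):
    """Normalized NCM: alpha = |y - y_hat| / (sigma + beta); interval y_hat +/- q * (sigma + beta).

    beta is a hypothetical smoothing constant; beta > 0 avoids division by a
    near-zero sigma estimate.
    """
    q = conformal_quantile(np.abs(y_cal - y_hat_cal) / (sigma_cal + beta), epsilon)
    half_width = q * (sigma_test + beta)
    return y_hat_test - half_width, y_hat_test + half_width


def icp_quantile(lo_cal, hi_cal, y_cal, lo_test, hi_test, epsilon):
    """Quantile-based (CQR-style) NCM: alpha = max(lo - y, y - hi); interval [lo - q, hi + q]."""
    q = conformal_quantile(np.maximum(lo_cal - y_cal, y_cal - hi_cal), epsilon)
    return lo_test - q, hi_test + q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, 300)
    y = 2.0 * x + rng.normal(0.0, 0.1 + x, 300)  # hypothetical heteroscedastic data
    x_tr, y_tr = x[:100], y[:100]
    x_cal, y_cal = x[100:200], y[100:200]
    x_te = x[200:]
    coef = np.polyfit(x_tr, y_tr, 1)  # linear stand-in for the underlying model
    lo, hi = icp_absolute(np.polyval(coef, x_cal), y_cal, np.polyval(coef, x_te), 0.1)
    print("mean PI width:", float(np.mean(hi - lo)))
```

Because the functions work on predictions rather than fitted models, the sketch is agnostic to whether a neural network, a Gaussian process, or a quantile regressor produced them. With few calibration points, the index ceil((n + 1)(1 - epsilon)) can exceed n, and the interval is then unbounded. For a 99% target, this happens whenever n < 99.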

Abstract

Conformal prediction, which makes no distributional assumptions about the data, has emerged as a powerful and reliable approach to uncertainty quantification in practical applications. The nonconformity measure used in conformal prediction quantifies how much a test sample differs from the training data, and the effectiveness of a conformal prediction interval may depend heavily on the precise measure employed. The impact of this choice has, however, not been widely explored, especially when data are limited. The primary objective of this study is to evaluate the performance of various nonconformity measures (absolute error-based, normalized absolute error-based, and quantile-based) in terms of validity and efficiency when used in inductive conformal prediction. The focus is on small datasets, which remain common in many real-world applications. Using synthetic and real-world data, we assess how characteristics such as dataset size, noise, and dimensionality can affect the efficiency of conformal prediction intervals. Our results show that, although there are differences, no single nonconformity measure consistently outperforms the others, because the effectiveness of each measure is heavily influenced by the specific nature of the data. Additionally, we found that increasing dataset size does not always improve efficiency. This points to the importance of fine-tuning the underlying models and, again, to the need to carefully select the nonconformity measure for each application.
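Validity and efficiency, the two criteria in the abstract, can be computed as follows. The sketch assumes the common conventions that validity is measured as the empirical coverage of test targets and that efficiency is the mean prediction-interval width, where smaller is better. The figures additionally normalize efficiency by target validity, and that normalization is not reproduced here. The function names are illustrative.

```python
import numpy as np


def validity(lower, upper, y):
    """Empirical coverage: fraction of test targets inside [lower, upper]."""
    lower, upper, y = map(np.asarray, (lower, upper, y))
    return float(np.mean((y >= lower) & (y <= upper)))


def efficiency(lower, upper):
    """Mean prediction-interval width; smaller means more efficient."""
    return float(np.mean(np.asarray(upper) - np.asarray(lower)))


if __name__ == "__main__":
    lo, hi, y = [0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0], [0.5, 2.5, 3.0, -1.0]
    print(validity(lo, hi, y), efficiency(lo, hi))  # prints: 0.5 2.5
```

An interval method is valid at level 1 - epsilon if its coverage is at least that target. Efficiency only makes sense to compare across NCMs at (approximately) the same coverage, which is why the two are read together as a trade-off.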

Paper Structure

This paper contains 22 sections, 11 equations, 9 figures, and 8 tables.

Figures (9)

  • Figure 1: One-dimensional synthetic data with different noise distributions: (a) homoscedastic Gaussian noise, (b) heteroscedastic Gaussian noise, (c) heteroscedastic non-Gaussian noise, and (d) homoscedastic right-skewed noise.
  • Figure 2: Trade-off between validity and normalized efficiency. The x-axis shows validity and the y-axis shows efficiency normalized by target validity, both on a logarithmic scale. The data correspond to the one-dimensional case with homoscedastic Gaussian noise (left column), heteroscedastic Gaussian noise (middle column), and right-skewed noise (right column), across varying target coverage rates and data sizes. Marker shapes encode the target coverage rate: triangles for 80%, squares for 90%, diamonds for 95%, and circles for 99%. Marker sizes encode the sample size: small for 100, medium for 500, and large for 1000. Each row corresponds to a different NCM; from top to bottom: qNCM, NCM-NN, normNCM-NN, NCM-GP, and normNCM-GP.
  • Figure 3: Mean absolute difference between empirical and target coverage rates (target: 90%) for different sample sizes and NCMs under heteroscedastic noise. Error bars show 1.96 times the standard error (SE), corresponding to 95% confidence intervals.
  • Figure 4: Efficiency for varying data sizes under (a) heteroscedastic Gaussian and (b) heteroscedastic non-Gaussian noise. The plot shows the performance of different NCMs (qNCM, NCM-NN, normNCM-NN, NCM-GP, and normNCM-GP) for an 80% target coverage rate, with error bars showing 1.96 times the SE (95% confidence intervals).
  • Figure 5: Trade-off between validity and normalized efficiency. The x-axis shows validity and the y-axis shows efficiency normalized by target validity, both on a logarithmic scale. The data correspond to the one-dimensional case with heteroscedastic non-Gaussian noise across varying target coverage rates and data sizes. As in Figure 2, marker shapes and sizes represent target coverage rates and sample sizes. The NCMs shown are: (a) qNCM, (b) NCM-NN, (c) normNCM-NN, (d) NCM-GP, and (e) normNCM-GP.
  • ...and 4 more figures
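The figure captions mention two ingredients that can also be sketched. The first is one-dimensional data under the four noise settings of Figure 1. The second is per-run coverage gaps aggregated into a mean with 1.96 × SE error bars, as in Figures 3 and 4. Several choices here are assumptions for illustration, not the paper's data-generating process: the sin(2πx) trend, the noise scales, Student-t noise for the non-Gaussian case, and shifted exponential noise for the right-skewed case. The function and setting names are hypothetical.

```python
import numpy as np


def make_1d_data(n, noise, rng):
    """Hypothetical 1D generator for the four noise settings named in Figure 1."""
    x = rng.uniform(0.0, 1.0, n)
    f = np.sin(2 * np.pi * x)  # assumed trend
    if noise == "homo_gauss":
        eps = rng.normal(0.0, 0.2, n)  # constant noise scale
    elif noise == "hetero_gauss":
        eps = rng.normal(0.0, 0.05 + 0.4 * x)  # standard deviation grows with x
    elif noise == "hetero_nongauss":
        eps = (0.05 + 0.4 * x) * rng.standard_t(df=3, size=n)  # heavy tails, x-dependent scale
    elif noise == "homo_skewed":
        eps = rng.exponential(0.2, n) - 0.2  # right-skewed, zero mean
    else:
        raise ValueError(f"unknown noise setting: {noise}")
    return x, f + eps


def coverage_gap(coverages, target):
    """Per-run |empirical coverage - target coverage| (the quantity averaged in Figure 3)."""
    return np.abs(np.asarray(coverages, dtype=float) - target)


def mean_with_ci(values):
    """Mean over repeated runs and a 1.96 * SE half-width (approximate 95% CI)."""
    v = np.asarray(values, dtype=float)
    se = v.std(ddof=1) / np.sqrt(v.size)
    return float(v.mean()), float(1.96 * se)


if __name__ == "__main__":
    x, y = make_1d_data(200, "hetero_nongauss", np.random.default_rng(1))
    # Made-up per-run coverages for a 90% target, for illustration only.
    gaps = coverage_gap([0.87, 0.93, 0.90, 0.88, 0.95], target=0.9)
    mean_gap, half_width = mean_with_ci(gaps)
    print(f"mean |coverage - target| = {mean_gap:.3f} +/- {half_width:.3f}")
```

Averaging the absolute gap, rather than the signed difference, penalizes both under- and over-coverage. This matters at small sample sizes, where individual runs can deviate from the target in either direction.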