Conformal online model aggregation
Matteo Gasparin, Aaditya Ramdas
TL;DR
The paper addresses model selection in conformal prediction by proposing COMA, an online wrapper that aggregates multiple conformal prediction sets using data-dependent weights updated via AdaHedge. It establishes a miscoverage bound of 2α under a negative-correlation assumption between errors and weights, and provides regret bounds relative to the best expert. COMA extends to distribution shift by coupling it with adaptive conformal inference, in decentralized and centralized variants that maintain valid coverage while adapting to changing data. Experiments in both iid and non-iid settings show that COMA often yields substantially smaller prediction sets without sacrificing coverage, which makes it well suited to distributed systems and drift-prone applications.
Abstract
Conformal prediction equips machine learning models with a reasonable notion of uncertainty quantification without making strong distributional assumptions. It wraps around any prediction model and converts point predictions into set predictions with a predefined marginal coverage guarantee. However, conformal prediction only works if we fix the underlying machine learning model in advance. A relatively unaddressed issue in conformal prediction is that of model selection and/or aggregation: given a set of prediction models, which one should we conformalize? This paper suggests that instead of performing model selection, it can be prudent and practical to perform conformal set aggregation in an online, adaptive fashion. We propose a wrapper that takes in several conformal prediction sets (themselves wrapped around black-box prediction models), and outputs a single adaptively-combined prediction set. Our method, called conformal online model aggregation (COMA), is based on combining the prediction sets from several algorithms by weighted voting, and can be thought of as a sort of online stacking of the underlying conformal sets. As long as the input sets have (distribution-free) coverage guarantees, COMA retains coverage guarantees, under a negative correlation assumption between errors and weights. We verify that the assumption holds empirically in all settings considered. COMA is well-suited for decentralized or distributed settings, where different users may have different models, and are only willing to share their prediction sets for a new test point in a black-box fashion. As we demonstrate, it is also well-suited to settings with distribution drift and shift, where model selection can be imprudent.
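To make the weighted-voting idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each base model outputs a one-dimensional prediction interval, keeps every point whose total covering weight exceeds 1/2, and uses each interval's length as that model's loss. For brevity it replaces AdaHedge with a plain exponential-weights (Hedge) update at a fixed learning rate `eta`. The function names `weighted_majority_vote` and `hedge_update` are hypothetical.

```python
import math


def weighted_majority_vote(intervals, weights, threshold=0.5):
    """Return the region whose covering weight exceeds `threshold`,
    as a list of disjoint (lo, hi) intervals (possibly more than one)."""
    # The covering weight is constant between consecutive endpoints,
    # so it is enough to test the midpoint of each elementary segment.
    points = sorted({p for lo, hi in intervals for p in (lo, hi)})
    merged = []
    for a, b in zip(points, points[1:]):
        mid = 0.5 * (a + b)
        vote = sum(w for (lo, hi), w in zip(intervals, weights) if lo <= mid <= hi)
        if vote > threshold:
            if merged and merged[-1][1] == a:
                # Contiguous with the previous accepted segment: extend it.
                merged[-1] = (merged[-1][0], b)
            else:
                merged.append((a, b))
    return merged


def hedge_update(weights, losses, eta=0.5):
    """One exponential-weights step (fixed eta; a simplification of AdaHedge)."""
    unnorm = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Three hypothetical conformal intervals for the same test point.
intervals = [(0.0, 4.0), (1.0, 3.0), (2.0, 6.0)]
weights = [1 / 3, 1 / 3, 1 / 3]

before = weighted_majority_vote(intervals, weights)  # [(1.0, 4.0)]

# Assumed loss: the length of each model's interval (smaller is better).
losses = [hi - lo for lo, hi in intervals]
weights = hedge_update(weights, losses)

# The narrowest interval now holds more than half the weight on its own.
after = weighted_majority_vote(intervals, weights)  # [(1.0, 3.0)]
```

In this toy run the aggregated set shrinks from length 4 to length 2 after a single update, because the model with the narrowest interval gains a majority of the weight. Whether coverage is retained with such data-dependent weights is exactly what the negative-correlation assumption in the abstract addresses; the sketch itself makes no coverage claim.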
