Predictive inference for time series: why is split conformal effective despite temporal dependence?
Rina Foygel Barber, Ashwin Pananjady
TL;DR
The paper develops a theoretical framework to explain why split conformal prediction remains effective for time series with temporal dependence and memory-based predictors. It introduces the switch coefficient to quantify deviations from exchangeability caused by dependence and shows that coverage loss for pretrained and split conformal methods can be bounded by this coefficient, with sharp results for stationary $\beta$-mixing processes. The authors derive matching lower bounds, establish conditions preventing undercovering or overcovering, and extend the results to memory-enabled score functions, demonstrating faster (linear) rates compared to prior $\sqrt{\tau/n}$-type bounds. Overall, the work provides a principled, quantitative explanation for the empirical success of split conformal in dependent data settings and offers tools potentially applicable to broader non-exchangeable predictive inference methods.
Abstract
We consider the problem of uncertainty quantification for prediction in a time series: if we use past data to forecast the next time point, can we provide valid prediction intervals around our forecasts? To avoid placing distributional assumptions on the data, in recent years the conformal prediction method has been a popular approach for predictive inference, since it provides distribution-free coverage for any iid or exchangeable data distribution. However, in the time series setting, the strong empirical performance of conformal prediction methods is not well understood, since even short-range temporal dependence is a strong violation of the exchangeability assumption. Using predictors with "memory" -- i.e., predictors that utilize past observations, such as autoregressive models -- further exacerbates this problem. In this work, we examine the theoretical properties of split conformal prediction in the time series setting, including the case where predictors may have memory. Our results bound the loss of coverage of these methods in terms of a new "switch coefficient", measuring the extent to which temporal dependence within the time series creates violations of exchangeability. Our characterization of the coverage probability is sharp over the class of stationary, $\beta$-mixing processes. Along the way, we introduce tools that may prove useful in analyzing other predictive inference methods for dependent data.
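To make the setting concrete, the following is a minimal sketch of split conformal prediction with a memory-based predictor on a simulated series. It is an illustration under stated assumptions, not the authors' code or experiments: the AR(1) data model, the least-squares fit, the absolute-residual score, and all function names (`simulate_ar1`, `split_conformal_ar1`, `empirical_coverage`) are hypothetical choices made here. It does not compute the switch coefficient, which is not defined in this summary.

```python
import numpy as np


def simulate_ar1(n, phi=0.8, sigma=1.0, burn_in=200, seed=0):
    """Simulate an AR(1) series y[t] = phi * y[t-1] + noise (assumed data model)."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        y[t] = phi * y[t - 1] + sigma * rng.standard_normal()
    # Drop the burn-in so the returned series is close to stationary.
    return y[burn_in:]


def split_conformal_ar1(y, n_train, n_cal, alpha=0.1):
    """Fit an AR(1) predictor on the first block, calibrate on the next block."""
    # Least-squares estimate of the AR coefficient on the training block.
    x_tr, y_tr = y[: n_train - 1], y[1:n_train]
    phi_hat = float(x_tr @ y_tr / (x_tr @ x_tr))

    # Scores on the calibration block: absolute one-step-ahead residuals.
    # The predictor has "memory" because it uses the previous observation.
    cal_idx = np.arange(n_train, n_train + n_cal)
    scores = np.abs(y[cal_idx] - phi_hat * y[cal_idx - 1])

    # Standard split conformal quantile: the ceil((1 - alpha)(n_cal + 1))-th
    # smallest score, or +infinity if that rank exceeds n_cal.
    k = int(np.ceil((1 - alpha) * (n_cal + 1)))
    q_hat = np.inf if k > n_cal else float(np.sort(scores)[k - 1])
    return phi_hat, q_hat


def empirical_coverage(y, phi_hat, q_hat, start):
    """Fraction of points from index `start` on that fall inside the interval."""
    idx = np.arange(start, len(y))
    pred = phi_hat * y[idx - 1]
    return float(np.mean(np.abs(y[idx] - pred) <= q_hat))


if __name__ == "__main__":
    series = simulate_ar1(3000)
    phi_hat, q_hat = split_conformal_ar1(series, n_train=500, n_cal=500, alpha=0.1)
    coverage = empirical_coverage(series, phi_hat, q_hat, start=1000)
    print(f"phi_hat={phi_hat:.3f}, q_hat={q_hat:.3f}, coverage={coverage:.3f}")
```

The interval for each new point is the one-step forecast plus or minus `q_hat`. Averaging coverage over many later time points with a single calibration block is a rough empirical check only; the coverage statement discussed in the abstract concerns the next time point.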
