Attention-Based Feature Online Conformal Prediction for Time Series

Meiyi Zhu, Caili Guo, Chunyan Feng, Osvaldo Simeone

TL;DR

The paper tackles uncertainty quantification for non-stationary time series by extending online conformal prediction (OCP) to operate in the learned feature space and by introducing an attention-based weighting scheme over historical observations. The proposed attention-based feature OCP (AFOCP) framework combines feature-space nonconformity scores with data-driven weights that adapt online, producing prediction sets that maintain long-term coverage while reducing interval lengths. The authors prove deterministic long-term coverage guarantees and show that feature OCP (FOCP) and AFOCP achieve shorter time-averaged intervals than standard OCP under mild regularity assumptions, with the formal results complemented by extensive experiments on synthetic and real-world datasets. The approach offers a practical and scalable path to reliable and efficient uncertainty quantification in non-stationary environments, with potential applicability to diverse time-series applications.

Abstract

Online conformal prediction (OCP) wraps around any pre-trained predictor to produce prediction sets with coverage guarantees that hold irrespective of temporal dependencies or distribution shifts. However, standard OCP faces two key limitations: it operates in the output space using simple nonconformity (NC) scores, and it treats all historical observations uniformly when estimating quantiles. This paper introduces attention-based feature OCP (AFOCP), which addresses both limitations through two key innovations. First, AFOCP operates in the feature space of pre-trained neural networks, leveraging learned representations to construct more compact prediction sets by concentrating on task-relevant information while suppressing nuisance variation. Second, AFOCP incorporates an attention mechanism that adaptively weights historical observations based on their relevance to the current test point, effectively handling non-stationarity and distribution shifts. We provide theoretical guarantees showing that AFOCP maintains long-term coverage while provably achieving smaller prediction intervals than standard OCP under mild regularity conditions. Extensive experiments on synthetic and real-world time series datasets demonstrate that AFOCP consistently reduces the size of prediction intervals by as much as $88\%$ as compared to OCP, while maintaining target coverage levels, validating the benefits of both feature-space calibration and attention-based adaptive weighting.
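The standard OCP baseline described above (output-space scores, uniform weights over past observations) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes absolute-residual NC scores and the adaptive update of Gibbs and Candès (2021), and the names `run_ocp`, `gamma`, and `window` are hypothetical.

```python
import numpy as np

def run_ocp(y_true, y_pred, alpha=0.1, gamma=0.05, window=100):
    """Online conformal prediction with uniform weights (illustrative sketch).

    Assumes NC scores |y - mu(x)| and intervals [mu - radius, mu + radius].
    Returns per-step radii and miscoverage indicators.
    """
    alpha_t = alpha          # running miscoverage level, adapted online
    scores = []              # past nonconformity scores
    radii, errors = [], []
    for y, mu in zip(y_true, y_pred):
        recent = scores[-window:]
        if not recent or alpha_t <= 0.0:
            radius = np.inf              # cover everything
        elif alpha_t >= 1.0:
            radius = -np.inf             # empty set, always an error
        else:
            # uniform-weight empirical quantile of recent scores
            radius = float(np.quantile(recent, 1.0 - alpha_t))
        score = abs(y - mu)
        err = float(score > radius)      # 1 if the true label was missed
        alpha_t += gamma * (alpha - err) # tighten after misses, relax otherwise
        scores.append(score)
        radii.append(radius)
        errors.append(err)
    return np.array(radii), np.array(errors)
```

Under these assumptions, FOCP would replace the residual score with a score computed in the feature space, and AFOCP would additionally replace the uniform-weight quantile with an attention-weighted one.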

Paper Structure

This paper contains 23 sections, 4 theorems, 32 equations, 6 figures, 1 algorithm.

Key Result

Theorem 1

For every horizon $T\in\mathbb{N}$, the time-averaged error admits an explicit upper bound. In particular, as $T\rightarrow\infty$, the error converges to the desired level $\alpha$.
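For reference, Proposition 4.1 of Gibbs and Candès (2021), which Theorem 1 restates, takes the following form in that reference. The symbols $\mathrm{err}_t$ (miscoverage indicator at time $t$), $\alpha_1$ (initial miscoverage level), and $\gamma$ (step size) follow that reference and are assumptions here; the paper's own notation may differ.

$$
\left| \frac{1}{T}\sum_{t=1}^{T} \mathrm{err}_t - \alpha \right| \le \frac{\max\{\alpha_1,\, 1-\alpha_1\} + \gamma}{T\gamma},
\qquad
\lim_{T\rightarrow\infty} \frac{1}{T}\sum_{t=1}^{T} \mathrm{err}_t = \alpha .
$$

The bound is deterministic: it follows from the fact that the adapted miscoverage level stays within $[-\gamma,\, 1+\gamma]$, so the deviation from $\alpha$ decays at rate $1/T$ regardless of the data distribution.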

Figures (6)

  • Figure 1: Overview of AFOCP and related baselines. (a) The goal of this work is to calibrate pre-trained predictors by augmenting their outputs with prediction sets that contain the true label $Y$ at least $100(1-\alpha)\%$ of the time. (b) For any input $X$, the pre-trained model $\mu (X) = g \circ f(X)$ maps inputs $X$ through the feature extractor $f(\cdot)$ and the prediction head $g(\cdot)$. (c) Nonconformity (NC) scores can be evaluated in the output or feature spaces. (d) The NC scores can be combined to evaluate empirical distributions and quantiles using either uniform weights or attention-based weights. (e) OCP (Gibbs and Candès, 2021) uses output scores with uniform weights; feature OCP (FOCP) uses feature scores while retaining uniform weights; attention-based OCP (AOCP) keeps output scores but learns data-dependent weights via attention; and attention-based feature OCP (AFOCP) combines feature scores with attention-based weights. FOCP, AOCP, and AFOCP are introduced in this work, with AFOCP being the most general of the three. (f) The attention mechanism in AOCP and AFOCP compares the current model feature with past features to produce similarity-based weights that serve as data-dependent weights for calibration.
  • Figure 2: Time-averaged coverage (left) and time-averaged interval length (right) of OCP, FOCP, AOCP, and AFOCP versus time $T$ across various datasets, with window length $L = 100$, feature dimension $D = 50$, and target miscoverage rate $\alpha=0.1$.
  • Figure 3: Time-averaged coverage (left) and time-averaged interval length (right) of OCP, FOCP, AOCP, and AFOCP versus window length $L$ for synthetic data and air quality datasets with feature dimension $D= 50$ and target miscoverage rate $\alpha=0.1$.
  • Figure 4: Time-averaged coverage (left) and time-averaged interval length (right) of OCP, FOCP, AOCP, and AFOCP versus feature dimension $D$ across various datasets with window length $L=100$ and target miscoverage rate $\alpha=0.1$.
  • ...and 2 more figures
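Panels (d) and (f) of Figure 1 describe similarity-based weights feeding a weighted quantile. A minimal sketch of that idea is given below, assuming a fixed scaled dot-product similarity with a softmax; the paper learns its attention weights, so this is a simplified stand-in, and the names `attention_weights`, `weighted_quantile`, and `temperature` are hypothetical.

```python
import numpy as np

def attention_weights(query, keys, temperature=1.0):
    """Softmax over scaled dot products between the current and past features."""
    logits = keys @ query / (temperature * np.sqrt(query.shape[0]))
    logits = logits - logits.max()       # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def weighted_quantile(scores, weights, level):
    """Smallest score whose cumulative weight reaches `level`."""
    order = np.argsort(scores)
    cum = np.cumsum(weights[order])
    idx = np.searchsorted(cum, level, side="left")
    idx = min(idx, len(scores) - 1)      # guard against rounding at level near 1
    return scores[order][idx]

# Two past observations: the first resembles the current feature, the second does not.
query = np.array([1.0, 0.0])
keys = np.array([[10.0, 0.0], [0.0, 10.0]])
scores = np.array([5.0, 1.0])

w = attention_weights(query, keys)
q_attn = weighted_quantile(scores, w, 0.5)                      # dominated by the similar point
q_unif = weighted_quantile(scores, np.full(2, 0.5), 0.5)        # uniform-weight counterpart
```

With uniform weights both past scores count equally, whereas the attention weights let the score of the more similar past observation dominate the quantile, which is the intended effect under distribution shift.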

Theorems & Definitions (4)

  • Theorem 1: Proposition 4.1 of Gibbs and Candès (2021)
  • Corollary 1
  • Theorem 2
  • Theorem 3