Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation

Feichen Gan; Youcun Lu; Yingying Zhang; Yukun Liu

TL;DR

This work proposes a modular pseudo-return construction based on truncated rollouts and a time-aware calibration strategy using experience replay and weighted subsampling to mitigate model bias and restore approximate exchangeability, enabling uncertainty quantification even under policy shifts.

Abstract

Reliable uncertainty quantification is crucial for reinforcement learning (RL) in high-stakes settings. We propose a unified conformal prediction framework for infinite-horizon policy evaluation that constructs distribution-free prediction intervals for returns in both on-policy and off-policy settings. Our method integrates distributional RL with conformal calibration, addressing challenges such as unobserved returns, temporal dependencies, and distributional shifts. We propose a modular pseudo-return construction based on truncated rollouts and a time-aware calibration strategy using experience replay and weighted subsampling. These innovations mitigate model bias and restore approximate exchangeability, enabling uncertainty quantification even under policy shifts. Our theoretical analysis provides coverage guarantees that account for model misspecification and importance weight estimation. Empirical results, including experiments in synthetic and benchmark environments like Mountain Car, show that our method significantly improves coverage and reliability over standard distributional RL baselines.
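To make the two ingredients named in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a pseudo-return is formed from k observed rewards plus a discounted tail value drawn from some return model, and that calibration uses a generic weighted split-conformal quantile on absolute residuals. The function names, the absolute-residual score, and the handling of the test-point weight are all illustrative assumptions. The paper's time-aware replay and weighted subsampling are not reproduced here.

```python
import numpy as np

def pseudo_return(rewards, tail_sample, gamma):
    """Truncated-rollout pseudo-return (hypothetical helper).

    Sums k observed rewards with discounting, then adds a discounted
    tail value, assumed to be drawn from a learned return model.
    """
    rewards = np.asarray(rewards, dtype=float)
    k = len(rewards)
    discounts = gamma ** np.arange(k)
    return float(np.dot(discounts, rewards) + gamma ** k * tail_sample)

def weighted_quantile(scores, weights, level):
    """Smallest score whose normalized cumulative weight reaches `level`."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    cum = np.cumsum(weights[order]) / weights.sum()
    idx = np.searchsorted(cum, level, side="left")
    return float(scores[order][min(idx, len(scores) - 1)])

def conformal_interval(point_pred, cal_preds, cal_pseudo_returns,
                       cal_weights, test_weight=1.0, alpha=0.1):
    """Weighted split-conformal interval around `point_pred`.

    Scores are absolute residuals on the calibration set. The test
    point contributes its weight at score +inf, as in standard
    weighted conformal prediction. In an off-policy setting the
    weights would be (estimated) importance ratios; that is an assumption.
    """
    residuals = np.abs(np.asarray(cal_pseudo_returns, dtype=float)
                       - np.asarray(cal_preds, dtype=float))
    scores = np.append(residuals, np.inf)
    weights = np.append(np.asarray(cal_weights, dtype=float), test_weight)
    q = weighted_quantile(scores, weights, 1.0 - alpha)
    return point_pred - q, point_pred + q
```

With uniform weights this reduces to ordinary split conformal prediction. The interval becomes infinite whenever the test point's share of the total weight exceeds `alpha`, which is the expected behavior when the calibration set is too small or the weights are too concentrated.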
