Adaptive Conformal Guidance for Learning under Uncertainty

Rui Liu, Peng Gao, Yu Shen, Ming Lin, Pratap Tokekar

TL;DR

AdaConG is a simple, general framework that embeds split conformal prediction (CP) into the training loop to weight guidance signals adaptively according to their uncertainty. It converts a CP-derived uncertainty estimate $u(x)$ into a weight $w(x)$ on the guidance loss, which is then combined with the task-specific loss in supervised, semi-supervised, and imitation-guided RL settings. Empirical results on knowledge distillation (KD), semi-supervised learning (SSL), gridworld navigation, and autonomous driving show substantial gains under imperfect guidance, including up to $+10.89\%$ in KD and over $6\times$ higher rewards in gridworld. The approach is model- and domain-agnostic, offering a practical tool for robust learning under distribution shift and uncertain supervision.
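
As a rough illustration of the pipeline summarized above, the following Python sketch calibrates a split CP threshold on held-out data, forms prediction sets for the guidance outputs, and maps set size to a weight. The nonconformity score (one minus the true-class probability), the normalized set size used for $u(x)$, the exponential form of $w(x)$ with temperature `gamma`, the treatment of empty sets, and all function names are assumptions made for illustration; this summary does not state the paper's exact definitions.

```python
import numpy as np

def calibrate_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split CP calibration: return the score threshold q_hat."""
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Rank of the finite-sample corrected (1 - alpha) quantile.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: every class is kept.
        return np.inf
    return np.sort(scores)[k - 1]

def prediction_sets(probs, q_hat):
    """Boolean mask of the classes included in each prediction set."""
    return (1.0 - probs) <= q_hat

def uncertainty(probs, q_hat):
    """u(x): prediction-set size divided by the number of classes (assumed form)."""
    num_classes = probs.shape[1]
    sizes = prediction_sets(probs, q_hat).sum(axis=1)
    # Assumption: an empty set is treated as maximally uncertain.
    sizes = np.where(sizes == 0, num_classes, sizes)
    return sizes / num_classes

def guidance_weight(u, gamma=1.0):
    """w(x) = exp(-u / gamma): a hypothetical decreasing map from u to a weight."""
    return np.exp(-u / gamma)
```

Under these assumptions, a guidance output with a small prediction set receives a weight close to $\exp(-1/(K\gamma))$ for $K$ classes, while an output whose set contains every class is down-weighted to $\exp(-1/\gamma)$.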

Abstract

Learning with guidance has proven effective across a wide range of machine learning systems. Guidance may, for example, come from annotated datasets in supervised learning, pseudo-labels in semi-supervised learning, and expert demonstration policies in reinforcement learning. However, guidance signals can be noisy due to domain shifts and limited data availability and may not generalize well. Blindly trusting such signals when they are noisy, incomplete, or misaligned with the target domain can lead to degraded performance. To address these challenges, we propose Adaptive Conformal Guidance (AdaConG), a simple yet effective approach that dynamically modulates the influence of guidance signals based on their associated uncertainty, quantified via split conformal prediction (CP). By adaptively adjusting to guidance uncertainty, AdaConG enables models to reduce reliance on potentially misleading signals and enhance learning performance. We validate AdaConG across diverse tasks, including knowledge distillation, semi-supervised image classification, gridworld navigation, and autonomous driving. Experimental results demonstrate that AdaConG improves performance and robustness under imperfect guidance. In gridworld navigation, for example, it accelerates convergence and achieves over $6\times$ higher rewards than the best-performing baseline. These results highlight AdaConG as a broadly applicable solution for learning under uncertainty.

Paper Structure

This paper contains 48 sections, 2 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Overview of the AdaConG approach. AdaConG leverages split CP with calibration to quantify the uncertainty of guidance signals and adaptively modulate their influence. The estimated uncertainty $u$ is converted into an adaptive weight $w$, which reweights the guidance loss. This weighted guidance loss is then combined with the task loss to update the model, enabling effective learning under uncertain guidance.
  • Figure 2: (a-c) Learning Curves. We compare AdaConG and Hard AdaConG with other baselines, including SAC, IBRL, and Soft IBRL, and present their learning curves across three environments: (a) Lava 1, (b) Lava 2, and (c) Door. AdaConG and Hard AdaConG perform similarly, converging faster and achieving higher rewards than the other baselines in all environments. (d) Prediction Uncertainty. We show the average prediction uncertainties of AdaConG and Hard AdaConG, taking the Lava 1 environment as an example. Over time, their prediction uncertainties shrink and approach that of the IL policy, indicating that the RL policy has become well learned.
  • Figure 3: Top-1 accuracy of KD using AdaConG with varying temperature $\gamma$ values for adaptive weighting.
  • Figure 4: Top-1 accuracy of KD using AdaConG with varying $\alpha$ values. The results demonstrate that our approach is robust to the choice of $\alpha$ and consistently outperforms standard KD.
  • Figure 5: Average size of the prediction set for the teacher model.
  • ...and 5 more figures
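
The reweighting step described in the Figure 1 caption can be sketched for a knowledge-distillation setting as follows. This is a minimal sketch under stated assumptions: the task loss is cross-entropy, the guidance loss is the KL divergence from teacher to student, the two are combined by a per-sample sum, and the weights `w` are supplied by some upstream uncertainty estimate. The function names are hypothetical and none of these choices is confirmed by this summary.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def task_loss(student_logits, labels):
    """Per-sample cross-entropy against ground-truth labels."""
    logp = log_softmax(student_logits)
    return -logp[np.arange(len(labels)), labels]

def guidance_loss(student_logits, teacher_probs, eps=1e-12):
    """Per-sample KL(teacher || student); eps guards against log(0)."""
    logp = log_softmax(student_logits)
    return (teacher_probs * (np.log(teacher_probs + eps) - logp)).sum(axis=1)

def combined_loss(student_logits, labels, teacher_probs, w):
    """Mean of task loss plus w-weighted guidance loss (assumed combination)."""
    per_sample = task_loss(student_logits, labels) \
        + w * guidance_loss(student_logits, teacher_probs)
    return per_sample.mean()
```

With `w` set to zero the objective reduces to the task loss alone, so samples whose guidance is judged unreliable contribute only through their ground-truth labels.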