Uncertainty-Aware Deployment of Pre-trained Language-Conditioned Imitation Learning Policies

Bo Wu; Bruce D. Lee; Kostas Daniilidis; Bernadette Bucher; Nikolai Matni

TL;DR

The paper addresses unreliable generalization of pre-trained language-conditioned imitation policies by introducing a calibration-enabled uncertainty-aware deployment framework. It calibrates model outputs with temperature scaling to align confidence with imitation correctness and employs uncertainty-aware action selection that aggregates neighboring action confidences within a distance threshold. The approach is evaluated on three models (PerAct, RVT, CLIPort) across RLBench and Ravens, showing improved task success, particularly when models are poorly calibrated or faced with distribution shifts like distractors. Importantly, the method does not require retraining the original policies, offering a scalable path to more reliable generalist robotic policies in diverse environments. The work highlights the situational dependence of calibration benefits and points to future work in extending calibrated uncertainty to dynamics forecasting and planning.
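The calibration step mentioned above can be illustrated with a minimal sketch of temperature scaling. It assumes the policy exposes one row of raw scores (logits) over discrete candidate actions per calibration sample, together with the index of the expert's action. The function names, the grid of candidate temperatures, and the negative log-likelihood objective are illustrative assumptions, not taken from the authors' code.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, expert_indices, temperature):
    """Mean negative log-likelihood of the expert actions at a given temperature."""
    loss = 0.0
    for logits, idx in zip(logit_rows, expert_indices):
        loss -= math.log(softmax(logits, temperature)[idx])
    return loss / len(logit_rows)

def fit_temperature(logit_rows, expert_indices, candidates=None):
    """Grid-search the temperature that minimizes NLL on a small calibration set.

    The grid (0.1 to 10.0 in steps of 0.1) is an assumption made for this sketch.
    """
    if candidates is None:
        candidates = [0.1 * k for k in range(1, 101)]
    return min(candidates, key=lambda t: nll(logit_rows, expert_indices, t))

# Toy calibration set: the model is very confident every time but wrong once.
logit_rows = [[5.0, 0.0, 0.0], [5.0, 0.0, 0.0], [5.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
expert_indices = [0, 0, 0, 1]
T = fit_temperature(logit_rows, expert_indices)
calibrated = softmax(logit_rows[0], T)  # softened confidence for the top action
```

In this toy case the fitted temperature is greater than 1, which lowers the confidence of the top-scoring action toward the model's observed accuracy. Only the single scalar is fitted; the policy weights are untouched, which matches the statement that no retraining is required.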

Abstract

Large-scale robotic policies trained on data from diverse tasks and robotic platforms hold great promise for enabling general-purpose robots; however, reliable generalization to new environment conditions remains a major challenge. Toward addressing this challenge, we propose a novel approach for uncertainty-aware deployment of pre-trained language-conditioned imitation learning agents. Specifically, we use temperature scaling to calibrate these models and exploit the calibrated model to make uncertainty-aware decisions by aggregating the local information of candidate actions. We implement our approach in simulation using three such pre-trained models, and showcase its potential to significantly enhance task completion rates. The accompanying code is available at https://github.com/BobWu1998/uncertainty_quant_all.git
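The idea of aggregating local information over candidate actions can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes each candidate action has a spatial position (for example a pixel or voxel coordinate) and a calibrated probability, and it scores each candidate by the total probability of all candidates within a distance threshold. The names `select_action` and `radius` are hypothetical.

```python
import math

def select_action(positions, probs, radius):
    """Return the index of the candidate whose neighborhood holds the most probability.

    positions: list of coordinate tuples, one per candidate action.
    probs:     calibrated probability of each candidate.
    radius:    distance threshold defining a candidate's neighborhood (assumed name).
    """
    best_idx, best_score = -1, -1.0
    for i, p in enumerate(positions):
        # Sum the confidence of every candidate within `radius` of candidate i
        # (candidate i itself is included, since its distance to itself is 0).
        score = sum(q for pos, q in zip(positions, probs)
                    if math.dist(p, pos) <= radius)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx

# One isolated high-scoring candidate versus a cluster of moderately scored ones.
positions = [(0, 0), (5, 5), (5, 6), (6, 5)]
probs = [0.4, 0.25, 0.2, 0.15]
greedy = select_action(positions, probs, radius=0.0)  # plain argmax behavior
aware = select_action(positions, probs, radius=1.0)   # neighborhood-aware choice
```

With a zero radius the sketch reduces to picking the single highest-scoring candidate; with a radius of 1 it prefers the center of the cluster, whose combined confidence exceeds that of the isolated peak. This brute-force version is quadratic in the number of candidates, so a dense heatmap would call for a convolution or a spatial index instead.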

Paper Structure (18 sections, 1 equation, 7 figures, 2 tables, 2 algorithms)

Introduction
Contributions
Related Work
Generalist Robotic Policies
Uncertainty Quantification
Approach
Background: Language-conditioned IL for Robotic Manipulation
Target Task Calibration
Uncertainty-Aware Action Selection
Experimental Results
Why Does Uncertainty-Aware Action Selection Help?
Do we need calibration?
Can Uncertainty-Awareness Improve Generalization?
Conclusion
Model details
...and 3 more sections

Figures (7)

Figure 1: Overview of uncertainty-aware action selection on top of pre-trained models. The blue cross indicates the maximum score in the heatmap. The upper path illustrates the standard deployment of pre-trained IL policies at test time, in which the output of the pre-trained model directly predicts the action. The bottom path indicates our proposed approach, in which the model is calibrated using a small dataset of expert demonstrations from the task of interest before selecting the action in an uncertainty-aware manner.
Figure 2: Qualitative results for CLIPort with and without uncertainty-aware action selection. The language commands are in the subfigure captions. From top to bottom, each row represents the model's observation, the heatmap in the pick phase, the heatmap in the place phase, and the observation after executing the actions. In \ref{fig:stack_blocks} and \ref{fig:cliport_kit_assembly1}, our method forces the model to choose a central location on the target object instead of the edges. In \ref{fig:cliport_kit_assembly2}, our method corrects the model by smoothing out the action with the highest raw score.
Figure 3: Per-task success rate for Perceiver-Actor under the uncertainty-aware strategy with calibrated and uncalibrated confidence
Figure 4: Per-task success rate for RVT under the uncertainty-aware strategy with calibrated and uncalibrated confidence
Figure 5: Maximum entropy over all episodes in each task of PerAct
...and 2 more figures
