Prompt-Driven Domain Adaptation for End-to-End Autonomous Driving via In-Context RL

Aleesha Khurram, Amir Moeini, Shangtong Zhang, Rohan Chandra

TL;DR

The paper tackles distribution shift in end-to-end autonomous driving under adverse weather and dense traffic. It introduces In-Context Reinforcement Learning (ICRL), a prompt-driven, inference-time adaptation framework embedded in a LimSim++-based simulation stack, which lets a driving policy trained in clear weather adapt without parameter updates. Experiments in CARLA Town05 and Town06 show that ICRL outperforms perception- and planning-based prompt baselines in safety, efficiency, and comfort, particularly as weather becomes more inclement or traffic denser, with a focus on safe lane changes and robust junction handling. The work highlights the potential of ICRL as a general, data-efficient layer for few-shot domain adaptation in safety-critical robotics, and outlines theoretical questions and future work toward broader applicability and formal guarantees.

Abstract

Despite significant progress in autonomous driving, many end-to-end systems still struggle with domain adaptation (DA), such as transferring a policy trained under clear weather to adverse weather conditions. Typical DA strategies in the literature include collecting additional data in the target domain, re-training the model, or both. Both strategies quickly become impractical as the scale and complexity of driving increase. These limitations have encouraged investigation into few-shot and zero-shot prompt-driven DA at inference time involving LLMs and VLMs. These methods work by adding a few state-action trajectories to the prompt during inference (similar to in-context learning). However, such an approach has two limitations: $(i)$ prompt-driven DA methods are currently restricted to perception tasks such as detection and segmentation, and $(ii)$ they require expert few-shot data. In this work, we present a new approach to inference-time few-shot prompt-driven DA for closed-loop autonomous driving in adverse weather conditions using in-context reinforcement learning (ICRL). Like other prompt-driven DA methods, our approach requires neither updates to the model parameters nor additional data collection in the adverse weather regime. Furthermore, our approach advances the state of the art in prompt-driven DA by extending it to closed-loop driving using general trajectories observed during inference. Our experiments using the CARLA simulator show that ICRL results in safer, more efficient, and more comfortable driving policies in the target domain compared to state-of-the-art prompt-driven DA baselines.
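The idea of adapting at inference time by adding observed trajectories to the prompt, with no parameter updates, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Transition` fields, the prompt wording, the fixed-size buffer, and the `policy` and `env_step` callables are all hypothetical names introduced here, and the reward is simply assumed to be a scalar in [0, 1].

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Transition:
    state: str     # textual scene description (hypothetical format)
    action: str    # high-level driving decision, e.g. "keep_lane"
    reward: float  # scalar feedback, assumed here to lie in [0, 1]


@dataclass
class ContextBuffer:
    """Holds recent (state, action, reward) tuples seen during inference."""
    max_items: int = 8
    items: List[Transition] = field(default_factory=list)

    def add(self, t: Transition) -> None:
        self.items.append(t)
        # Keep only the most recent entries so the prompt stays bounded.
        self.items = self.items[-self.max_items:]

    def to_prompt(self, current_state: str) -> str:
        lines = ["Past experience (state, action, reward):"]
        for i, t in enumerate(self.items, 1):
            lines.append(
                f"{i}. state: {t.state} | action: {t.action} | reward: {t.reward:.2f}"
            )
        lines.append(f"Current state: {current_state}")
        lines.append("Choose the next action:")
        return "\n".join(lines)


def run_episode(
    policy: Callable[[str], str],            # frozen model: prompt -> action
    env_step: Callable[[str, str], float],   # (state, action) -> reward
    states: List[str],
    buffer: ContextBuffer,
) -> List[str]:
    """Query the frozen policy per state; only the prompt context changes."""
    actions = []
    for s in states:
        a = policy(buffer.to_prompt(s))
        r = env_step(s, a)
        buffer.add(Transition(s, a, r))  # the context grows; weights do not
        actions.append(a)
    return actions
```

In this sketch the trajectories need not come from an expert: whatever the policy does is stored with its reward, and the next prompt carries that feedback forward.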

Paper Structure

This paper contains 19 sections, 3 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: An example of using prompt-driven domain adaptation (DA) to enable an ego-agent (orange) to change lanes in inclement weather in dense traffic (blue vehicles). (top row) Current prompt-driven DA baselines such as Chain-of-Thought (CoT; Wei et al., 2022) fail to change lanes safely. (bottom row) Using ICRL, a driving policy trained in clear weather successfully adapts and safely changes lanes in inclement weather.
  • Figure 2: Our system comprises a driving simulator, depicted in orange on the left, and ICRL, shown in blue on the right. The simulator consists of CARLA and SUMO, which generate scenes and trajectories. The textual representation of the scene is added to the prompt, which then undergoes ICRL to produce a decision that helps adapt the trajectory.
  • Figure 3: Comparing a lane-changing task using different prompt-driven DA methods. The orange vehicle is the ego-agent and the blue vehicles are the surrounding agents.
  • Figure 4: We plot the reward metric curves averaged across all episodes. The maximum value is $1$, so curves closer to the top are better.