QuickLAP: Quick Language-Action Preference Learning for Autonomous Driving Agents
Jordan Abi Nader, David Lee, Nathaniel Dennler, Andreea Bobu
TL;DR
QuickLAP tackles online reward learning for autonomous driving by fusing physical corrections and natural language in a principled Bayesian framework. It treats language as a probabilistic observation over latent rewards and uses two LLMs to produce a feature-attention mask $r$ and a reward shift $\mu$ with confidence $m$. These are combined with a conditional prior and a Boltzmann-like physical likelihood to yield a closed-form MAP update. The key result is a Kalman-like update $\hat{\theta}_i^{t+1}=\hat{\theta}_i^t+\frac{\sigma_{L,i}^2\Delta\Phi_i+\mu_i^t}{\Lambda_{prior,i}\sigma_{L,i}^2+1}$ that adapts to the reliability of language and the relevance of each feature, enabling fast, robust, real-time learning. Empirically, QuickLAP reduces reward-inference error by over 70% in simulated driving scenarios relative to physical-only and heuristic multimodal baselines, and participants in a 15-person user study rated it more understandable and collaborative than those baselines. Code is available at the project repository. This work advances multimodal human-robot interaction by providing a general framework for online, interpretable preference learning that uses language to disambiguate grounded physical feedback.
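To make the update rule above concrete, here is a minimal Python sketch of the per-feature formula. It is an illustration under stated assumptions, not the authors' implementation: the function name `map_update` and its argument names are hypothetical, and the language variance $\sigma_{L,i}^2$ and prior precision $\Lambda_{prior,i}$ are passed in directly, because this summary does not say how they are derived from the mask $r$ and the confidence $m$.

```python
def map_update(theta, delta_phi, mu, sigma_l_sq, lambda_prior):
    """Apply the per-feature update from the TL;DR formula.

    All arguments are equal-length lists of floats (hypothetical names):
      theta        -- current reward-weight estimates
      delta_phi    -- feature change induced by the physical correction
      mu           -- language-derived reward shift
      sigma_l_sq   -- language variance (assumed given, not derived here)
      lambda_prior -- prior precision (assumed given, not derived here)
    """
    updated = []
    for th, dphi, m, s2, lam in zip(theta, delta_phi, mu, sigma_l_sq, lambda_prior):
        # Numerator blends physical and language evidence;
        # denominator rescales by prior precision times language variance.
        step = (s2 * dphi + m) / (lam * s2 + 1.0)
        updated.append(th + step)
    return updated


# Toy numbers chosen only for illustration.
theta_next = map_update(
    theta=[0.0, 1.0],
    delta_phi=[2.0, 0.0],
    mu=[1.0, 0.0],
    sigma_l_sq=[1.0, 1.0],
    lambda_prior=[1.0, 1.0],
)
print(theta_next)  # [1.5, 1.0]
```

Two limiting cases follow directly from the formula. As $\sigma_{L,i}^2 \to 0$ (very reliable language), the step approaches $\mu_i^t$, so the language shift dominates. As $\sigma_{L,i}^2$ grows large, the step approaches $\Delta\Phi_i / \Lambda_{prior,i}$, so the physical correction dominates, scaled by the prior.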
Abstract
Robots must learn from both what people do and what they say, but either modality alone is often incomplete: physical corrections are grounded but ambiguous in intent, while language expresses high-level goals but lacks physical grounding. We introduce QuickLAP: Quick Language-Action Preference learning, a Bayesian framework that fuses physical and language feedback to infer reward functions in real time. Our key insight is to treat language as a probabilistic observation over the user's latent preferences, clarifying which reward features matter and how physical corrections should be interpreted. QuickLAP uses Large Language Models (LLMs) to extract reward feature attention masks and preference shifts from free-form utterances, which it integrates with physical feedback in a closed-form update rule. This enables fast, real-time, and robust reward learning that handles ambiguous feedback. In a semi-autonomous driving simulator, QuickLAP reduces reward learning error by over 70% compared to physical-only and heuristic multimodal baselines. A 15-participant user study further validates our approach: participants found QuickLAP significantly more understandable and collaborative, and preferred its learned behavior over baselines. Code is available at https://github.com/MIT-CLEAR-Lab/QuickLAP.
