Towards Explainable, Safe Autonomous Driving with Language Embeddings for Novelty Identification and Active Learning: Framework and Experimental Analysis with Real-World Data Sets
Ross Greer, Mohan Trivedi
TL;DR
The paper tackles novelty in autonomous driving, where open-set, high-level scene reasoning is needed beyond traditional safety metrics. It proposes using language embeddings via CLIP to identify novel driving scenes and to support safety takeovers and active learning on real-world datasets. Novelty is detected by clustering CLIP-based image embeddings and labeling unclustered images as novel; textual explanations of what makes a scene novel are generated by a pipeline that combines a vision-language model with an LLM, demonstrated on LAVA and TUMTraf. The results show effective isolation of novel scenes and plausible explanations, suggesting practical value for safe takeovers, data curation, and multi-task active learning in real-world autonomous driving deployments.
Abstract
This research explores the integration of language embeddings for active learning in autonomous driving datasets, with a focus on novelty detection. Novelty arises from unexpected scenarios that autonomous vehicles struggle to navigate, necessitating higher-level reasoning abilities. Our proposed method employs language-based representations to identify novel scenes, emphasizing the dual purpose of safety takeover responses and active learning. The research presents a clustering experiment using Contrastive Language-Image Pre-training (CLIP) embeddings to organize datasets and detect novelties. We find that the proposed algorithm effectively isolates novel scenes from a collection of subsets derived from two real-world driving datasets, one vehicle-mounted and one infrastructure-mounted. From the generated clusters, we further present methods for generating textual explanations of the elements that differentiate scenes classified as novel from other scenes in the data pool, presenting qualitative examples from the clustered results. Our results demonstrate the effectiveness of language-driven embeddings in identifying novel elements and generating explanations of data, and we further discuss potential applications in safe takeovers, data curation, and multi-task active learning.
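The idea of labeling unclustered images as novel can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or its parameters, so several things here are assumptions rather than the authors' method: the DBSCAN-style noise rule, the cosine distance, the `eps` and `min_samples` values, and the helper names `find_novel` and `synthetic_embeddings`. Synthetic vectors stand in for real CLIP image embeddings so that the example runs without a model download.

```python
import math
import random


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def find_novel(embeddings, eps=0.1, min_samples=3):
    """Return indices of embeddings that belong to no dense cluster.

    Uses the DBSCAN noise definition (an assumption, not confirmed by the
    source): a point is "core" if at least min_samples points, itself
    included, lie within eps of it; a point is noise (treated here as novel)
    if it is neither core nor within eps of any core point.
    """
    n = len(embeddings)
    # Brute-force O(n^2) neighbor lists; adequate for a small illustration.
    neighbors = [
        [j for j in range(n)
         if cosine_distance(embeddings[i], embeddings[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_samples}
    return [
        i for i in range(n)
        if i not in core and not core.intersection(neighbors[i])
    ]


def synthetic_embeddings(seed=0, dim=8, per_cluster=10):
    """Hypothetical stand-in for CLIP embeddings: two tight groups, one outlier."""
    rng = random.Random(seed)
    centers = [
        [1.0] + [0.0] * (dim - 1),
        [0.0, 1.0] + [0.0] * (dim - 2),
    ]
    points = [
        [c + rng.gauss(0.0, 0.01) for c in center]
        for center in centers
        for _ in range(per_cluster)
    ]
    # Outlier pointing in a direction orthogonal to both cluster centers.
    points.append([0.0] * (dim - 1) + [1.0])
    return points


if __name__ == "__main__":
    print(find_novel(synthetic_embeddings()))
```

With the default arguments, the two tight groups form clusters and only the appended outlier (index 20) is flagged. On real data, the vectors would come from a CLIP image encoder, and `eps` and `min_samples` would need tuning per dataset, since they determine how rare a scene must be before it counts as novel.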
