Enhancing LLM-based Autonomous Driving with Modular Traffic Light and Sign Recognition

Fabian Schmidt; Noushiq Mohammed Kayilan Abdul Nazar; Markus Enzweiler; Abhinav Valada

TL;DR

Prior LLM-based autonomous driving systems lack explicit enforcement of traffic rules and robust detection of small, safety-critical cues. This paper introduces TLS-Assist, a modular redundancy layer that detects traffic lights and signs and converts the detections into concise natural-language messages. These messages are fed to the LLM-based driving agent, enabling explicit safety-oriented reasoning without altering the planner. The approach combines single- and multi-view image processing, lightweight detectors (YOLO11 variants) for traffic light recognition (TLR) and traffic sign recognition (TSR), relevance prediction, state validation, and template-based message generation, and it is integrated as a plug-and-play extension. Closed-loop evaluation on the LangAuto benchmark in CARLA shows consistent improvements in Driving Score, reductions in red-light and stop-sign infractions of up to 64% and 81%, respectively, and notable route-completion gains for BEVDriver.

Abstract

Large Language Models (LLMs) are increasingly used for decision-making and planning in autonomous driving, showing promising reasoning capabilities and potential to generalize across diverse traffic situations. However, current LLM-based driving agents lack explicit mechanisms to enforce traffic rules and often struggle to reliably detect small, safety-critical objects such as traffic lights and signs. To address this limitation, we introduce TLS-Assist, a modular redundancy layer that augments LLM-based autonomous driving agents with explicit traffic light and sign recognition. TLS-Assist converts detections into structured natural language messages that are injected into the LLM input, enforcing explicit attention to safety-critical cues. The framework is plug-and-play, model-agnostic, and supports both single-view and multi-view camera setups. We evaluate TLS-Assist in a closed-loop setup on the LangAuto benchmark in CARLA. The results demonstrate relative driving performance improvements of up to 14% over LMDrive and 7% over BEVDriver, while consistently reducing traffic light and sign infractions. We publicly release the code and models at https://github.com/iis-esslingen/TLS-Assist.
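To make the idea of converting detections into injected natural-language messages concrete, the following is a minimal Python sketch, not the released TLS-Assist code. The `Detection` fields, the `TEMPLATES` wording, the `min_confidence` threshold, and both function names are hypothetical and chosen only for illustration; the detector and relevance predictor are assumed to have already run.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """One detector output (hypothetical schema, not the paper's)."""
    kind: str          # "traffic_light" or "traffic_sign"
    label: str         # e.g. "red", "green", "stop"
    confidence: float  # detector score in [0, 1]
    relevant: bool     # assumed output of a relevance predictor


# Hypothetical message templates keyed by (kind, label).
TEMPLATES = {
    ("traffic_light", "red"): "The traffic light ahead is red. Stop before the intersection.",
    ("traffic_light", "yellow"): "The traffic light ahead is yellow. Prepare to stop.",
    ("traffic_light", "green"): "The traffic light ahead is green. You may proceed.",
    ("traffic_sign", "stop"): "There is a stop sign ahead. Come to a full stop.",
}


def detections_to_message(detections: List[Detection], min_confidence: float = 0.5) -> str:
    """Turn relevant, confident detections into one short message."""
    sentences: List[str] = []
    for det in detections:
        # Skip cues that do not apply to the ego lane or are too uncertain.
        if not det.relevant or det.confidence < min_confidence:
            continue
        template = TEMPLATES.get((det.kind, det.label))
        # Ignore unknown classes and avoid repeating the same sentence.
        if template is not None and template not in sentences:
            sentences.append(template)
    return " ".join(sentences)


def augment_instruction(instruction: str, detections: List[Detection]) -> str:
    """Append the safety message to the navigation instruction, if any."""
    message = detections_to_message(detections)
    return f"{instruction} {message}" if message else instruction
```

In this sketch the driving agent itself is untouched: only the text passed to it changes, which mirrors the plug-and-play property described in the abstract. How the actual system validates light states or phrases its messages is not specified here.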
