Enhancing LLM-based Autonomous Driving with Modular Traffic Light and Sign Recognition

Fabian Schmidt, Noushiq Mohammed Kayilan Abdul Nazar, Markus Enzweiler, Abhinav Valada

TL;DR

Prior LLM-based autonomous driving systems do not explicitly enforce traffic rules and struggle to detect small, safety-critical cues. This paper introduces TLS-Assist, a modular redundancy layer that detects traffic lights and signs and converts the detections into concise natural-language messages for the LLM-based driving agent, enabling explicit safety-oriented reasoning without altering the planner. The approach combines single- and multi-view image processing, lightweight detectors (YOLO11 variants) for traffic light recognition (TLR) and traffic sign recognition (TSR), relevance prediction, state validation, and template-based message generation, all integrated as a plug-and-play extension. Closed-loop evaluation on the LangAuto benchmark in CARLA shows consistent improvements in Driving Score, reductions in red-light and stop-sign infractions of up to 64% and 81%, respectively, and notable route-completion gains for BEVDriver.

Abstract

Large Language Models (LLMs) are increasingly used for decision-making and planning in autonomous driving, showing promising reasoning capabilities and potential to generalize across diverse traffic situations. However, current LLM-based driving agents lack explicit mechanisms to enforce traffic rules and often struggle to reliably detect small, safety-critical objects such as traffic lights and signs. To address this limitation, we introduce TLS-Assist, a modular redundancy layer that augments LLM-based autonomous driving agents with explicit traffic light and sign recognition. TLS-Assist converts detections into structured natural language messages that are injected into the LLM input, enforcing explicit attention to safety-critical cues. The framework is plug-and-play, model-agnostic, and supports both single-view and multi-view camera setups. We evaluate TLS-Assist in a closed-loop setup on the LangAuto benchmark in CARLA. The results demonstrate relative driving performance improvements of up to 14% over LMDrive and 7% over BEVDriver, while consistently reducing traffic light and sign infractions. We publicly release the code and models on https://github.com/iis-esslingen/TLS-Assist.

Paper Structure

This paper contains 22 sections, 2 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Overview of the proposed TLS-Assist framework. TLS-Assist enhances vision-based LLM autonomous driving (AD) agents, with optional LiDAR input, by detecting traffic lights and signs and expressing them as natural language instructions. The extended agents are subsequently evaluated in closed-loop driving using the CARLA simulator.
  • Figure 2: Overview of the proposed TLS-Assist framework. Multi-view images are preprocessed and passed to separate detection modules for traffic lights and traffic signs. Traffic light detections are refined by relevance prediction and state validation, while traffic sign detections are prioritized according to driving relevance. Both results are consolidated into natural language messages and forwarded to the LLM-based autonomous driving (AD) agent.
  • Figure 3: Predefined notice instructions used for message generation. Traffic light and sign detections are mapped to natural language templates that are concatenated and provided as textual input to the LLM-based driving agent.
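
The template-based message generation described in the Figure 3 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Detection` class, the template wording, the label names, and the `min_conf` threshold are all hypothetical, chosen only to show how detections might be mapped to templates and concatenated into one text input for the driving agent.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical template wording and label names; the paper's actual
# notice instructions are the ones shown in its Figure 3.
LIGHT_TEMPLATES = {
    "red": "Notice: the traffic light ahead is red.",
    "yellow": "Notice: the traffic light ahead is yellow.",
    "green": "Notice: the traffic light ahead is green.",
}
SIGN_TEMPLATES = {
    "stop": "Notice: there is a stop sign ahead.",
    "speed_limit_30": "Notice: the speed limit is 30 km/h.",
}


@dataclass
class Detection:
    """One detector output (hypothetical structure)."""
    label: str            # e.g. "red" or "stop"
    confidence: float     # detector score in [0, 1]
    relevant: bool = True  # result of an upstream relevance check


def build_notice(lights: List[Detection], signs: List[Detection],
                 min_conf: float = 0.5) -> str:
    """Map detections to templates and concatenate them into one message."""
    messages: List[str] = []

    # Keep only relevant, confident traffic lights with a known state,
    # then report the single most confident one.
    candidates = [d for d in lights
                  if d.relevant and d.confidence >= min_conf
                  and d.label in LIGHT_TEMPLATES]
    if candidates:
        best = max(candidates, key=lambda d: d.confidence)
        messages.append(LIGHT_TEMPLATES[best.label])

    # Add one message per distinct relevant sign, in the order given.
    for d in signs:
        if d.relevant and d.confidence >= min_conf and d.label in SIGN_TEMPLATES:
            msg = SIGN_TEMPLATES[d.label]
            if msg not in messages:
                messages.append(msg)

    # An empty string means there is nothing to add to the LLM input.
    return " ".join(messages)


if __name__ == "__main__":
    lights = [Detection("green", 0.9, relevant=False), Detection("red", 0.8)]
    signs = [Detection("stop", 0.7), Detection("stop", 0.3)]
    print(build_notice(lights, signs))
```

In this sketch the green light is dropped because it is marked as not relevant to the ego lane, and the low-confidence duplicate stop sign is filtered out, so the agent would receive one notice for the red light and one for the stop sign.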