Hybrid Reasoning Based on Large Language Models for Autonomous Car Driving
Mehdi Azarafza, Mojtaba Nayyeri, Charles Steinmetz, Steffen Staab, Achim Rettberg
TL;DR
The paper addresses the challenge of generalizing LLM-based reasoning to real-time autonomous driving by introducing a hybrid reasoning pipeline that fuses perception data from YOLOv8 with sensor inputs into a GPT-4–style LLM to produce brake and throttle commands in CARLA. It evaluates nine scenarios under three weather conditions using three reasoning modes—common-sense, arithmetic, and hybrid—with hybrid reasoning yielding the strongest performance, averaging over $65\%$ accuracy and providing precise run-time control trajectories. The work demonstrates that LLMs can be structured to reason about dynamic driving contexts and generate actionable, scenario-specific control signals, potentially augmenting autopilot systems where traditional methods struggle under low-visibility or complex environments. It also highlights practical considerations such as latency and the need for domain-specific lightweight LLMs to enable real-time deployment, outlining future directions to optimize inputs and improve run-time efficiency. The study contributes a concrete framework for incorporating mathematical and commonsense reasoning in autonomous driving, with quantified benefits in a high-fidelity simulator and implications for real-world decision-making under adverse conditions.
Abstract
Large Language Models (LLMs) have garnered significant attention for their ability to understand text and images, generate human-like text, and perform complex reasoning tasks. However, their ability to generalize this advanced reasoning, combined with natural language text, to decision-making in dynamic situations requires further exploration. In this study, we investigate how well LLMs can adapt and apply a combination of arithmetic and common-sense reasoning, particularly in autonomous driving scenarios. We hypothesize that LLMs' hybrid reasoning abilities can improve autonomous driving by enabling them to analyze detected objects and sensor data, understand driving regulations and physical laws, and offer additional context. This addresses complex scenarios, such as decisions in low visibility (due to weather conditions), where traditional methods might fall short. We evaluated LLMs on accuracy by comparing their answers with human-generated ground truth inside CARLA. The results showed that when a combination of images (detected objects) and sensor data is fed into the LLM, it can offer precise information for brake and throttle control in autonomous vehicles across various weather conditions. This formulation and the resulting answers can assist in decision-making for auto-pilot systems.
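The flow described above (detected objects and sensor data combined into a text query, with the LLM's answers scored against human-generated ground truth) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the `brake=... throttle=...` answer format, the names `Detection`, `build_prompt`, `parse_control`, and `accuracy`, and the tolerance-based matching rule are all assumptions made for illustration. The LLM call itself is omitted.

```python
import re
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical structure, not taken from the paper)."""
    label: str         # class name reported by the object detector
    distance_m: float  # distance from the ego vehicle, in metres


def build_prompt(detections, speed_kmh, weather):
    """Combine detected objects and sensor readings into one text prompt."""
    objects = "; ".join(
        f"{d.label} at {d.distance_m:.1f} m" for d in detections
    ) or "none"
    return (
        f"Weather: {weather}. Ego speed: {speed_kmh:.1f} km/h. "
        f"Detected objects: {objects}. "
        "Using traffic rules and physics, answer as "
        "'brake=<0..1> throttle=<0..1>'."
    )


def parse_control(reply):
    """Extract (brake, throttle) from a reply, clamped to [0, 1].

    Returns None when the reply does not follow the assumed format.
    """
    number = r"([0-9]*\.?[0-9]+)"
    match = re.search(rf"brake\s*=\s*{number}\s+throttle\s*=\s*{number}", reply)
    if match is None:
        return None
    brake, throttle = (max(0.0, min(1.0, float(g))) for g in match.groups())
    return brake, throttle


def accuracy(predicted, ground_truth, tol=0.1):
    """Fraction of answers whose brake and throttle are both within tol
    of the human label (the tolerance rule is an assumption)."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for p, g in zip(predicted, ground_truth)
        if p is not None
        and abs(p[0] - g[0]) <= tol
        and abs(p[1] - g[1]) <= tol
    )
    return hits / len(ground_truth)
```

In use, `build_prompt` would be called once per simulation step, its output sent to whichever LLM is being evaluated, and the reply passed through `parse_control` before being compared with the human-labelled answer. Treating an unparseable reply as a miss is a design choice of this sketch, so that malformed answers lower the accuracy score instead of being silently skipped.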
