CoT-VLM4Tar: Chain-of-Thought Guided Vision-Language Models for Traffic Anomaly Resolution
Tianchi Ren, Haibo Hu, Jiacheng Zuo, Xinhong Chen, Jianping Wang, Chun Jason Xue, Jen-Ming Wu, Nan Guan
TL;DR
CoT-VLM4Tar addresses real-time traffic anomaly resolution by guiding a vision-language model (VLM) with a four-stage chain-of-thought inside a CARLA-based closed-loop framework. The approach replays ghost jams, intersection deadlocks, and accidents in CARLA, reasons about each scenario through Scene, Analysis, Solution, and Formatting stages, and converts the VLM outputs into executable CARLA commands via an integration module. Experiments compare multiple VLMs (GPT-4o, MiniCPM14b, VILA40b) and show that GPT-4o provides the most coherent, actionable solutions, taking around 14 seconds per scenario, which is faster than typical on-site intervention. This work demonstrates feasibility for autonomous traffic management and lays the groundwork for integrating VLM-based decision-making into real-time urban traffic systems.
Abstract
With the acceleration of urbanization, modern urban traffic systems are becoming increasingly complex, leading to frequent traffic anomalies. These anomalies encompass not only common traffic jams but also more challenging issues such as phantom traffic jams, intersection deadlocks, and accident liability analysis, which severely impact traffic flow, vehicular safety, and overall transportation efficiency. Existing solutions rely primarily on manual intervention by traffic police or on artificial intelligence-based detection systems. Manual intervention often suffers from response delays and inconsistent management due to inadequate resources, while AI detection systems, despite improving efficiency to some extent, still struggle to handle complex traffic anomalies precisely and in real time. To address these issues, we propose CoT-VLM4Tar (Chain-of-Thought Vision-Language Model for Traffic Anomaly Resolution). This approach introduces a new chain-of-thought that guides the VLM in analyzing, reasoning about, and generating more reasonable and effective solutions for traffic anomalies. To evaluate the performance and effectiveness of our method, we developed a closed-loop testing framework based on the CARLA simulator. Furthermore, to ensure seamless integration of the solutions generated by the VLM with the CARLA simulator, we implement an integration module that converts these solutions into executable commands. Our results demonstrate the effectiveness of VLMs in resolving real-time traffic anomalies, providing a proof of concept for their integration into autonomous traffic management systems.
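To make the pipeline shape concrete, the following is a minimal sketch of how a four-stage prompt and an integration step might fit together. Only the stage names (Scene, Analysis, Solution, Formatting) come from the summary above. The instruction wording, the JSON output schema, the `Command` type, the action names, and the functions `build_prompt` and `parse_commands` are all assumptions made for illustration, not the authors' implementation, and no CARLA or VLM API is called.

```python
import json
from dataclasses import dataclass
from typing import List

# Stage names follow the summary; the instruction text for each stage is assumed.
STAGES = [
    ("Scene", "Describe the vehicles, lanes, and signals visible in the image."),
    ("Analysis", "Identify the anomaly type and its likely cause."),
    ("Solution", "Propose concrete per-vehicle actions that resolve the anomaly."),
    ("Formatting", 'Return the actions as a JSON list of objects with keys '
                   '"vehicle_id", "action", and "value".'),
]

# Hypothetical action vocabulary; a real integration module would define its own.
ALLOWED_ACTIONS = {"set_speed", "stop", "yield"}


@dataclass(frozen=True)
class Command:
    """One simulator-agnostic instruction (hypothetical schema)."""
    vehicle_id: int
    action: str
    value: float


def build_prompt(stages=STAGES) -> str:
    """Join the stages into one numbered chain-of-thought prompt."""
    return "\n".join(
        f"Step {i} ({name}): {text}"
        for i, (name, text) in enumerate(stages, start=1)
    )


def parse_commands(vlm_output: str) -> List[Command]:
    """Extract the JSON list from free-form VLM text and validate each item."""
    start = vlm_output.find("[")
    end = vlm_output.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON list found in VLM output")
    items = json.loads(vlm_output[start:end + 1])
    commands = []
    for item in items:
        if item["action"] not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action: {item['action']}")
        commands.append(
            Command(int(item["vehicle_id"]), item["action"],
                    float(item.get("value", 0.0)))
        )
    return commands


if __name__ == "__main__":
    print(build_prompt())
    reply = 'Formatting: [{"vehicle_id": 3, "action": "set_speed", "value": 8.5}]'
    for cmd in parse_commands(reply):
        print(cmd)
```

Validating the parsed actions against a fixed vocabulary before anything reaches the simulator is one way to keep malformed or unexpected VLM output from being executed; mapping each `Command` onto actual simulator calls is left out here because the summary does not describe that interface.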
