From Failures to Fixes: LLM-Driven Scenario Repair for Self-Evolving Autonomous Driving
Xinyu Xia, Xingjun Ma, Yunfeng Hu, Ting Qu, Hong Chen, Xun Gong
TL;DR
This work tackles robustness and generalization gaps in autonomous driving through failure-driven, semantically informed scenario repair. It introduces SERA, a closed-loop framework that analyzes pre-evaluation logs to identify failure patterns, retrieves semantically aligned scenarios from a structured scenario bank, applies an LLM-based reflection step to refine the selection, and performs few-shot, targeted fine-tuning to repair safety-critical weaknesses. Key contributions include the formalization of a Scenario Descriptor and Scenario Bank, a Failure-Aware Scenario Recommendation pipeline, an LLM-based Reflection mechanism, and a self-evolving repair loop validated on Bench2Drive/CARLA, where it shows consistent performance gains across diverse baselines. The approach offers a practical, data-efficient pathway to improve safety-critical behavior and generalization in autonomous driving, with demonstrated improvements in Driving Score, Success Rate, Efficiency, and Comfortness across multiple architectures.
Abstract
Ensuring robust and generalizable autonomous driving requires not only broad scenario coverage but also efficient repair of failure cases, particularly those arising in challenging and safety-critical scenarios. However, existing scenario generation and selection methods often lack adaptivity and semantic relevance, which limits their impact on performance improvement. In this paper, we propose SERA, an LLM-powered framework that enables autonomous driving systems to self-evolve by repairing failure cases through targeted scenario recommendation. By analyzing performance logs, SERA identifies failure patterns and dynamically retrieves semantically aligned scenarios from a structured scenario bank. An LLM-based reflection mechanism further refines these recommendations to maximize relevance and diversity. The selected scenarios are used for few-shot fine-tuning, enabling targeted adaptation with minimal data. Experiments on the Bench2Drive benchmark show that SERA consistently improves key metrics across multiple autonomous driving baselines, demonstrating its effectiveness and generalizability under safety-critical conditions.
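
To make the recommendation step concrete, the following is a minimal sketch of failure-aware scenario retrieval. It is not SERA's implementation. The log format, the tag-based `Scenario` descriptor, the scoring rule, and the function names `failure_profile` and `recommend` are all assumptions introduced for illustration. In particular, the LLM-based reflection step is replaced here by a simple duplicate filter that stands in for the relevance and diversity refinement.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical scenario descriptor: an id plus a set of semantic tags."""
    scenario_id: str
    tags: frozenset


def failure_profile(logs):
    """Count how often each tag appears in failed evaluation episodes."""
    counts = Counter()
    for episode in logs:
        if not episode["success"]:
            counts.update(episode["tags"])
    return counts


def recommend(bank, profile, k):
    """Pick up to k scenarios whose tags overlap most with observed failures.

    Assumed scoring rule: sum of failure counts over a scenario's tags.
    Scenarios with an identical tag set to one already chosen are skipped,
    a crude stand-in for the LLM-based reflection step.
    """
    def score(s):
        return sum(profile[t] for t in s.tags)

    ranked = sorted(bank, key=lambda s: (-score(s), s.scenario_id))
    chosen, seen = [], set()
    for s in ranked:
        if score(s) == 0 or s.tags in seen:
            continue  # irrelevant to observed failures, or redundant
        chosen.append(s)
        seen.add(s.tags)
        if len(chosen) == k:
            break
    return chosen


# Illustrative data only; not taken from the paper.
logs = [
    {"tags": {"night", "cut_in"}, "success": False},
    {"tags": {"rain", "cut_in"}, "success": False},
    {"tags": {"day", "lane_follow"}, "success": True},
]
bank = [
    Scenario("s1", frozenset({"cut_in", "night"})),
    Scenario("s2", frozenset({"cut_in", "night"})),
    Scenario("s3", frozenset({"rain", "cut_in"})),
    Scenario("s4", frozenset({"day", "lane_follow"})),
]
selected = recommend(bank, failure_profile(logs), k=2)
print([s.scenario_id for s in selected])
```

In a closed loop of the kind the abstract describes, the selected scenarios would then feed a few-shot fine-tuning step, and the updated model would be re-evaluated to produce the next round of logs.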
