From Failures to Fixes: LLM-Driven Scenario Repair for Self-Evolving Autonomous Driving

Xinyu Xia, Xingjun Ma, Yunfeng Hu, Ting Qu, Hong Chen, Xun Gong

TL;DR

This work tackles robustness and generalization gaps in autonomous driving through failure-driven, semantically informed scenario repair. It introduces SERA, a closed-loop framework that analyzes pre-evaluation logs to identify failure patterns, retrieves semantically aligned scenarios from a structured scenario bank, applies LLM-based reflection to refine the selection, and performs few-shot, targeted fine-tuning to repair safety-critical weaknesses. Key contributions include the formalization of a Scenario Descriptor and Scenario Bank, a Failure-Aware Scenario Recommendation pipeline, an LLM-based Reflection mechanism, and a self-evolving repair loop validated on Bench2Drive/CARLA, where it yields consistent performance gains across diverse baselines. The approach offers a practical, data-efficient way to improve safety-critical behavior and generalization in autonomous driving, with reported improvements in Driving Score, Success Rate, Efficiency, and Comfortness across multiple architectures.

Abstract

Ensuring robust and generalizable autonomous driving requires not only broad scenario coverage but also efficient repair of failure cases, particularly those arising in challenging and safety-critical scenarios. However, existing scenario generation and selection methods often lack adaptivity and semantic relevance, which limits their impact on performance improvement. In this paper, we propose SERA, an LLM-powered framework that enables autonomous driving systems to self-evolve by repairing failure cases through targeted scenario recommendation. By analyzing performance logs, SERA identifies failure patterns and dynamically retrieves semantically aligned scenarios from a structured bank. An LLM-based reflection mechanism further refines these recommendations to maximize relevance and diversity. The selected scenarios are used for few-shot fine-tuning, enabling targeted adaptation with minimal data. Experiments on the Bench2Drive benchmark show that SERA consistently improves key metrics across multiple autonomous driving baselines, demonstrating its effectiveness and generalizability under safety-critical conditions.
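
The recommendation step described above (derive failure patterns from logs, then retrieve matching scenarios from a bank) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the `Scenario` class, the tag-based descriptors, the log format, and the overlap score are hypothetical, and the duplicate filter is only a simple stand-in for the LLM-based reflection that SERA uses to balance relevance and diversity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical scenario descriptor: an id plus a set of semantic tags."""
    scenario_id: str
    tags: frozenset


def failure_profile(logs):
    """Count how often each tag appears among failed evaluation runs."""
    counts = Counter()
    for entry in logs:
        if not entry["success"]:
            counts.update(entry["tags"])
    return counts


def score(scenario, profile):
    """Sum the failure counts of the tags this scenario carries."""
    return sum(profile[tag] for tag in scenario.tags)


def recommend(bank, profile, k):
    """Rank scenarios by score, skipping irrelevant ones and repeated tag sets.

    The duplicate filter is an assumed placeholder for LLM-based reflection.
    """
    ranked = sorted(bank, key=lambda s: (-score(s, profile), s.scenario_id))
    chosen, seen = [], set()
    for s in ranked:
        if score(s, profile) == 0 or s.tags in seen:
            continue
        chosen.append(s)
        seen.add(s.tags)
        if len(chosen) == k:
            break
    return chosen
```

In this sketch the recommended scenarios would then feed the few-shot fine-tuning step; a real system would replace the tag overlap with the paper's semantic matching and the filter with an actual LLM call.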

Paper Structure

This paper contains 25 sections, 12 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: Conceptual illustration of SERA. The system performs pre-evaluation to detect failure cases, leverages failure-aware scenario recommendation to retrieve vulnerable instances, and applies self-evolving scenario repair for targeted model adaptation. An example on the right shows a failure due to low-visibility collision that is successfully repaired through efficient fine-tuning, leading to improved decision-making under safety-critical conditions.
  • Figure 2: Overview of the proposed SERA framework.
  • Figure 3: Ability-wise success rate comparison of UniAD under different selection strategies (Random, Initial Rec., and Full SERA). Full SERA consistently improves performance across various driving abilities.
  • Figure 4: Qualitative comparison between VAD (red dashed borders) and SERA (green dashed borders) across various autonomous driving scenarios. Each column represents a future timestamp (t, t+1s, t+2s, t+3s), showing the behavioral differences between the two methods. SERA demonstrates more consistent and safer navigation compared to VAD.