
KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models

Kemou Jiang, Xuan Cai, Zhiyong Cui, Aoyong Li, Yilong Ren, Haiyang Yu, Hao Yang, Daocheng Fu, Licheng Wen, Pinlong Cai

TL;DR

KoMA presents a knowledge-driven, multi-agent framework for autonomous driving that leverages LLMs to share knowledge, reason about surrounding vehicles, and plan across multiple steps. It couples five modules (Environment, Multi-agent Interaction, Multi-step Planning, Shared Memory, and Ranking-based Reflection) to enable cognitive synergy and memory-based generalization in complex traffic. Empirical results on highway on-ramp merging show that KoMA achieves competitive performance and generalizes well across scenarios and LLMs, with learning efficiency far surpassing that of traditional data-driven methods. This work highlights the potential of memory-augmented, cooperative LLM agents to transform autonomous driving by enabling scalable, safe, and adaptable decision-making without extensive retraining.

Abstract

Large language models (LLMs) acting as autonomous agents offer a novel avenue for tackling real-world challenges in a knowledge-driven manner. These LLM-enhanced methods excel in generalization and interpretability. However, complex driving tasks often require the collaboration of multiple heterogeneous agents, which calls for LLM-driven agents capable of cooperative knowledge sharing and cognitive synergy. Despite the promise of LLMs, current applications center predominantly on single-agent scenarios. To broaden the horizons of knowledge-driven strategies and bolster the generalization capabilities of autonomous agents, we propose the KoMA framework, which consists of multi-agent interaction, multi-step planning, shared memory, and ranking-based reflection modules, to enhance multi-agent decision-making in complex driving scenarios. Working from text descriptions of driving scenarios generated by the framework, the multi-agent interaction module enables LLM agents to analyze and infer the intentions of surrounding vehicles, akin to human cognition. The multi-step planning module enables LLM agents to derive final action decisions layer by layer, keeping short-term action decisions consistent with the overall goal. The shared memory module accumulates collective experience to support better decisions, and the ranking-based reflection module evaluates and improves agent behavior to enhance driving safety and efficiency. The KoMA framework not only enhances the robustness and adaptability of autonomous driving agents but also significantly elevates their generalization capabilities across diverse scenarios. Empirical results demonstrate the superiority of our approach over traditional methods, particularly in its ability to handle complex, unpredictable driving environments without extensive retraining.
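The layer-by-layer planning idea (goal, then plan, then action, with the plan layer built by generating, evaluating, sorting, and selecting candidates, as the Figure 3 caption describes) can be sketched as follows. This is a minimal sketch under stated assumptions, not KoMA's implementation: in KoMA an LLM generates and evaluates plans, whereas here `generate` and `evaluate` are hypothetical deterministic stubs, and `Plan`, `multi_step_decide`, and the action names (`FASTER`, `SLOWER`, `LANE_LEFT`, `IDLE`) are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Plan:
    description: str      # mid-level plan that serves the current goal
    actions: List[str]    # short-term actions that carry out the plan
    score: float = 0.0    # filled in during plan evaluation

def multi_step_decide(
    goal: str,
    generate: Callable[[str], List[Plan]],
    evaluate: Callable[[str, Plan], float],
) -> Tuple[Plan, str]:
    """Goal -> plan -> action, using generate/evaluate/sort/select for the plan layer."""
    # 1. Plan generation: propose candidate plans (an LLM call in KoMA; a stub here).
    plans = generate(goal)
    if not plans:
        raise ValueError("no candidate plans generated")
    # 2. Plan evaluation: score every candidate against the goal.
    for plan in plans:
        plan.score = evaluate(goal, plan)
    # 3. Plan sorting: rank candidates from best to worst.
    plans.sort(key=lambda p: p.score, reverse=True)
    # 4. Plan selection: keep the top plan and execute its first action now,
    #    so the short-term action stays tied to the same goal and plan.
    best = plans[0]
    return best, best.actions[0]

# Hypothetical stubs standing in for LLM prompts.
def generate(goal: str) -> List[Plan]:
    return [
        Plan("merge ahead of the main-lane vehicle", ["FASTER", "LANE_LEFT"]),
        Plan("yield to the main-lane vehicle, then merge", ["SLOWER", "LANE_LEFT"]),
        Plan("stay on the ramp", ["IDLE"]),
    ]

def evaluate(goal: str, plan: Plan) -> float:
    # Toy rule: reward plans that yield and still complete the merge.
    return float("SLOWER" in plan.actions) + float("LANE_LEFT" in plan.actions)

best, action = multi_step_decide("merge safely from the on-ramp", generate, evaluate)
print(best.description, "->", action)
# yield to the main-lane vehicle, then merge -> SLOWER
```

Keeping the selected plan's remaining actions around is what lets later steps follow the same goal instead of re-deciding from scratch at every step.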

Paper Structure

This paper contains 22 sections, 2 equations, 12 figures, 4 tables, and 1 algorithm.

Figures (12)

  • Figure 1: The knowledge-driven paradigm for single-agent and multi-agent driving systems. A single knowledge-driven agent system includes an interactive environment, a driver agent with recall, reasoning, and reflection abilities, and an independent memory module. A multi-agent knowledge-driven driving system adds an interaction module for communication and exchange among the agents.
  • Figure 2: KoMA, a knowledge-driven autonomous driving framework that incorporates multiple LLM-empowered agents. KoMA consists of five core modules: the environment module, the multi-step planning module, the interaction module, the ranking-based reflection module, and the shared memory module.
  • Figure 3: An example of the multi-step planning module's reasoning process. The module performs three-level goal-plan-action reasoning, analyzing and decomposing the scenario's target task step by step so that decisions stay consistent in purpose before and after each action. When formulating a plan, it also follows a four-step process of plan generation, plan evaluation, plan sorting, and plan selection, choosing the final plan that best fits the LLM's driving characteristics to ensure the plan is both feasible and personalized.
  • Figure 4: Different agents retrieve relevant experiences for their respective scenarios from the shared memory module. The shared memory module is a vector database that accumulates driving experience fragments from all agents; these fragments are vectorized and stored in the same database. When making a decision, an agent uses vector search to retrieve driving experiences from similar scenarios, which aids its decision-making.
  • Figure 5: The ranking-based reflection module evaluates decisions, identifies those with low scores, and corrects them. It then updates the shared memory module with these refined decisions, along with the high-scoring experiences.
  • ...and 7 more figures
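The vector-search retrieval from the Figure 4 caption and the reflection loop from the Figure 5 caption can be sketched together as below. This is a minimal sketch under stated assumptions, not KoMA's implementation: `embed` is a toy bag-of-words stand-in for a real text-embedding model, an in-memory list stands in for the vector database, and `SharedMemory`, `reflect`, `correct`, and the 0.5 score threshold are hypothetical. In KoMA the scoring and correction come from the LLM; here they are supplied as plain numbers and a callable.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a real system would call a text-embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

@dataclass
class Experience:
    scenario: str   # text description of the driving scene
    decision: str   # action taken, or the corrected action after reflection

@dataclass
class SharedMemory:
    # A single store that every agent writes to and searches.
    entries: List[Tuple[Counter, Experience]] = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.entries.append((embed(exp.scenario), exp))

    def retrieve(self, scenario: str, k: int = 1) -> List[Experience]:
        # Return the k stored experiences whose scenarios are most similar to the query.
        query = embed(scenario)
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [exp for _, exp in ranked[:k]]

def reflect(
    episode: List[Tuple[Experience, float]],
    memory: SharedMemory,
    correct: Callable[[Experience], str],
    threshold: float = 0.5,
) -> None:
    # Rank decisions by score (lowest first), correct those below the threshold,
    # and store both corrected and high-scoring experiences in shared memory.
    for exp, score in sorted(episode, key=lambda pair: pair[1]):
        if score < threshold:
            memory.add(Experience(exp.scenario, correct(exp)))
        else:
            memory.add(exp)

memory = SharedMemory()
episode = [
    (Experience("ramp vehicle merging ahead in right lane", "accelerate"), 0.2),
    (Experience("clear lane ahead no merging vehicle", "keep speed"), 0.9),
]
reflect(episode, memory, correct=lambda exp: "decelerate and yield")
print(memory.retrieve("vehicle merging from ramp into right lane")[0].decision)
# decelerate and yield
```

Because every agent reads from and writes to the same `SharedMemory` instance, a decision corrected for one agent becomes retrievable by any other agent that later faces a similar scene, which is the knowledge-sharing effect the captions describe.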