MoK-RAG: Mixture of Knowledge Paths Enhanced Retrieval-Augmented Generation for Embodied AI Environments
Zhengsheng Guo, Linwei Zheng, Xinyang Chen, Xuefeng Bai, Kehai Chen, Min Zhang
TL;DR
This work addresses the limitations of single-corpus Retrieval-Augmented Generation by introducing MoK-RAG, a Mixture of Knowledge Paths framework that partitions an LLM corpus into multiple specialized knowledge paths for concurrent retrieval. It extends the framework to 3D environment generation for Embodied AI with MoK-RAG3D, which combines a Splitting Module, a Constraint Module, and a dedicated Layout Module to produce cohesive, diverse scenes through a hierarchical knowledge tree and explicit spatial relations. Empirical results show reduced Reply Missing, improved asset selection and layout coherence, and scene quality competitive with HOLODECK; both automated and human evaluations confirm its effectiveness in generating varied 3D environments. The work demonstrates the practical value of multi-path knowledge retrieval for Embodied AI and lays a foundation for automated, scalable 3D scene generation and evaluation, although testing on real robots is currently limited by hardware.
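To make the idea of concurrent retrieval over several knowledge paths concrete, here is a minimal Python sketch. It is not the authors' implementation: the path names, the toy token-overlap scorer (a stand-in for whatever retriever MoK-RAG actually uses), and the helpers `retrieve_from_path` and `mixture_of_paths_retrieve` are all hypothetical. Each path is queried in parallel with `concurrent.futures.ThreadPoolExecutor`, and the hits stay grouped by path.

```python
from concurrent.futures import ThreadPoolExecutor


def overlap_score(query: str, doc: str) -> int:
    """Hypothetical toy scorer: count of shared lowercase tokens.
    A real system would use an embedding or BM25 retriever instead."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def retrieve_from_path(query: str, docs: list[str], k: int) -> list[str]:
    """Return up to k best-scoring documents from one knowledge path,
    dropping documents with zero overlap."""
    scored = [(overlap_score(query, d), d) for d in docs]
    scored = [item for item in scored if item[0] > 0]
    # Stable sort: ties keep their original order within the path.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [d for _, d in scored[:k]]


def mixture_of_paths_retrieve(
    query: str, paths: dict[str, list[str]], k: int = 2
) -> dict[str, list[str]]:
    """Query every knowledge path concurrently; results stay keyed by path."""
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(retrieve_from_path, query, docs, k)
            for name, docs in paths.items()
        }
        return {name: fut.result() for name, fut in futures.items()}


# Usage with a hypothetical three-way partition of an asset-description corpus.
paths = {
    "floor_objects": ["wooden dining table", "office desk", "sofa with cushions"],
    "wall_objects": ["framed painting", "wall clock", "bookshelf against wall"],
    "small_objects": ["coffee mug on table", "laptop on desk", "table lamp"],
}
context = mixture_of_paths_retrieve("a dining room with a table", paths, k=2)
```

Keeping hits grouped per path, rather than merging them into one ranked list, means that in this sketch a path with many strong matches cannot crowd out the others; here `wall_objects` simply returns an empty list instead of being padded with irrelevant items.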
Abstract
While human cognition naturally draws on diverse, specialized knowledge sources during decision-making, current Retrieval-Augmented Generation (RAG) systems typically retrieve from a single knowledge source, leading to a cognitive-algorithmic discrepancy. To bridge this gap, we introduce MoK-RAG, a novel multi-source RAG framework that implements a mixture-of-knowledge-paths retrieval mechanism: it functionally partitions a large language model (LLM) corpus into distinct sections, enabling retrieval from multiple specialized knowledge paths. Applied to the generation of 3D simulated environments, our proposed MoK-RAG3D extends this paradigm by partitioning 3D assets into distinct sections and organizing them in a hierarchical knowledge tree. Unlike previous methods that rely solely on manual evaluation, we are the first to introduce automated evaluation methods for 3D scenes. Both automatic and human evaluations in our experiments demonstrate that MoK-RAG3D can assist Embodied AI agents in generating diverse scenes.
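The abstract describes organizing 3D assets in a hierarchical knowledge tree. As a rough illustration only, the sketch below assumes a tree of nested category dictionaries with lists of asset IDs at the leaves, descends it along a category path, and samples distinct assets from the resulting subtree. Random sampling within a subtree is our assumed way of obtaining scene variety; this section does not say how MoK-RAG3D actually selects assets. All names (`asset_tree`, `leaves_under`, `select_assets`, and the asset IDs) are hypothetical.

```python
import random


def leaves_under(node) -> list[str]:
    """Collect every asset ID beneath a node.
    A node is either a dict (child name -> node) or a leaf list of asset IDs."""
    if isinstance(node, list):
        return list(node)
    assets: list[str] = []
    for child in node.values():
        assets.extend(leaves_under(child))
    return assets


def select_assets(tree: dict, path: list[str], n: int, rng: random.Random) -> list[str]:
    """Descend the tree along `path`, then sample up to n distinct assets
    from that subtree (assumed diversity mechanism, not the paper's)."""
    node = tree
    for key in path:
        node = node[key]
    candidates = leaves_under(node)
    return rng.sample(candidates, min(n, len(candidates)))


# Hypothetical two-level asset tree: category -> subcategory -> asset IDs.
asset_tree = {
    "furniture": {
        "seating": ["chair_01", "chair_02", "stool_01"],
        "tables": ["dining_table_01", "coffee_table_01"],
    },
    "decor": {
        "wall": ["painting_01", "clock_01"],
        "tabletop": ["vase_01", "candle_01"],
    },
}

# A seeded generator makes a given selection reproducible; different seeds give variety.
rng = random.Random(0)
picked = select_assets(asset_tree, ["furniture"], n=3, rng=rng)
```

Querying a higher node (such as `["furniture"]`) widens the candidate pool across subcategories, while a deeper path (such as `["decor", "wall"]`) narrows it; capping `n` at the pool size avoids the `ValueError` that `random.Random.sample` raises when asked for more items than exist.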
