Micro-Act: Mitigating Knowledge Conflict in LLM-based RAG via Actionable Self-Reasoning
Nan Huo, Jinyang Li, Bowen Qin, Ge Qu, Xiaolong Li, Xiaodong Li, Chenhao Ma, Reynold Cheng
TL;DR
Micro-Act introduces a hierarchical action space to mitigate knowledge conflicts in retrieval-augmented QA by automatically decomposing complex evidence into fine-grained, actionable steps. The method integrates a reasoning loop with navigational, functional, and bridging actions, guided by a complexity-aware stopping criterion to avoid infinite decomposition and adapt to different LLMs. Empirical results across five conflict datasets show substantial QA accuracy gains over state-of-the-art baselines, with robust performance in conflict-free scenarios and insights into complexity perception and over-rationalization. The approach offers practical improvements for real-world RAG systems, balancing accuracy, robustness, and cost while uncovering fine-grained conflicts beneath superficial contexts.
Abstract
Retrieval-Augmented Generation (RAG) systems commonly suffer from Knowledge Conflicts, where retrieved external knowledge contradicts the inherent, parametric knowledge of large language models (LLMs). Such conflicts adversely affect performance on downstream tasks such as question answering (QA). Existing approaches often attempt to mitigate conflicts by directly comparing the two knowledge sources side by side, but this can overwhelm LLMs with extraneous or lengthy contexts, ultimately hindering their ability to identify and mitigate inconsistencies. To address this issue, we propose Micro-Act, a framework with a hierarchical action space that automatically perceives context complexity and adaptively decomposes each knowledge source into a sequence of fine-grained comparisons. These comparisons are represented as actionable steps, enabling reasoning beyond the superficial context. In extensive experiments on five benchmark datasets, Micro-Act consistently achieves a significant increase in QA accuracy over state-of-the-art baselines across all five datasets and three conflict types, especially on the temporal and semantic types, where all baselines fail significantly. More importantly, Micro-Act also exhibits robust performance on non-conflict questions, highlighting its practical value in real-world RAG applications.
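To make the idea of complexity-aware decomposition concrete, the following is a minimal Python sketch, not the Micro-Act implementation. It assumes a toy complexity proxy (word count) and a naive sentence splitter in place of the LLM-driven perception and actions the abstract describes. The names `complexity`, `split_sentences`, `decompose`, and `fine_grained_comparisons`, and the parameters `max_words` and `max_depth`, are all hypothetical.

```python
from typing import List, Tuple


def complexity(text: str) -> int:
    """Toy complexity proxy (assumption): whitespace-separated word count."""
    return len(text.split())


def split_sentences(text: str) -> List[str]:
    """Naive splitter on periods; a stand-in for an LLM-driven decomposition step."""
    return [s.strip() for s in text.split(".") if s.strip()]


def decompose(text: str, max_words: int = 12, max_depth: int = 3,
              depth: int = 0) -> List[str]:
    """Recursively split a knowledge source until each piece is simple enough.

    Stops when the piece is below the complexity threshold, the depth cap is
    reached, or the piece cannot be split further, so recursion always ends.
    """
    if complexity(text) <= max_words or depth >= max_depth:
        return [text]
    parts = split_sentences(text)
    if len(parts) <= 1:
        return [text]  # nothing left to split; stop here
    pieces: List[str] = []
    for part in parts:
        pieces.extend(decompose(part, max_words, max_depth, depth + 1))
    return pieces


def fine_grained_comparisons(parametric: str, retrieved: str,
                             max_words: int = 12,
                             max_depth: int = 3) -> List[Tuple[str, str]]:
    """Pair every piece of parametric knowledge with every retrieved piece."""
    p_pieces = decompose(parametric, max_words, max_depth)
    r_pieces = decompose(retrieved, max_words, max_depth)
    return [(p, r) for p in p_pieces for r in r_pieces]


if __name__ == "__main__":
    pairs = fine_grained_comparisons(
        "A is B.", "A is C. D is E.", max_words=3
    )
    for parametric_piece, retrieved_piece in pairs:
        print(parametric_piece, "<->", retrieved_piece)
```

In this sketch each pair would then be passed to a model for a focused consistency check, instead of comparing the two full contexts at once. The all-pairs comparison is a simplification chosen for brevity; the framework described above selects its comparisons through its action space.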
