Atomic Information Flow: A Network Flow Model for Tool Attributions in RAG Systems

James Gao, Josh Zhou, Qi Sun, Ryan Huang, Steven Yoo

TL;DR

The paper presents Atomic Information Flow (AIF), a graph-based framework that attributes LLM-generated responses in multi-tool RAG systems to specific tool components by decomposing all outputs into minimal semantic atoms and tracking their flow. It introduces a three-stage methodology (atomic decomposition, atomic signal injection, and response-atom assignment) plus flow heuristics, and trains a small model (Gemma3-4B) to approximate a minimum-cut compression policy. Empirical results on HotpotQA and other QA benchmarks show that AIF signals enable substantial context compression with only modest losses in accuracy, bridging the gap between small and large models and enabling fine-grained attribution and trajectory debugging. The work lays a foundation for trajectory-level rewards and explainability in RAG stacks, with implications for efficiency and scalable tool orchestration in LLM systems.

Abstract

Many tool-based Retrieval-Augmented Generation (RAG) systems lack precise mechanisms for tracing final responses back to specific tool components, a critical gap as systems scale to complex multi-agent architectures. We present Atomic Information Flow (AIF), a graph-based network flow model that decomposes tool outputs and LLM calls into atoms: indivisible, self-contained units of information. By modeling LLM orchestration as a directed flow of atoms from tool and LLM nodes to a response super-sink, AIF enables granular attribution metrics for AI explainability. Motivated by the max-flow min-cut theorem in network flow theory, we train a lightweight Gemma3 (4B parameter) language model as a context compressor to approximate the minimum cut of tool atoms using flow signals computed offline by AIF. We note that the base Gemma3-4B model struggles to identify critical information with 54.7% accuracy on HotpotQA, barely outperforming lexical baselines (BM25). However, post-training on AIF signals boosts accuracy to 82.71% (+28.01 points) while achieving 87.52% (+1.85%) context token compression, bridging the gap with the Gemma3-27B variant, a model nearly 7× larger.
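To make the max-flow min-cut motivation concrete, the following is a minimal sketch on a toy graph, not the paper's formulation. The node names (`S`, `a1`..`a4`, `r1`, `r2`, `T`), the unit capacities, and the helper functions `max_flow` and `min_cut` are all assumptions introduced for illustration. Tool atoms receive supply from a super-source, flow to the response atoms they support, and drain into a response super-sink. The sketch uses a textbook Edmonds-Karp routine.

```python
from collections import defaultdict, deque

def max_flow(capacity, source, sink):
    """Textbook Edmonds-Karp. Returns (flow value, residual capacities)."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
            residual[v][u] += 0  # make sure the reverse edge exists
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, residual
        # Find the bottleneck along the path, then push flow through it.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

def min_cut(capacity, residual, source):
    """Original edges that cross from the source side of the residual graph."""
    reachable, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v, c in residual[u].items():
            if c > 0 and v not in reachable:
                reachable.add(v)
                queue.append(v)
    return sorted((u, v) for u, nbrs in capacity.items() for v in nbrs
                  if u in reachable and v not in reachable)

# Hypothetical toy trajectory: four tool atoms, two response atoms.
# a1 and a2 both support r1 (redundant), a3 supports r2, a4 supports nothing.
capacity = {
    "S": {"a1": 1, "a2": 1, "a3": 1, "a4": 1},
    "a1": {"r1": 1}, "a2": {"r1": 1}, "a3": {"r2": 1}, "a4": {},
    "r1": {"T": 1}, "r2": {"T": 1},
}

flow_value, residual = max_flow(capacity, "S", "T")
cut_edges = min_cut(capacity, residual, "S")
cut_capacity = sum(capacity[u][v] for u, v in cut_edges)
# Tool atoms whose supply edge carries flow in this particular max flow.
kept_atoms = sorted(a for a, c in capacity["S"].items() if residual["S"][a] < c)

print("max flow:", flow_value)
print("min cut edges:", cut_edges, "capacity:", cut_capacity)
print("tool atoms carrying flow:", kept_atoms)
```

In this toy setting the cut capacity equals the flow value, as the theorem requires, and only a subset of the tool atoms carries flow. That is the intuition behind treating compression as a cut: atoms that contribute nothing to the response (here `a4`) or that duplicate another atom's support can be dropped from the context.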

Paper Structure (28 sections, 13 equations, 5 figures, 3 tables, 3 algorithms)

Figures (5)

  • Figure 1: Minimum Cut Signals Derived from AIF significantly outperform the base model and lexical baselines on HotpotQA, bridging the gap against much larger model architectures.
  • Figure 2: The AIF model. Dots of the same color denote atoms that flow through each LLM "gate". For the scope of this paper, we focus on the Generation component and leave the Retrieval flow edges for future work. See Section \ref{sec:control-plane-limitation} for more details. Details on the decomposition and assignment algorithms can be found in Alg. \ref{alg:atomDecomposition} and Alg. \ref{alg:responseAtomAssignment}.
  • Figure 3: Decomposition and relevance labeling for a HotpotQA tool passage. Full details in Appendix \ref{sec:full-atom-decomposition}.
  • Figure 4: Response Assignment Example from HotpotQA. The Assignment field matches a corresponding Index from Appendix \ref{sec:full-atom-decomposition}.
  • Figure 5: AUROC for LLM Assigned Relevance Labels Against Correct/Incorrect Responses on Musique Dataset

Theorems & Definitions (16)

  • Definition 2.1: Graph
  • Remark 2.2
  • Definition 2.3: Atom
  • Definition 2.4: Source/Supply Node
  • Definition 2.5: Super-Source
  • Definition 2.6: Super-Sink
  • Definition 2.7: Node Typing
  • Definition 2.8: Flow with Supply
  • Remark 2.9: Active Nodes & Steering
  • Remark 2.10: Multicommodity Decomposition
  • ...and 6 more