Training-Free Agentic AI: Probabilistic Control and Coordination in Multi-Agent LLM Systems

Mohammad Parsa Hosseini; Ankit Shah; Saiyra Qureshi; Alex Huang; Connie Miao; Wei Wei

Abstract

Multi-agent large language model (LLM) systems enable complex, long-horizon reasoning by composing specialized agents, but practical deployment remains hindered by inefficient routing, noisy feedback, and high interaction cost. We introduce REDEREF, a lightweight and training-free controller for multi-agent LLM collaboration that improves routing efficiency during recursive delegation. REDEREF integrates (i) belief-guided delegation via Thompson sampling to prioritize agents with historically positive marginal contributions, (ii) reflection-driven re-routing using a calibrated LLM or programmatic judge, (iii) evidence-based selection rather than output averaging, and (iv) memory-aware priors to reduce cold-start inefficiency. Across multi-agent split-knowledge tasks, we show that while recursive retry alone saturates task success, belief-guided routing reduces token usage by 28%, agent calls by 17%, and time-to-success by 19% compared to random recursive delegation, and adapts gracefully under agent or judge degradation. These results demonstrate that simple, interpretable probabilistic control can meaningfully improve the efficiency and robustness of multi-agent LLM systems without training or fine-tuning.
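Components (i) and (iv) can be illustrated with Beta-Bernoulli Thompson sampling. This is a minimal sketch, not REDEREF's interface. It assumes each agent's contribution is reduced to a binary judge verdict, and that memory supplies per-agent (success, failure) counts from earlier tasks. The names `AgentBelief`, `thompson_select`, and `warm_start`, and the discount `weight`, are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class AgentBelief:
    """Beta(alpha, beta) posterior over one agent's rate of judged successes."""
    alpha: float = 1.0  # success pseudo-count (alpha=beta=1.0 is a uniform prior)
    beta: float = 1.0   # failure pseudo-count

    def sample(self, rng: random.Random) -> float:
        # Draw one plausible success rate from the current posterior.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, success: bool) -> None:
        # Conjugate Beta-Bernoulli update from a binary judge verdict.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0


def thompson_select(beliefs: dict[str, AgentBelief], rng: random.Random) -> str:
    """Delegate to the agent whose posterior sample is largest."""
    return max(beliefs, key=lambda name: beliefs[name].sample(rng))


def warm_start(memory: dict[str, tuple[int, int]], weight: float = 0.5) -> dict[str, AgentBelief]:
    """Hypothetical memory-aware prior: discounted (successes, failures) from past tasks."""
    return {
        name: AgentBelief(alpha=1.0 + weight * s, beta=1.0 + weight * f)
        for name, (s, f) in memory.items()
    }
```

For example, `warm_start({"biology": (8, 2), "physics": (1, 9), "ee": (0, 0)})` yields Beta(5, 2), Beta(1.5, 5.5), and Beta(1, 1) priors. As a result, `thompson_select` favors the biology agent but still occasionally probes the untested `ee` agent. This is one simple way a prior can reduce cold-start exploration.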

Paper Structure

This paper contains 64 sections, 4 theorems, 9 equations, 5 figures, 3 tables, and 1 algorithm.

Introduction
Contributions.
Related Work
Reasoning, reflection, and search.
Orchestration, ensembles, and routing.
Learning-based coordination.
Benchmarks and positioning.
The REDEREF Framework
Interpretation of Agent Beliefs.
Belief-Guided Delegation via Thompson Sampling (Core Policy)
Self-Reflection and Judging
Text-Appropriate Aggregation
Recursive Re-Routing
Design Rationale
Memory-Aware Priors and Cold-Start Mitigation
...and 49 more sections

Key Result

Theorem 3.1

Consider $N$ agents with true competences $\theta_1,\dots,\theta_N$ and a judge with class-conditional error rates $(\varepsilon_{\mathrm{FP}},\varepsilon_{\mathrm{FN}})$ satisfying $\delta=1-\varepsilon_{\mathrm{FP}}-\varepsilon_{\mathrm{FN}}>0$. Then the Bayesian regret of Thompson sampling over … where $\widetilde{R}(T)$ is the regret of Thompson sampling on an equivalent noiseless problem with …
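A short calculation shows why the theorem requires $\delta>0$. Assume, as "class-conditional" suggests, the following definitions:
- $\varepsilon_{\mathrm{FP}}$ is the probability that the judge accepts an incorrect output.
- $\varepsilon_{\mathrm{FN}}$ is the probability that the judge rejects a correct output.
- $\theta_i$ is the probability that agent $i$ answers correctly.

The judge's observed acceptance rate for agent $i$ is then:

```latex
\begin{aligned}
\tilde{\theta}_i
  &= \Pr(\text{judge accepts} \mid \text{agent } i \text{ answers}) \\
  &= \theta_i\,(1-\varepsilon_{\mathrm{FN}}) + (1-\theta_i)\,\varepsilon_{\mathrm{FP}} \\
  &= \varepsilon_{\mathrm{FP}} + \delta\,\theta_i,
  \qquad \delta = 1-\varepsilon_{\mathrm{FP}}-\varepsilon_{\mathrm{FN}}, \\[4pt]
\tilde{\theta}_i - \tilde{\theta}_j
  &= \delta\,(\theta_i - \theta_j).
\end{aligned}
```

With $\delta>0$, the agent ranking is unchanged and every competence gap shrinks by the factor $\delta$. This is consistent with the titles of Lemma 1.1 (order preservation) and Lemma 1.2 (gap contraction). The calculation is offered as intuition for the $\delta>0$ condition, not as the paper's proof.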

Figures (5)

Figure 1: System architecture of REDEREF. Queries pass through belief-guided delegation, agent execution, and judge evaluation. Upon success (top path), the posterior is updated and stored in memory before producing the final answer. Upon failure (bottom path), the query is refined and re-routed, with candidates aggregated using evidence-based selection.
Figure 2: Evaluation framework. Research questions, testable hypotheses, and associated metrics. Each dimension targets a distinct property of the routing mechanism.
Figure 3: Performance comparison across delegation strategies. (a) Task success rates (mean $\pm$ 95% CI) are saturated by recursive retry. (b) Output quality at convergence. REDEREF primarily improves efficiency (fewer calls/tokens) while maintaining comparable success. Error bars represent 95% confidence intervals.
Figure 4: Collaboration dynamics and specialization. (a) Average quality gain versus number of contributing agents, showing peak composition efficiency at 3–5 agents. (b) Decline in the number of rounds required to select the Electrical Engineering expert across the task sequence, demonstrating specialization via posterior concentration.
Figure 5: Adaptability under agent impairment. Belief score trajectories for the Biology agent under normal versus systematically impaired conditions. The system rapidly detects degradation and down-weights the compromised agent, demonstrating real-time adaptability.
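The control loop in Figure 1 can be simulated end to end. The sketch below is a hypothetical illustration, not REDEREF's implementation. It makes these assumptions:
- Agents are stubs with assumed true competences `theta`.
- The judge is a simulated classifier with false-positive rate `eps_fp` and false-negative rate `eps_fn`.
- Query refinement and evidence-based aggregation are omitted, so a rejected answer simply triggers another Thompson-sampled delegation, up to a hypothetical budget `max_depth`.

```python
import random
from typing import Optional


def noisy_judge(correct: bool, eps_fp: float, eps_fn: float, rng: random.Random) -> bool:
    """Simulated judge: accepts a wrong output w.p. eps_fp, rejects a right one w.p. eps_fn."""
    if correct:
        return rng.random() >= eps_fn
    return rng.random() < eps_fp


def delegate(theta: dict[str, float], beliefs: dict[str, list[float]],
             rng: random.Random, eps_fp: float = 0.1, eps_fn: float = 0.1,
             max_depth: int = 10) -> tuple[Optional[str], int]:
    """Route one query: Thompson-select, execute, judge, update, re-route on rejection.

    theta:   hypothetical true competence of each stub agent.
    beliefs: agent -> [alpha, beta] Beta posterior, shared (and mutated) across queries.
    Returns the accepted agent, or None if the retry budget runs out, plus the call count.
    """
    for calls in range(1, max_depth + 1):
        # Belief-guided delegation: one posterior sample per agent, take the argmax.
        agent = max(beliefs, key=lambda a: rng.betavariate(*beliefs[a]))
        correct = rng.random() < theta[agent]            # simulated agent execution
        accepted = noisy_judge(correct, eps_fp, eps_fn, rng)
        beliefs[agent][0 if accepted else 1] += 1.0      # alpha on accept, beta on reject
        if accepted:
            return agent, calls                          # success path
        # Failure path: REDEREF would refine the query here; this sketch just re-routes.
    return None, max_depth
```

Running many queries through `delegate` with one shared `beliefs` dictionary makes the posterior concentrate on the most competent agent. This is qualitatively similar to the specialization shown in Figure 4(b).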

Theorems & Definitions (4)

Theorem 3.1: Regret under noisy judge feedback
Corollary 3.2
Lemma 1.1: Order preservation
Lemma 1.2: Gap contraction
