
Active Inference for Self-Organizing Multi-LLM Systems: A Bayesian Thermodynamic Approach to Adaptation

Rithvik Prakki

TL;DR

The paper presents an active inference framework that sits above LLM-driven agents and dynamically adapts their prompts and search strategies, addressing the rigidity of static prompts. The environment is modeled with three state factors and seven observation modalities. By minimizing variational free energy and expected free energy, the approach explores prompt-search configurations in a principled way while accounting for thermodynamic costs. Empirical results show that the agent learns structured environment dynamics and develops meaningful mappings from prompts and searches to observations. It also shifts from information-gathering to targeted testing, which reflects a robust exploration-exploitation balance. This thermodynamic, information-theoretic perspective extends active inference to high-dimensional, language-driven environments and suggests a path toward more autonomous, self-improving AI systems.
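
Expected free energy (EFE) is the quantity that drives this exploration. The sketch below is a minimal, pure-Python version of the standard risk-plus-ambiguity decomposition for a single observation modality and a single state factor. It is not the paper's implementation. The names `A`, `log_C`, `gamma` and the toy numbers are hypothetical. The paper's model has seven modalities and three factors, and summing per-modality terms under a mean-field factorization is a common choice that is assumed here, not confirmed by the source. The sketch also omits the parameter-information (novelty) term that is often added when the likelihood is being learned.

```python
import math

def softmax(x):
    # Numerically stable softmax over a list of floats
    m = max(x)
    e = [math.exp(v - m) for v in x]
    total = sum(e)
    return [v / total for v in e]

def predicted_obs(A, qs):
    # q(o | pi) = sum_s p(o | s) q(s | pi), with A[o][s] = p(o | s)
    return [sum(A[o][s] * qs[s] for s in range(len(qs))) for o in range(len(A))]

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def kl(q, p):
    # KL[q || p], skipping zero-probability entries of q
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def expected_free_energy(A, qs, log_C):
    """G(pi) = risk + ambiguity for one modality and one state factor (assumed form)."""
    qo = predicted_obs(A, qs)
    risk = kl(qo, softmax(log_C))  # divergence of predicted outcomes from preferred outcomes
    ambiguity = sum(
        qs[s] * entropy([A[o][s] for o in range(len(A))])  # expected outcome uncertainty per state
        for s in range(len(qs))
    )
    return risk + ambiguity

def policy_posterior(G, gamma=1.0):
    # q(pi) = softmax(-gamma * G): lower EFE means higher selection probability
    return softmax([-gamma * g for g in G])

# Hypothetical toy example: 2 outcomes, 2 states; outcome 0 is preferred.
A = [[0.9, 0.5],
     [0.1, 0.5]]
log_C = [2.0, 0.0]
qs_per_policy = [[1.0, 0.0],   # policy 0 leads to the informative state
                 [0.0, 1.0]]   # policy 1 leads to the ambiguous state
G = [expected_free_energy(A, qs, log_C) for qs in qs_per_policy]
q_pi = policy_posterior(G, gamma=4.0)
print(G, q_pi)
```

The ambiguity term rewards information-seeking. A policy that leads to states with sharp likelihood columns gets a lower G even before preferences are considered, and the softmax turns lower G into a higher selection probability.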

Abstract

This paper introduces a novel approach to creating adaptive language agents by integrating active inference with large language models (LLMs). While LLMs demonstrate remarkable capabilities, their reliance on static prompts limits their ability to adapt to new information and changing environments. We address this by implementing an active inference framework that acts as a cognitive layer above an LLM-based agent, dynamically adjusting prompts and search strategies through principled information-seeking behavior. Our framework models the environment using three state factors (prompt, search, and information states) and seven observation modalities that capture quality metrics. By framing the agent's learning in terms of the free energy principle, we enable systematic exploration of prompt combinations and search strategies. Experimental results demonstrate the effectiveness of this approach: the agent develops accurate models of environment dynamics, as evidenced by the emergent structure in its learned observation matrices. Action-selection patterns reveal a clear exploration-exploitation progression, with the agent transitioning from initial information-gathering to targeted prompt testing. Integrating thermodynamic principles with language model capabilities provides a principled framework for creating robust, adaptable agents and extends active inference beyond traditional low-dimensional control problems to high-dimensional, language-driven environments.
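
The generative model described above (three state factors, seven observation modalities) can be laid out as plain data. The sketch below uses the counts given in the Figure 1 caption: 33 prompt states, 11 search states, 3 information states, and 11 quality levels. It reproduces the uniform initial likelihoods that figure shows. Several details are illustrative assumptions, not details confirmed by the paper: the modality identifiers, the choice that each modality depends on a single state factor, and the use of 3 information-observation levels.

```python
# Hypothetical layout of the generative model, using the sizes in the Figure 1 caption.
STATE_FACTORS = {"prompt": 33, "search": 11, "information": 3}

# (modality name, state factor it depends on, number of observation levels).
# Names are illustrative; 3 information-observation levels is an assumption.
MODALITIES = [
    ("prompt_accuracy", "prompt", 11),
    ("prompt_relevance", "prompt", 11),
    ("prompt_comprehensiveness", "prompt", 11),
    ("search_info_relevance", "search", 11),
    ("search_usefulness", "search", 11),
    ("search_source_quality", "search", 11),
    ("information", "information", 3),
]

def uniform_likelihood(num_obs, num_states):
    """A[o][s] = p(o | s), with every state column uniform (no prior knowledge)."""
    return [[1.0 / num_obs] * num_states for _ in range(num_obs)]

def build_initial_A():
    # One likelihood matrix per modality, conditioned only on its own state factor
    return {
        name: uniform_likelihood(num_obs, STATE_FACTORS[factor])
        for name, factor, num_obs in MODALITIES
    }

A = build_initial_A()
for name, matrix in A.items():
    print(name, len(matrix), "x", len(matrix[0]))
```

Indexing each matrix by only its own factor keeps the tables small (11 x 33 for a prompt modality rather than one entry per joint state). Libraries for discrete active inference often index every modality by all state factors instead.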

Paper Structure

This paper contains 33 sections, 22 equations, 5 figures, and 1 algorithm.

Figures (5)

  • Figure 1: Visualization of the three main components of the observation model tensor. Left: Prompt quality observations $A_{[0,1,2]}$, mapping 33 prompt states to 11 quality levels for accuracy, relevance, and comprehensiveness. Middle: Search quality observations $A_{[3,4,5]}$, mapping 11 search states to 11 quality levels for information relevance, usefulness, and source quality. Right: Information state observations $A_{[6]}$, providing a direct mapping between the 3 hidden information states and their corresponding observations. Darker colors indicate higher probability values in the likelihood mappings. All matrices are uniform here because the agent starts with no knowledge of the state-observation mappings.
  • Figure 2: Final learned observation mappings after environment interaction. The first three matrices in the top row show the relationships between prompt states and prompt quality metrics. The last matrix in the top row and the first two matrices in the bottom row show the relationships between search states and search quality metrics. The last matrix in the bottom row shows the learned mapping between the information state factor and the information observation modality. Together, the matrices show which scores the agent has come to associate with each prompt and search term, based on its observations. The structure of the last matrix reflects a predominance of "detailed_info" observations. The emergence of structure from the initial uniform distributions (Figure 1) demonstrates successful learning of environment dynamics.
  • Figure 3: Progression of Expected Free Energy (EFE) values for different policies across four time points. The evolution shows how the agent learned to distinguish between effective and ineffective action combinations. Lower EFE values (darker colors) indicate more preferred policies. The emergence of clear patterns demonstrates the agent's developing understanding of which actions are most valuable in different contexts.
  • Figure 4: Heatmap showing the frequency of action selection across prompt and search dimensions. Lighter colors indicate more frequently selected actions. The pattern shows the overall distribution of action selections, with certain prompt-search combinations being consistently preferred over others based on their effectiveness.
  • Figure 5: Time series of action selection throughout the experiment. Blue dots represent prompt actions (labeled with prompt IDs), while red dots represent search actions (labeled with search IDs). The progression shows a clear transition from search-dominated early phases to prompt-dominated later phases, demonstrating the agent's evolving strategy from exploration to exploitation.
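
Figures 1 and 2 show the likelihood matrices moving from uniform to structured. Dirichlet learning is one common way this happens in discrete active inference, but the captions do not confirm it, so it is an assumption here. Under this scheme the agent keeps concentration counts for each column of A. After every step it adds the observed outcome to those counts, weighted by its belief about the current state. The sketch uses hypothetical names (`pA`, `dirichlet_update`, `expected_A`, learning rate `lr`) and toy sizes rather than the paper's 33 x 11 matrices.

```python
def dirichlet_update(pA, obs, qs, lr=1.0):
    """Add lr * outer(onehot(obs), q(s)) to the concentration counts pA[o][s] (assumed scheme)."""
    for s, q in enumerate(qs):
        pA[obs][s] += lr * q
    return pA

def expected_A(pA):
    """Posterior-mean likelihood: normalize each state column of the counts."""
    num_obs, num_states = len(pA), len(pA[0])
    col_sums = [sum(pA[o][s] for o in range(num_obs)) for s in range(num_states)]
    return [[pA[o][s] / col_sums[s] for s in range(num_states)] for o in range(num_obs)]

# Toy example: 3 observation levels, 2 states, flat prior counts (a uniform A, as in Figure 1).
pA = [[1.0, 1.0] for _ in range(3)]
for _ in range(5):
    # The agent is confident it is in state 0 and keeps observing level 2.
    dirichlet_update(pA, obs=2, qs=[1.0, 0.0])
A = expected_A(pA)
print(A)
```

Repeatedly observing the same level concentrates that state's column on it, while unvisited states keep their uniform columns. This is one mechanism by which a predominance of a single observation, such as "detailed_info", could produce the structure described in the Figure 2 caption.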