The Work Capacity of Channels with Memory: Maximum Extractable Work in Percept-Action Loops
Lukas J. Fiderer, Paul C. Barth, Isaac D. Smith, Hans J. Briegel
TL;DR
This work develops a thermodynamic-information framework for percept-action loops, modeling both agent and environment as finite-memory hidden Markov channels and introducing work capacity $C^{\mathrm{work}}$ as the maximal average extractable work rate. The authors derive a general expression for extractable work $W$ and assess the work capacity across channel classes, showing that in loops with feedback the conventional design principles—maximizing predictive power and forgetting past actions—can fail to be optimal. A key result is that, for unifilar product environments, efficient work extraction requires a balance between action entropy and percept predictability, while in more general settings maximal predictability may come at a thermodynamic cost. These findings reveal a fundamental tension between prediction and energy efficiency in active learning and adaptive systems, suggesting new energetic design principles beyond passive observation and opening avenues for future quantum extensions and goal-directed agents.
Abstract
Predicting future observations plays a central role in machine learning, biology, economics, and many other fields. It lies at the heart of organizational principles such as the variational free energy principle and has even been shown -- based on the second law of thermodynamics -- to be necessary for reaching the fundamental energetic limits of sequential information processing. While the usefulness of the predictive paradigm is undisputed, complex adaptive systems that interact with their environment are more than just predictive machines: they have the power to act upon their environment and cause change. In this work, we develop a framework to analyze the thermodynamics of information processing in percept-action loops -- a model of agent-environment interaction -- allowing us to investigate the thermodynamic implications of actions and percepts on equal footing. To this end, we introduce the concept of work capacity -- the maximum rate at which an agent can expect to extract work from its environment. Our results reveal that neither of two previously established design principles for work-efficient agents -- maximizing predictive power and forgetting past actions -- remains optimal in environments where actions have observable consequences. Instead, a trade-off emerges: work-efficient agents must balance prediction and forgetting, as remembering past actions can reduce the available free energy. This highlights a fundamental departure from the thermodynamics of passive observation, suggesting that prediction and energy efficiency may be at odds in active learning systems.
