R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models
Viet Dung Nguyen, Zhizhuo Yang, Christopher L. Buckley, Alexander Ororbia
TL;DR
The paper tackles sparse-reward, continuous-action robotic control in pixel-based POMDPs by extending active inference with R-AIF, which combines a recurrent world model (RSSM), a dynamically learned prior (CRSPP), and an actor-critic planner. Actions are chosen to minimize the expected free energy $G_\tau(\pi)$ over imagined futures, which incorporates instrumental reward, epistemic curiosity estimated as information gain from a network ensemble, and a self-revision mechanism that adaptively shapes goals. The key contributions are the CRSPP prior, the robust self-revision signal, the dynamic EFE formulation, and the ensemble-based approach to information gain, all integrated into an off-policy training pipeline. Empirically, R-AIF converges faster and reaches higher, more stable final performance than DreamerV3 and prior AIF baselines on pixel-based Mountain Car, Meta-World, and robosuite tasks, demonstrating improved robustness and data efficiency for high-dimensional, partially observable robotics. These results indicate that adaptive priors and principled information-seeking behavior can substantially enhance active inference in realistic control problems.
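To make the action-selection idea above concrete, the following is a minimal sketch of scoring imagined trajectories by an expected-free-energy-like quantity. It is not the paper's formulation: the function names (`ensemble_disagreement`, `expected_free_energy`, `select_policy`), the `epistemic_weight` parameter, and the use of ensemble variance as a stand-in for information gain are all assumptions made for illustration, and the self-revision mechanism is omitted.

```python
from statistics import fmean, pvariance


def ensemble_disagreement(predictions):
    """Mean per-dimension variance across ensemble members.

    `predictions` holds one predicted state vector per ensemble member.
    Assumption: disagreement is used here as a proxy for information gain.
    """
    per_dimension = zip(*predictions)  # regroup values by state dimension
    return fmean(pvariance(values) for values in per_dimension)


def expected_free_energy(preferences, ensemble_predictions, epistemic_weight=1.0):
    """Hypothetical EFE score for one imagined trajectory (lower is better).

    `preferences[t]` is the instrumental value at imagined step t, and
    `ensemble_predictions[t]` is the list of ensemble outputs at that step.
    """
    g = 0.0
    for preference, predictions in zip(preferences, ensemble_predictions):
        epistemic = ensemble_disagreement(predictions)
        g -= preference + epistemic_weight * epistemic
    return g


def select_policy(candidates):
    """Return (index of the lowest-scoring candidate, all scores)."""
    scores = [expected_free_energy(prefs, preds) for prefs, preds in candidates]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Two imagined two-step trajectories with 2-D predicted states.
familiar = ([0.0, 0.0], [[[1.0, 2.0], [1.0, 2.0]], [[1.0, 2.0], [1.0, 2.0]]])
promising = ([0.0, 1.0], [[[0.0, 0.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]])
best_index, scores = select_policy([familiar, promising])
```

In this toy setup the second trajectory scores lower because it offers both a preferred outcome and a step where the ensemble members disagree, so both the instrumental and the epistemic terms favor it.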
Abstract
Although research has produced promising results demonstrating the utility of active inference (AIF) in Markov decision processes (MDPs), relatively little work builds AIF models for environments and problems that take the form of partially observable Markov decision processes (POMDPs). In POMDP scenarios, the agent must infer the unobserved environmental state from raw sensory observations, e.g., pixels in an image. Even less work examines the most difficult form of POMDP-centered control: continuous-action-space POMDPs under sparse reward signals. In this work, we address issues facing the AIF modeling paradigm by introducing novel prior preference learning techniques and self-revision schedules to help the agent excel in sparse-reward, continuous-action, goal-based robotic control POMDP environments. Empirically, we show that our agents offer improved performance over state-of-the-art models in terms of cumulative rewards, relative stability, and success rate. The code in support of this work can be found at https://github.com/NACLab/robust-active-inference.
