Learning To Explore With Predictive World Model Via Self-Supervised Learning
Alana Santana, Paula P. Costa, Esther L. Colombini
TL;DR
In environments with scarce extrinsic rewards, the paper tackles exploration by introducing an intrinsically motivated agent built from a predictive world model and a policy network. The world model is modular, hierarchical, and BRIM-based, using attention to allocate computation and generate intrinsic rewards via prediction error, $r^{int}_{t} = \frac{\left \| h_{t}^{p} - h_{t-1}^{f} \right \|^{2}_{2}}{n}$. Trained with PPO on 18 Atari games, the approach achieves superior performance in many cases, demonstrating robust reactive and deliberative behaviors and faster accrual of extrinsic rewards, while highlighting some limitations in highly sparse tasks. These results suggest that integrating sparsity, modularity, independence, hierarchy, and attention into predictive world models can yield scalable intrinsic motivation for complex environments and potential extensions to robotics and more realistic settings.
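The intrinsic reward formula above can be illustrated with a minimal sketch. It assumes that $h_{t}^{p}$ and $h_{t-1}^{f}$ are real-valued hidden-state vectors of equal length and that $n$ is their dimensionality; the summary does not define these symbols, so this reading, along with the function and argument names, is hypothetical and not the authors' implementation.

```python
from typing import Sequence


def intrinsic_reward(h_p_t: Sequence[float], h_f_prev: Sequence[float]) -> float:
    """Squared L2 distance between two hidden states, divided by n.

    Assumption: h_p_t stands for h^p_t and h_f_prev for h^f_{t-1},
    both flat vectors of length n (names are illustrative only).
    """
    if len(h_p_t) != len(h_f_prev):
        raise ValueError("hidden states must have the same length")
    n = len(h_p_t)
    if n == 0:
        raise ValueError("hidden states must be non-empty")
    # ||h^p_t - h^f_{t-1}||_2^2 is the sum of squared component differences.
    squared_norm = sum((p - f) ** 2 for p, f in zip(h_p_t, h_f_prev))
    return squared_norm / n


# Identical states give zero reward; larger mismatch gives larger reward.
print(intrinsic_reward([1.0, 2.0], [1.0, 2.0]))
print(intrinsic_reward([1.0, 2.0], [1.0, 0.0]))
```

Under this reading, the reward is the mean squared difference between the two states, so it grows when the world model's prediction is poor, which is what pushes the agent toward transitions it cannot yet predict.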
Abstract
Autonomous artificial agents must be able to learn behaviors in complex environments without humans to design tasks and rewards. Designing these functions for each environment is not feasible, which motivates the development of intrinsic reward functions. In this paper, we propose using several cognitive elements that have long been neglected to build an internal world model for an intrinsically motivated agent. Our agent interacts satisfactorily with the environment, learning complex behaviors without needing previously designed reward functions. We used 18 Atari games to evaluate which cognitive skills emerge in games that require reactive and deliberative behaviors. Our results show superior performance compared to the state of the art in many test cases with dense and sparse rewards.
