Creating Hierarchical Dispositions of Needs in an Agent
Tofara Moyo
TL;DR
The paper addresses reward shaping in reinforcement learning by introducing hierarchical reward functions: a secondary reward-critic outputs multiple scalars, one per level of abstraction. An ordering equation $r = R r_{1} + R$ links these signals to the base reward $R$, and a three-level version $r=R(r_{1}(r_{2}r_{3}+r_{2})+r_{1})+R$ enforces a hierarchy of needs. Empirical evaluation on Pendulum-v1 shows faster convergence, improved stability, and higher final rewards than a PPO baseline and, after code adaptations, results that surpass prior state-of-the-art methods. The work highlights the potential of hierarchical rewards for scalable goal formation and transfer, and outlines graph-based extensions for richer reward dynamics.
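As a minimal sketch of how the ordering equations above compose, the nesting can be evaluated from the deepest level outward. The function name `hierarchical_reward` and the list argument `levels` are hypothetical names introduced here for illustration; only the two formulas themselves come from the text.

```python
from typing import Sequence


def hierarchical_reward(R: float, levels: Sequence[float]) -> float:
    """Combine the base reward R with critic outputs [r1, r2, ...].

    Illustrative helper (not from the paper's code). It reproduces:
      one level:    r = R*r1 + R
      three levels: r = R*(r1*(r2*r3 + r2) + r1) + R
    """
    if not levels:
        # Assumption: with no critic outputs, fall back to the base reward.
        return R
    # Start at the deepest level and fold outward: acc <- r_k * acc + r_k.
    acc = levels[-1]
    for r_k in reversed(levels[:-1]):
        acc = r_k * acc + r_k
    # The outermost step ties the nested term to the base reward.
    return R * acc + R


two_level = hierarchical_reward(2.0, [0.5])
three_level = hierarchical_reward(2.0, [0.5, 0.5, 0.5])
```

Because each step has the form $x y + x = x(1 + y)$, the three-level expression can equivalently be read as $r = R(1 + r_{1}(1 + r_{2}(1 + r_{3})))$, which makes the nesting explicit: each deeper signal enters the reward only scaled by the levels above it.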
Abstract
We present a novel method for learning hierarchical abstractions that prioritize competing objectives, leading to improved global expected rewards. Our approach employs a secondary rewarding agent with multiple scalar outputs, each associated with a distinct level of abstraction. The traditional agent then learns to maximize these outputs hierarchically, with each level conditioned on the maximization of the preceding level. We derive an equation that orders these scalar values and the global reward by priority, inducing a hierarchy of needs that informs goal formation. Experimental results on the Pendulum-v1 environment demonstrate superior performance compared to a baseline implementation and achieve state-of-the-art results.
