Continuously evolving rewards in an open-ended environment
Richard M. Bailey
TL;DR
The paper tackles how agents in open-ended environments can adapt their goals when rewards are not externally fixed. It introduces RULE (Reward Updating through Learning and Expectation), a mechanism for endogenously updating reward coefficients across generations during continuous RL, tested in a simplified ecosystem with Ents and primary producers. Results show populations can abandon detrimental learned behaviours, adjust to novel items like vitamins, and sustain survival under shifting conditions; the approach also reveals interactions with evolution and highlights limitations such as potential reward-hacking risks and the challenge of dormant or expanding reward components. Overall, RULE demonstrates a plausible pathway for endowing agents with dynamic, environment-responsive objectives, with implications for ecological, economic, and multi-agent systems.
Abstract
Unambiguous identification of the rewards driving behaviours of entities operating in complex open-ended real-world environments is difficult, partly because goals and associated behaviours emerge endogenously and are dynamically updated as environments change. Reproducing such dynamics in models would be useful in many domains, particularly where fixed reward functions limit the adaptive capabilities of agents. The simulation experiments described here assess a candidate algorithm for the dynamic updating of rewards, RULE: Reward Updating through Learning and Expectation. The approach is tested in a simplified ecosystem-like setting where experiments challenge entities' survival, calling for significant behavioural change. The population of entities successfully demonstrates the abandonment of an initially rewarded but ultimately detrimental behaviour, amplification of beneficial behaviour, and appropriate responses to novel items added to their environment. These adjustments happen through endogenous modification of the entities' underlying reward function, during continuous learning, without external intervention.
