From Sparse to Dense: Toddler-inspired Reward Transition in Goal-Oriented Reinforcement Learning

Junseok Park, Hyeonseo Yang, Min Whoo Lee, Won-Seok Choi, Minsu Lee, Byoung-Tak Zhang

TL;DR

This work introduces a Toddler-Inspired Sparse-to-Dense (S2D) reward transition for goal-oriented RL, using potential-based reward shaping (PBRS) to densify rewards without altering optimal policies. On dynamic robotic arm manipulation and egocentric 3D navigation tasks, S2D improves learning efficiency and generalization, and cross-density visualizations reveal smoother policy-loss landscapes with wider minima. The study reinterprets Tolman’s maze experiments to argue that early free exploration in sparse stages fosters robust initial representations that aid later dense-reward learning. Across the tested environments, S2D consistently outperforms purely sparse, purely dense, and dense-to-sparse curricula, and ablations identify an optimal transition window early in training. These findings offer a principled framework for adaptive reward shaping with potential applications in model-based RL and multi-agent settings.

Abstract

Reinforcement learning (RL) agents often face challenges in balancing exploration and exploitation, particularly in environments where sparse or dense rewards bias learning. Biological systems, such as human toddlers, naturally navigate this balance by transitioning from free exploration with sparse rewards to goal-directed behavior guided by increasingly dense rewards. Inspired by this natural progression, we investigate the Toddler-Inspired Reward Transition in goal-oriented RL tasks. Our study focuses on transitioning from sparse to potential-based dense (S2D) rewards while preserving optimal strategies. Through experiments on dynamic robotic arm manipulation and egocentric 3D navigation tasks, we demonstrate that effective S2D reward transitions significantly enhance learning performance and sample efficiency. Additionally, using a Cross-Density Visualizer, we show that S2D transitions smooth the policy loss landscape, resulting in wider minima that improve generalization in RL models. Finally, we reinterpret Tolman's maze experiments, underscoring the critical role of early free exploratory learning in the context of S2D rewards.
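
To make the idea of a sparse-to-potential-based-dense transition concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical potential Φ(s) equal to the negative Euclidean distance to the goal, a hypothetical hard switch at `transition_step`, and the standard potential-based shaping term F(s, s') = γΦ(s') − Φ(s). All function and parameter names are illustrative.

```python
import math

def sparse_reward(reached_goal):
    """Sparse signal: reward only when the goal is reached."""
    return 1.0 if reached_goal else 0.0

def potential(state, goal):
    """Hypothetical potential: negative Euclidean distance to the goal."""
    return -math.dist(state, goal)

def shaping_term(state, next_state, goal, gamma=0.99):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * potential(next_state, goal) - potential(state, goal)

def s2d_reward(step, transition_step, state, next_state, goal,
               reached_goal, gamma=0.99):
    """Sparse reward before transition_step; sparse plus shaping afterwards.

    The hard switch at transition_step is an assumption of this sketch.
    """
    reward = sparse_reward(reached_goal)
    if step >= transition_step:
        reward += shaping_term(state, next_state, goal, gamma)
    return reward

def discounted_shaping_sum(trajectory, goal, gamma=0.99):
    """Discounted sum of shaping terms along a list of states."""
    total = 0.0
    for t in range(len(trajectory) - 1):
        total += gamma ** t * shaping_term(
            trajectory[t], trajectory[t + 1], goal, gamma)
    return total
```

Along any trajectory the discounted shaping terms telescope to γ^T Φ(s_T) − Φ(s_0), which depends only on the endpoints. This is the usual argument for why potential-based shaping leaves the ranking of policies, and hence the optimal policy, unchanged.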

Paper Structure

This paper contains 90 sections, 11 equations, 25 figures, 5 tables, 2 algorithms.

Figures (25)

  • Figure 1: Analogy of agents’ trajectories to toddlers’ learning. (a) A toddler’s learning trajectory: free exploration of the environment reflects learning with sparse rewards. (b) Goal-directed behavior emerges as the toddler focuses on specific objectives, representing dense rewards. Similarly, the arrow above illustrates the agent’s transition from sparse to potential-based dense rewards, drawing a parallel between the learning processes of toddlers and agents.
  • Figure 2: Summary of the baseline rewards.
  • Figure 3: Experimental environments. (a) ViZDoom environments. (b) Minecraft environments. (c) Additional environments: modified UR5-Reacher, Cartpole-Reacher with randomly spawned goals, and LunarLander; detailed descriptions are provided in Appendix A.
  • Figure 4: The agent’s performance across different reward baselines in several goal-oriented tasks. (1-3) In LunarLander, the total reward gained from intrinsic incentives was well below zero, as indicated by the dashed line. For UR5, both intrinsic motivation and sparse reward settings resulted in near-zero performance, making it difficult to observe. (4), (5) The ViZDoom agent’s ability to generalize across different reward types.
  • Figure 5: Analysis of policy loss landscape after reward transition. The 3D visualization depicts the policy loss landscape following a reward transition, starting with either a sparse or dense reward.
  • ...and 20 more figures

Theorems & Definitions (2)

  • Definition 1: Curriculum
  • Definition 2: Toddler-inspired S2D-curriculum