Boosting Reinforcement Learning in 3D Visuospatial Tasks Through Human-Informed Curriculum Design

Markus D. Solbach, John K. Tsotsos

TL;DR

The paper evaluates modern reinforcement learning approaches on a challenging 3D Same-Different visuospatial task, replicated from human psychophysics in a Unity environment. It finds that PPO and imitation-learning methods struggle to scale across large discrete action spaces and sparse rewards, while curriculum learning, especially when informed by human performance data, can drive robust learning up to 48 viewpoints. The resulting agents achieve high accuracy but adopt strategies that differ from those of human observers, favoring a limited set of information-rich viewpoints rather than distributed exploration. The study underscores the importance of structured learning and cognitive priors (e.g., attention and memory) for scaling RL to active-perception tasks with high dimensionality and sparse rewards, suggesting avenues for cognitively inspired RL architectures.

Abstract

Reinforcement Learning (RL) is a mature technology, often suggested as a potential route towards Artificial General Intelligence, with the ambitious goal of replicating the wide range of abilities found in natural and artificial intelligence, including the complexities of human cognition. While RL has shown successes in relatively constrained environments, such as the classic Atari games and specific continuous control problems, recent years have seen efforts to expand its applicability. This work investigates the potential of RL to demonstrate intelligent behaviour and its progress in addressing more complex and less structured problem domains. We present an investigation into the capacity of modern RL frameworks to address a seemingly straightforward 3D Same-Different visuospatial task. While initial applications of state-of-the-art methods, including PPO, behavioural cloning and imitation learning, revealed challenges in directly learning optimal strategies, the successful implementation of curriculum learning offers a promising avenue. Effective learning was achieved by strategically designing the lesson plan based on the findings of a real-world human experiment.
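
As a rough illustration of the lesson-plan idea in the abstract, the following sketch shows one common way a curriculum can be scheduled: the agent stays on a lesson until its recent accuracy clears a threshold, then moves to a harder one. It is not the authors' implementation. The `Lesson` and `Curriculum` names, the window size, the accuracy thresholds, and the intermediate viewpoint counts are all hypothetical placeholders; only the final value of 48 viewpoints comes from the summary above.

```python
from dataclasses import dataclass


@dataclass
class Lesson:
    viewpoints: int            # number of discrete viewpoints available in this lesson
    accuracy_threshold: float  # hypothetical accuracy required to advance


class Curriculum:
    """Advance to the next lesson once mean accuracy over a window clears the bar."""

    def __init__(self, lessons, window=100):
        self.lessons = lessons
        self.window = window   # number of recent episodes used for the check
        self.index = 0         # position of the current lesson
        self.results = []      # episode outcomes recorded for the current lesson

    @property
    def current(self):
        return self.lessons[self.index]

    def record(self, correct):
        """Record one episode outcome; return True if the lesson advanced."""
        self.results.append(1.0 if correct else 0.0)
        recent = self.results[-self.window:]
        if len(recent) < self.window:
            return False       # not enough evidence yet
        mean_accuracy = sum(recent) / len(recent)
        is_last = self.index == len(self.lessons) - 1
        if mean_accuracy >= self.current.accuracy_threshold and not is_last:
            self.index += 1
            self.results = []  # start the next lesson with a clean history
            return True
        return False


# Placeholder lesson plan: only the final 48-viewpoint stage is taken from the text.
lesson_plan = [Lesson(4, 0.8), Lesson(12, 0.8), Lesson(48, 0.8)]
```

In a training loop, `record` would be called once per episode with the Same-Different outcome, and `current.viewpoints` would configure the environment for the next episode. A human-informed plan would set the lesson order and thresholds from the human experiment rather than from the fixed values assumed here.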

Paper Structure

This paper contains 17 sections, 11 figures, 5 tables.

Figures (11)

  • Figure 1: The stimulus used by Shepard and Metzler (1971) to assess the human cognitive ability of mental rotation of three-dimensional objects. Note that the stimulus is a 2D image depicting a 3D object.
  • Figure 2: Stimuli used in this work. a) TEOS objects (from Solbach et al., 2021) categorized by three complexity levels based on block count. b) Illustration of the three RO values. c) Common coordinate system for all objects to define RO.
  • Figure 3: A screenshot of the Unity reinforcement learning environment where the agent (magenta sphere) interacts with two central objects. Insets show the agent's view (bottom left) and action history (bottom right). The green start square and white environment boundary are for illustration only and not visible to the agent.
  • Figure 4: Illustration of the six top-down views (a-f) with different levels of action space discretization. The agent (magenta) can occupy any light green location and from there, can view either of the two white objects.
  • Figure 5: PPO agent training results: Average accuracy (blue) and viewpoints (green) across all environments.
  • ...and 6 more figures