Causally Aligned Curriculum Learning

Mingxuan Li, Junzhe Zhang, Elias Bareinboim

TL;DR

This work tackles misalignment in curriculum reinforcement learning caused by unobserved confounders by casting curriculum design in a structural causal modeling framework. It derives a graphical condition for causally aligned source tasks and introduces efficient procedures (FindMaxEdit and FindCausalCurriculum) to generate curricula that preserve invariant optimal decision rules and avoid harmful transfers. The approach is validated on pixel-based, confounded tasks (Colored Sokoban, Button Maze, Continuous Button Maze), showing that causally augmented curricula converge reliably and outperform non-causal counterparts. The contributions enable robust transfer from simplified source tasks to complex targets in settings where hidden confounders otherwise undermine learning efficiency and policy quality.

Abstract

A pervasive challenge in Reinforcement Learning (RL) is the "curse of dimensionality," i.e., the exponential growth of the state-action space when optimizing a high-dimensional target task. Curriculum learning addresses this by training the agent on a curriculum: a sequence of related, more manageable source tasks. The expectation is that when some optimal decision rules are shared between the source tasks and the target task, the agent can pick up the skills needed to behave optimally in the environment more quickly, thus accelerating learning. However, this critical assumption of invariant optimal decision rules does not necessarily hold in many practical applications, specifically when the underlying environment contains unobserved confounders. This paper studies the problem of curriculum RL through a causal lens. We derive a sufficient graphical condition characterizing causally aligned source tasks, i.e., source tasks for which the invariance of optimal decision rules holds. We further develop an efficient algorithm that generates a causally aligned curriculum, given qualitative causal knowledge of the target task. Finally, we validate the proposed methodology through experiments on discrete and continuous confounded tasks with pixel observations.
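To make the invariance failure concrete, the following is a minimal toy sketch, not the paper's environment or algorithm. It assumes a hypothetical one-step task, loosely inspired by the box-color setting, in which an unobserved binary confounder `u` determines both the observed box color and the reward for pushing. The names `reward`, `color_distribution`, and `optimal_rule`, as well as the reward values, are illustrative assumptions. The sketch computes the optimal decision rule exactly in the target task and in a source task that intervenes on the color.

```python
# Toy illustration (hypothetical, not from the paper): a source task that
# intervenes on a confounded state can change the optimal decision rule.

ACTIONS = ("push", "avoid")


def reward(u, action):
    """Reward depends on the unobserved confounder u, not on the color itself."""
    if action == "avoid":
        return 0.0
    return 1.0 if u == 1 else -2.0


def color_distribution(u, intervened):
    """P(color | u). In the target, color copies u; under intervention it is random."""
    if intervened:
        return {0: 0.5, 1: 0.5}
    return {u: 1.0}


def optimal_rule(intervened):
    """Return the reward-maximizing action for each observed color."""
    rule = {}
    for color in (0, 1):
        # Joint weights P(u, color), with u uniform over {0, 1}.
        weights = {
            u: 0.5 * color_distribution(u, intervened).get(color, 0.0)
            for u in (0, 1)
        }
        total = sum(weights.values())
        # Expected reward of each action given the observed color.
        values = {
            a: sum(w * reward(u, a) for u, w in weights.items()) / total
            for a in ACTIONS
        }
        rule[color] = max(ACTIONS, key=values.get)
    return rule


target_rule = optimal_rule(intervened=False)
source_rule = optimal_rule(intervened=True)
print("target:", target_rule)
print("source:", source_rule)
```

In this toy setup the target rule pushes when the color is 1, because color reveals the confounder, while the intervened source task breaks that link and makes avoiding optimal for both colors. A policy trained in such a source task would therefore transfer the wrong decision rule to the target.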

Paper Structure

This paper contains 21 sections, 7 theorems, 23 equations, 17 figures, 2 tables, 6 algorithms.

Key Result

Theorem 1

For a target task $\mathcal{T} = \langle \mathcal{M}, \Pi, \mathcal{R}\rangle$, let $\mathcal{T}^{(j)} = \langle \mathcal{M}^{(j)}, \Pi, \mathcal{R}, \Delta^{(j)}\rangle$ be a source task of $\mathcal{T}$ obtained by modifying states $\Delta^{(j)} \subseteq \boldsymbol{V}$. If $\Delta^{(j)}$ is editable w.r.t. ... where $\pi^*, \pi^{(j)} \in \Pi$ are optimal policies in the target $\mathcal{T}$ and source $\mathcal{T}^{(j)}$ ...

Figures (17)

  • Figure 1: Examples of (a) a full episode of a misaligned source task that intervenes on the box color, (b) a full episode of an aligned source task that only changes the initial box location, and (c) an aligned curriculum in which none of the source tasks intervenes on the box's color.
  • Figure 2: The average performance of curriculum generators.
  • Figure 3: Causal diagrams for (a) the target task $\mathcal{T}$ and (b) the domain discrepancies between the target task $\mathcal{T}$ and the source tasks $\mathcal{T}^{(1)}$ and $\mathcal{T}^{(2)}$. Diagram (b) is diagram (a) augmented with edit indicators.
  • Figure 4: Policy overwriting described in Example 2.
  • Figure 5: Target task performance of the agents at different training stages in Colored Sokoban (Row 1) and Button Maze (Row 2) using different curriculum generators (Columns). The horizontal green line shows the performance of the agent trained directly in the target. "original" refers to the unaugmented curriculum generator and "causal" refers to its causally augmented version.
  • ...and 12 more figures

Theorems & Definitions (24)

  • Definition 1: Policy Space
  • Definition 2: Target Task
  • Definition 3: Source Task
  • Example 1: Misaligned Source Task
  • Definition 4: Editable States
  • Theorem 1: Causally Aligned Source Task
  • Theorem 2
  • Definition 5: Curriculum
  • Definition 6: Causally Aligned Curriculum
  • Example 2: Overwriting in Curriculum Learning
  • ...and 14 more