TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design

Geonwoo Cho; Jaegyun Im; Jihwan Lee; Hojun Yi; Sejin Kim; Sundong Kim

TL;DR

TRACED introduces a transition-aware regret approximation that adds a transition-prediction loss (ATPL) to conventional regret proxies, and pairs it with a lightweight Co-Learnability metric that quantifies cross-task transfer. Together, the two yield a unified Task Priority for environment design within the Unsupervised Environment Design (UED) framework, guiding both task generation and replay. Empirically, TRACED accelerates curriculum ramp-up and improves zero-shot generalization on MiniGrid and BipedalWalker, and ablations confirm the complementary roles of ATPL and Co-Learnability. The resulting curricula are more sample-efficient and scale to large, complex environments, offering a practical path toward robust RL generalization.

Abstract

Generalizing deep reinforcement learning agents to unseen environments remains a significant challenge. One promising solution is Unsupervised Environment Design (UED), a co-evolutionary framework in which a teacher adaptively generates tasks with high learning potential, while a student learns a robust policy from this evolving curriculum. Existing UED methods typically measure learning potential via regret, the gap between optimal and current performance, approximated solely by value-function loss. Building on these approaches, we introduce the transition-prediction error as an additional term in our regret approximation. To capture how training on one task affects performance on others, we further propose a lightweight metric called Co-Learnability. By combining these two measures, we present Transition-aware Regret Approximation with Co-learnability for Environment Design (TRACED). Empirical evaluations show that TRACED produces curricula that improve zero-shot generalization over strong baselines across multiple benchmarks. Ablation studies confirm that the transition-prediction error drives rapid complexity ramp-up and that Co-Learnability delivers additional gains when paired with the transition-prediction error. These results demonstrate how refined regret approximation and explicit modeling of task relationships can be leveraged for sample-efficient curriculum design in UED. Project Page: https://geonwoo.me/traced/
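The way the two signals might combine into a single score can be illustrated with a minimal Python sketch. It is not the authors' implementation: the additive form, the weights `alpha` and `beta`, the averaging used for Co-Learnability, and every function and field name below are assumptions made purely for illustration, since the abstract does not give the exact formulas.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TaskStats:
    """Per-task losses collected from a rollout (hypothetical container)."""
    value_loss: float       # value-function loss, the usual regret proxy
    transition_loss: float  # transition-prediction error on the same task


def regret_proxy(stats: TaskStats, alpha: float = 1.0) -> float:
    """Assumed form: value loss plus a weighted transition-prediction error."""
    return stats.value_loss + alpha * stats.transition_loss


def co_learnability(before: Dict[str, float],
                    after: Dict[str, float],
                    trained_task: str) -> float:
    """Assumed form: mean drop in regret proxy on the *other* tasks
    after one training step on `trained_task`."""
    others = [t for t in before if t != trained_task and t in after]
    if not others:
        return 0.0
    return sum(before[t] - after[t] for t in others) / len(others)


def task_priority(regret: float, co_learn: float, beta: float = 1.0) -> float:
    """Assumed form: regret proxy plus weighted Co-Learnability."""
    return regret + beta * co_learn


if __name__ == "__main__":
    stats = TaskStats(value_loss=0.5, transition_loss=0.25)
    regret = regret_proxy(stats, alpha=2.0)
    # Regret proxies on three tasks before and after training on task "a".
    before = {"a": 1.0, "b": 2.0, "c": 3.0}
    after = {"a": 0.5, "b": 1.5, "c": 2.0}
    transfer = co_learnability(before, after, trained_task="a")
    print(task_priority(regret, transfer, beta=2.0))
```

Under these assumptions, a task scores highly either because the agent still models it poorly (high regret proxy) or because training on it lowers the regret proxy on other tasks in the buffer, which is the intuition the abstract describes.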
