Improving Environment Novelty Quantification for Effective Unsupervised Environment Design

Jayden Teoh, Wenjun Li, Pradeep Varakantham

TL;DR

This work tackles generalization gaps in reinforcement learning caused by underspecified environment design by introducing CENIE, a domain-agnostic framework that quantifies environment novelty through the agent’s state-action coverage. The coverage accumulated over past curriculum experience is modeled with Gaussian Mixture Models, and a candidate level’s novelty score is the negative log-likelihood of its state-action samples under that model. CENIE is integrated with regret-based UED through ACCEL-CENIE and PLR-CENIE. Empirical results across Minigrid, BipedalWalker, and CarRacing show that incorporating novelty yields state-of-the-art zero-shot generalization and broader state-action coverage, often surpassing regret-only baselines and even achieving competitive performance with novelty-alone variants. These findings highlight the value of curriculum-aware novelty signals for robust generalization in complex, unseen environments, while maintaining scalability and domain independence. Future work may extend CENIE to direct environment generation, adaptive weighting between novelty and regret, and more expressive density models or latent representations.

Abstract

Unsupervised Environment Design (UED) formalizes the problem of autocurricula through interactive training between a teacher agent and a student agent. The teacher generates new training environments with high learning potential, curating an adaptive curriculum that strengthens the student's ability to handle unseen scenarios. Existing UED methods mainly rely on regret, a metric that measures the difference between the agent's optimal and actual performance, to guide curriculum design. Regret-driven methods generate curricula that progressively increase environment complexity for the student but overlook environment novelty -- a critical element for enhancing an agent's generalizability. Measuring environment novelty is especially challenging due to the underspecified nature of environment parameters in UED, and existing approaches face significant limitations. To address this, this paper introduces the Coverage-based Evaluation of Novelty In Environment (CENIE) framework. CENIE proposes a scalable, domain-agnostic, and curriculum-aware approach to quantifying environment novelty by leveraging the student's state-action space coverage from previous curriculum experiences. We then propose an implementation of CENIE that models this coverage and measures environment novelty using Gaussian Mixture Models. By integrating both regret and novelty as complementary objectives for curriculum design, CENIE facilitates effective exploration across the state-action space while progressively increasing curriculum complexity. Empirical evaluations demonstrate that augmenting existing regret-based UED algorithms with CENIE achieves state-of-the-art performance across multiple benchmarks, underscoring the effectiveness of novelty-driven autocurricula for robust generalization.
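The coverage-based novelty score described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes each state-action pair is a flat feature vector, fits a diagonal-covariance Gaussian mixture with plain EM, and scores a level by the mean negative log-likelihood of its samples. The helper names (`fit_gmm`, `novelty`, `rank_weights`, `replay_weights`), the rank-based weighting, and the mixing coefficient `alpha` are all hypothetical choices made for illustration.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def component_logs(x, gmm):
    """Per-component log(weight * density) at x."""
    weights, means, variances = gmm
    return [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm(data, k, iters=30, min_var=1e-3, seed=0):
    """Fit a diagonal GMM with plain EM (hypothetical helper, illustration only)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    means = [list(p) for p in rng.sample(data, k)]
    gmm = ([1.0 / k] * k, means, [[1.0] * dim for _ in range(k)])
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            logs = component_logs(x, gmm)
            norm = log_sum_exp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weights, means, and variances.
        weights, means, variances = [], [], []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mean = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                    for d in range(dim)]
            var = [max(min_var,
                       sum(r[j] * (x[d] - mean[d]) ** 2
                           for r, x in zip(resp, data)) / nj)
                   for d in range(dim)]
            weights.append(nj / n)
            means.append(mean)
            variances.append(var)
        gmm = (weights, means, variances)
    return gmm


def novelty(level_samples, gmm):
    """Mean negative log-likelihood of a level's samples under the coverage model."""
    total = sum(log_sum_exp(component_logs(x, gmm)) for x in level_samples)
    return -total / len(level_samples)


def rank_weights(scores):
    """Normalized weights proportional to 1/rank (highest score gets rank 1)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    raw = [0.0] * len(scores)
    for rank, i in enumerate(order, start=1):
        raw[i] = 1.0 / rank
    total = sum(raw)
    return [r / total for r in raw]


def replay_weights(novelty_scores, regret_scores, alpha=0.5):
    """Assumed convex blend of novelty-based and regret-based level weights."""
    p_nov, p_reg = rank_weights(novelty_scores), rank_weights(regret_scores)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_nov, p_reg)]


# Synthetic demo: past experience covers two regions of a 2-D feature space.
rng = random.Random(1)


def cloud(cx, cy, n):
    return [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)] for _ in range(n)]


coverage_buffer = cloud(0.0, 0.0, 200) + cloud(5.0, 5.0, 200)
coverage_model = fit_gmm(coverage_buffer, k=2)
familiar_score = novelty(cloud(0.0, 0.0, 50), coverage_model)
unseen_score = novelty(cloud(10.0, -10.0, 50), coverage_model)
blended = replay_weights([3.0, 1.0, 2.0], [1.0, 3.0, 2.0])
```

In this toy setup, a level whose samples fall far from previously covered regions receives a higher novelty score than one that revisits them. A practical version would more likely rely on an established mixture-model library and on features taken from the student's real trajectories.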

Paper Structure

This paper contains 29 sections, 11 equations, 17 figures, 7 tables, 4 algorithms.

Figures (17)

  • Figure 1: An overview of the CENIE framework. The teacher uses environment regret and novelty to curate the student's curriculum. $\Gamma$ contains past experiences and $\tau$ is the most recent trajectory.
  • Figure 2: Zero-shot transfer performance in eight human-designed test environments. The plots are based on the median and interquartile range of solved rates across 5 independent runs.
  • Figure 3: (a) Aggregate zero-shot transfer performance in Minigrid domain across 5 independent runs. (b) Zero-shot test performance of PLR$^\perp$, PLR-CENIE, ACCEL, and ACCEL-CENIE on PerfectMazeLarge across 5 independent runs.
  • Figure 4: Student's generalization performance on 6 BipedalWalker testing environments during training. Each curve is measured across 5 independent runs (mean and standard error).
  • Figure 5: Difficulty composition of levels replayed by ACCEL and ACCEL-CENIE during training.
  • ...and 12 more figures