Dreaming in Code for Curriculum Learning in Open-Ended Worlds

Konstantinos Mitsides; Maxence Faldor; Antoine Cully

TL;DR

The paper addresses how to sustain learning progress in open-ended worlds by letting foundation models synthesize executable environment code that forms a curriculum. It introduces Dreaming in Code (DiCode), a closed-loop Unsupervised Environment Design (UED) framework that conditions environment generation on the agent's current competence and archives parent-offspring relationships between levels in an evolving programmatic space. On Craftax, DiCode yields a $\sim16\%$ improvement in mean return over the strongest baseline and achieves non-zero success on late-game tasks where prior methods fail; qualitative analysis shows teacher-like curriculum shaping and indicates that closing the loop is necessary. The results indicate that code-level environment design can scaffold long-horizon skill acquisition in complex domains while maintaining physics-consistent worlds. Limitations include reliance on a fixed engine and LLM latency, pointing to future work on broader engines and faster generation for scalable open-ended learning.

Abstract

Open-ended learning frames intelligence as emerging from continual interaction with an ever-expanding space of environments. While recent advances have utilized foundation models to programmatically generate diverse environments, these approaches often focus on discovering isolated behaviors rather than orchestrating sustained progression. In complex open-ended worlds, the large combinatorial space of possible challenges makes it difficult for agents to discover sequences of experiences that remain consistently learnable. To address this, we propose Dreaming in Code (DiCode), a framework in which foundation models synthesize executable environment code to scaffold learning toward increasing competence. In DiCode, "dreaming" takes the form of materializing code-level variations of the world. We instantiate DiCode in Craftax, a challenging open-ended benchmark characterized by rich mechanics and long-horizon progression. Empirically, DiCode enables agents to acquire long-horizon skills, achieving a $16\%$ improvement in mean return over the strongest baseline and non-zero success on late-game combat tasks where prior methods fail. Our results suggest that code-level environment design provides a practical mechanism for curriculum control, enabling the construction of intermediate environments that bridge competence gaps in open-ended worlds. Project page and source code are available at https://konstantinosmitsides.github.io/dreaming-in-code and https://github.com/konstantinosmitsides/dreaming-in-code.

Paper Structure (50 sections, 9 equations, 7 figures, 7 tables)

Introduction
Background
Problem Setting
Unsupervised Environment Design
Prioritized Level Replay
Dreaming in Code
Environment Search Space
Generation Cycle
Archive
Selection
Description & Code
Compilation Check
Training
Asynchronous Generation
Experiments
...and 35 more sections

Figures (7)

Figure 1: Overview of the Dreaming in Code framework. The pipeline consists of two interleaved processes: Training (top) and the Generation Cycle (bottom). In the generation cycle, a parent level is selected from the Archive based on learnability. Conditioned on the parent level and the agent's current competence, the foundation model synthesizes a new level description and then the corresponding executable Python code. Levels that pass a compilation check are added to the Training Batch, which mixes the target environment, newly generated levels, and archived levels sampled via PLR. Agent performance and new levels update the archive, closing the curriculum loop.
Figure 2: Performance on Craftax. Mean episode return on the held-out test set (1024 unseen procedurally generated worlds) throughout training. Shaded regions indicate the standard error across 5 seeds.
Figure 3: Achievement Breakdown. Final success rates on selected achievements, ordered by hierarchical depth (left to right). DiCode consistently outperforms all baselines across all evaluated achievements. The performance gap is particularly significant in two key areas: 1) on instrumental milestones (e.g. Iron sword, Iron armour) which are prerequisites for sustaining long-term progress, and 2) on late-stage objectives (e.g. Gnomish archer, Gnome warrior) where baseline performance effectively collapses to zero, rendering them intractable for prior methods. Error bars denote standard error across 5 seeds.
Figure 4: Visualization of the DiCode Curriculum (Iterations 15–100). (Top) A snapshot of the archive as a directed graph, where nodes represent generated levels. Node color indicates the target skill category (see legend), and node size is proportional to the agent’s current success rate (SR). (Callouts) Three representative levels (112, 287, 532) illustrate the global curriculum (summarized for brevity; see the case-studies appendix for full details), demonstrating how the model ramps up complexity by extending prior concepts (Level $112 \rightarrow 287$) and targeting distinct late-game bottlenecks (Level 532). (Inset) The local curriculum is depicted through the lineage of Level 112. The diff-style comparison (red/green) reveals how the foundation model evolves a parent level (112) into a child level (143) by removing scaffolding and increasing complexity. (Bottom) The average SR of the agent across active training levels remains stable around $0.5$, indicating that the generator successfully maintains the agent in a zone of proximal development.
Figure 5: Final Achievement Success Rates. Aggregate success rates for DiCode versus baselines across all defined Craftax achievements. Results report the mean and standard error across 5 random seeds after $2 \times 10^9$ steps.
...and 2 more figures
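
The generation cycle described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Level` record, the `p * (1 - p)` learnability score, the uniform replay sampler standing in for PLR, and every function name are hypothetical and introduced here only to make the data flow concrete. The compilation check relies on Python's built-in `compile`, so it catches syntax-level failures only.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Level:
    """Hypothetical archive entry: generated code plus lineage and agent stats."""
    level_id: int
    source: str                      # Python source of the generated level
    parent_id: Optional[int] = None  # lineage link (parent -> offspring)
    success_rate: float = 0.0        # agent's current success rate on this level


def learnability(p: float) -> float:
    """Assumed score p * (1 - p): highest at p = 0.5, zero if always or never solved."""
    return p * (1.0 - p)


def select_parent(archive: List[Level], rng: random.Random) -> Level:
    """Sample a parent with probability proportional to its learnability."""
    weights = [learnability(level.success_rate) for level in archive]
    if sum(weights) == 0:
        # No level is informative yet; fall back to a uniform choice.
        return rng.choice(archive)
    return rng.choices(archive, weights=weights, k=1)[0]


def passes_compilation_check(source: str) -> bool:
    """Return True if the generated source compiles (syntax check only)."""
    try:
        compile(source, "<generated_level>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def build_training_batch(target: Level, new_levels: List[Level],
                         archive: List[Level], n_replay: int,
                         rng: random.Random) -> List[Level]:
    """Mix the target environment, new levels, and replayed archive levels.

    Uniform sampling is a stand-in here; the paper samples archived levels via PLR.
    """
    replay = rng.sample(archive, k=min(n_replay, len(archive)))
    return [target] + new_levels + replay


if __name__ == "__main__":
    rng = random.Random(0)
    archive = [
        Level(0, "x = 1", success_rate=0.0),
        Level(1, "x = 2", parent_id=0, success_rate=0.5),
        Level(2, "x = 3", parent_id=1, success_rate=1.0),
    ]
    parent = select_parent(archive, rng)
    # A real system would query a foundation model here; this string is a placeholder.
    candidate = Level(3, "x = 4", parent_id=parent.level_id)
    new_levels = [candidate] if passes_compilation_check(candidate.source) else []
    batch = build_training_batch(Level(-1, "pass"), new_levels, archive, 2, rng)
    print(parent.level_id, len(batch))
```

Weighting parents by a score that peaks at a success rate of 0.5 is one simple way to favour levels the agent solves only some of the time, which matches the stable average SR near $0.5$ reported in the Figure 4 caption.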
