Dreaming in Code for Curriculum Learning in Open-Ended Worlds
Konstantinos Mitsides, Maxence Faldor, Antoine Cully
TL;DR
The paper addresses how to sustain learning progress in open-ended worlds by letting foundation models synthesize executable environment code that forms a curriculum. It introduces Dreaming in Code (DiCode), a closed-loop unsupervised environment design (UED) framework that conditions environment generation on the agent's current competence and archives parent-offspring relationships among generated environments in an evolving programmatic space. On Craftax, DiCode yields a $\sim 16\%$ improvement in mean return over the strongest baseline and achieves non-zero success on late-game tasks where prior methods fail. Qualitative evidence suggests that the generator shapes the curriculum much as a teacher would and that closing the loop is necessary for these gains. The results indicate that code-level environment design can scaffold long-horizon skill acquisition in complex domains while keeping the generated worlds physics-consistent. Limitations include reliance on a fixed engine and LLM latency, which points to future work on broader engines and faster generation for scalable open-ended learning.
Abstract
Open-ended learning frames intelligence as emerging from continual interaction with an ever-expanding space of environments. While recent advances have utilized foundation models to programmatically generate diverse environments, these approaches often focus on discovering isolated behaviors rather than orchestrating sustained progression. In complex open-ended worlds, the large combinatorial space of possible challenges makes it difficult for agents to discover sequences of experiences that remain consistently learnable. To address this, we propose Dreaming in Code (DiCode), a framework in which foundation models synthesize executable environment code to scaffold learning toward increasing competence. In DiCode, "dreaming" takes the form of materializing code-level variations of the world. We instantiate DiCode in Craftax, a challenging open-ended benchmark characterized by rich mechanics and long-horizon progression. Empirically, DiCode enables agents to acquire long-horizon skills, achieving a $16\%$ improvement in mean return over the strongest baseline and non-zero success on late-game combat tasks where prior methods fail. Our results suggest that code-level environment design provides a practical mechanism for curriculum control, enabling the construction of intermediate environments that bridge competence gaps in open-ended worlds. Project page and source code are available at https://konstantinosmitsides.github.io/dreaming-in-code and https://github.com/konstantinosmitsides/dreaming-in-code.
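To make the closed-loop idea concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical setup in which each "environment" is a one-line Python snippet defining `DIFFICULTY`, a rule-based function stands in for the foundation model, and a `ToyAgent` learns only from environments exactly one step beyond its current skill. All names (`EnvRecord`, `Archive`, `ToyAgent`, `run_curriculum`, and the two generators) are illustrative and do not come from the DiCode codebase.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class EnvRecord:
    """One generated environment and its lineage (hypothetical schema)."""
    env_id: int
    code: str                    # executable environment source (toy: one assignment)
    parent_id: Optional[int]     # None for the seed environment
    success_rate: float = 0.0    # agent competence measured on this environment


@dataclass
class Archive:
    """Stores every environment together with its parent link."""
    records: Dict[int, EnvRecord] = field(default_factory=dict)

    def add(self, code: str, parent_id: Optional[int]) -> EnvRecord:
        record = EnvRecord(len(self.records), code, parent_id)
        self.records[record.env_id] = record
        return record

    def lineage(self, env_id: int) -> List[int]:
        """Return the ids from the seed environment down to env_id."""
        chain: List[int] = []
        current: Optional[int] = env_id
        while current is not None:
            chain.append(current)
            current = self.records[current].parent_id
        return chain[::-1]


def load_difficulty(code: str) -> int:
    """Execute the toy environment code and read back its DIFFICULTY."""
    namespace: dict = {}
    exec(code, namespace)  # toy only; real generated code would need sandboxing
    return namespace["DIFFICULTY"]


class ToyAgent:
    """Assumption: the agent improves only when the task is one step too hard."""

    def __init__(self) -> None:
        self.skill = 0

    def train_and_evaluate(self, code: str) -> float:
        difficulty = load_difficulty(code)
        if difficulty - self.skill == 1:   # learnable gap
            self.skill += 1
        return 1.0 if difficulty <= self.skill else 0.0


def closed_loop_generator(parent_code: str, success_rate: float) -> str:
    """Stand-in for the foundation model: conditions on measured competence."""
    step = 1 if success_rate >= 0.5 else -1
    return f"DIFFICULTY = {max(1, load_difficulty(parent_code) + step)}"


def open_loop_generator(parent_code: str, success_rate: float) -> str:
    """Ignores competence and always jumps ahead by a fixed amount."""
    return f"DIFFICULTY = {load_difficulty(parent_code) + 3}"


def run_curriculum(
    generate: Callable[[str, float], str],
    agent: ToyAgent,
    seed_code: str,
    iterations: int,
) -> Archive:
    """Alternate between training on an environment and generating its child."""
    archive = Archive()
    parent = archive.add(seed_code, None)
    parent.success_rate = agent.train_and_evaluate(parent.code)
    for _ in range(iterations):
        child_code = generate(parent.code, parent.success_rate)
        child = archive.add(child_code, parent.env_id)
        child.success_rate = agent.train_and_evaluate(child.code)
        parent = child
    return archive
```

In this toy, calling `run_curriculum(closed_loop_generator, ToyAgent(), "DIFFICULTY = 1", 4)` lets the agent's skill rise with every generated environment, whereas swapping in `open_loop_generator` leaves the agent stuck after the seed because each new environment lies beyond the learnable gap. The parent selection here is simply "the most recent environment"; how DiCode actually selects parents is not specified in the text above.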
