Interactive LLM-assisted Curriculum Learning for Multi-Task Evolutionary Policy Search
Berfin Sakallioglu, Giorgia Nadizar, Eric Medvet
TL;DR
This work introduces an interactive LLM-assisted curriculum learning framework for multi-task evolutionary policy search, in which an optimizer and a language model iteratively create and refine training cases based on real-time optimization feedback. The approach supports three feedback modalities (numeric scores, convergence plots, and behavior visualizations) and aims to produce genuine learning progressions rather than static task collections. In a 2D navigation case study with a symbolic, tree-based policy optimized by genetic programming, online, feedback-informed curricula outperform static and random baselines, with numeric plus progression-plot (N+P) feedback and fully multimodal (N+P+B) feedback, which adds behavior visualizations, achieving performance close to that of expert-designed curricula. The results demonstrate that LLMs can function as interactive curriculum designers for embodied AI systems, enabling scalable, automated curriculum design in evolutionary robotics while remaining largely agnostic to the specific optimizer or policy representation used.
Abstract
Multi-task policy search is a challenging problem because policies are required to generalize beyond their training cases. Curriculum learning has proven effective in this setting, as it introduces complexity progressively. However, designing effective curricula is labor-intensive and requires extensive domain expertise. LLM-based curriculum generation has only recently emerged as a potential solution, but it has so far been limited to static, offline operation that does not leverage real-time feedback from the optimizer. Here we propose an interactive LLM-assisted framework for online curriculum generation, in which the LLM adaptively designs training cases based on real-time feedback from the evolutionary optimization process. We investigate how different feedback modalities, ranging from numeric metrics alone to combinations with plots and behavior visualizations, influence the LLM's ability to generate meaningful curricula. Through a 2D robot navigation case study, tackled with genetic programming as the optimizer, we evaluate our approach against static LLM-generated curricula and expert-designed baselines. We show that interactive curriculum generation outperforms static approaches, with multimodal feedback incorporating both progression plots and behavior visualizations yielding performance competitive with expert-designed curricula. This work contributes to understanding how LLMs can serve as interactive curriculum designers for embodied AI systems, with potential extensions to broader evolutionary robotics applications.
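
To make the interactive loop concrete, the following is a minimal sketch and not the authors' implementation. It rests on loudly stated assumptions: `propose_case` is a hypothetical rule-based stand-in for the LLM call and uses numeric feedback only, a simple elitist evolution strategy replaces genetic programming, and the "policy" is a single gain rather than a symbolic tree. All function names and parameters are illustrative.

```python
import random


def propose_case(history):
    """Hypothetical stand-in for the LLM: pick the next training case
    (a target distance) from numeric feedback on earlier stages."""
    if not history:
        return 1.0
    last_case, last_score = history[-1]
    # Make the next case harder if the last stage was solved well, easier otherwise.
    return last_case * 1.5 if last_score < 0.1 else last_case * 0.8


def fitness(policy, cases):
    """Toy error (lower is better): the agent moves policy * distance
    toward a target at the given distance; the ideal gain is 1.0."""
    return sum(abs(1.0 - policy) * case for case in cases) / len(cases)


def optimize(policy, cases, rng, generations=20, offspring=8, sigma=0.2):
    """Elitist evolution strategy (an assumption; the paper uses genetic programming)."""
    best, best_fit = policy, fitness(policy, cases)
    for _ in range(generations):
        for _ in range(offspring):
            child = best + rng.gauss(0.0, sigma)
            child_fit = fitness(child, cases)
            if child_fit < best_fit:
                best, best_fit = child, child_fit
    return best, best_fit


def run_curriculum(stages=5, seed=0):
    """Alternate between case proposal and optimization, feeding scores back."""
    rng = random.Random(seed)
    policy, cases, history = 0.0, [], []
    for _ in range(stages):
        case = propose_case(history)   # curriculum designer proposes a case
        cases.append(case)
        policy, score = optimize(policy, cases, rng)  # optimizer trains on all cases so far
        history.append((case, score))  # feedback for the next proposal
    return policy, history
```

Calling `run_curriculum()` returns the final policy and the list of `(case, score)` pairs that served as feedback. In the framework described above, the feedback would also include plots and behavior visualizations, and the proposal step would be an LLM query rather than a fixed rule.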
