Interactive LLM-assisted Curriculum Learning for Multi-Task Evolutionary Policy Search

Berfin Sakallioglu, Giorgia Nadizar, Eric Medvet

TL;DR

This work introduces an interactive LLM-assisted curriculum learning framework for multi-task evolutionary policy search, in which an optimizer and a language model iteratively create and refine training cases based on real-time optimization feedback. The approach supports three feedback modalities (numeric scores, convergence plots, and behavior visualizations) and aims to produce genuine learning progressions rather than static task collections. In a 2D navigation case study with a symbolic, tree-based policy optimized by genetic programming, online, feedback-informed curricula outperform static and random baselines, with progression-based (N+P) and multimodal (N+P+B) feedback achieving performance close to expert-designed curricula. The results demonstrate that LLMs can function as interactive curriculum designers for embodied AI systems, enabling scalable, automated curriculum design in evolutionary robotics while remaining largely agnostic to the specific optimizer or policy representation used.

Abstract

Multi-task policy search is a challenging problem because policies are required to generalize beyond training cases. Curriculum learning has proven to be effective in this setting, as it introduces complexity progressively. However, designing effective curricula is labor-intensive and requires extensive domain expertise. LLM-based curriculum generation has only recently emerged as a potential solution, but it has so far been limited to static, offline modes that do not leverage real-time feedback from the optimizer. Here we propose an interactive LLM-assisted framework for online curriculum generation, where the LLM adaptively designs training cases based on real-time feedback from the evolutionary optimization process. We investigate how different feedback modalities, ranging from numeric metrics alone to combinations with plots and behavior visualizations, influence the LLM's ability to generate meaningful curricula. Through a 2D robot navigation case study, tackled with genetic programming as the optimizer, we evaluate our approach against static LLM-generated curricula and expert-designed baselines. We show that interactive curriculum generation outperforms static approaches, with multimodal feedback incorporating both progression plots and behavior visualizations yielding performance competitive with expert-designed curricula. This work contributes to understanding how LLMs can serve as interactive curriculum designers for embodied AI systems, with potential extensions to broader evolutionary robotics applications.
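
The interactive loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the LLM call is replaced by a hypothetical rule-based stand-in (`propose_next_case`), the optimizer is a toy (1+1) evolutionary strategy over a single float instead of genetic programming over trees, and only numeric feedback is modeled. All names below are assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Feedback:
    """Numeric feedback returned to the curriculum designer after one stage."""
    case: float                  # the training case used in this stage
    best_per_generation: list    # best error after each generation (lower is better)


def propose_next_case(history):
    """Hypothetical stand-in for the LLM: choose the next training case.

    Assumed rule: make the case harder if the previous stage converged,
    otherwise repeat the same case.
    """
    if not history:
        return 1.0
    last = history[-1]
    solved = last.best_per_generation[-1] < 0.1
    return last.case + 1.0 if solved else last.case


def evolve(policy, case, generations, rng):
    """Toy (1+1) evolution strategy, used here in place of genetic programming.

    The 'policy' is a single float and its error is the distance to the case value.
    """
    best = []
    for _ in range(generations):
        child = policy + rng.gauss(0.0, 0.3)
        if abs(child - case) <= abs(policy - case):  # keep the child only if not worse
            policy = child
        best.append(abs(policy - case))
    return policy, best


def run_interactive_curriculum(stages=5, generations=200, seed=0):
    rng = random.Random(seed)
    policy, history = 0.0, []
    for _ in range(stages):
        case = propose_next_case(history)                       # designer proposes a case
        policy, best = evolve(policy, case, generations, rng)   # optimizer trains on it
        history.append(Feedback(case, best))                    # feedback closes the loop
    return policy, history


if __name__ == "__main__":
    final_policy, history = run_interactive_curriculum()
    for stage, fb in enumerate(history):
        print(stage, fb.case, round(fb.best_per_generation[-1], 3))
```

A static, offline curriculum would correspond to fixing the whole list of cases before the first call to `evolve`; the interactive variant differs only in that each case is chosen after seeing the feedback from the previous stage.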

Paper Structure (55 sections, 1 equation, 6 figures)

Figures (6)

  • Figure 1: The six test arenas used for assessing policies. The red region shows the robot's starting region (the robot is placed at the center of the region); the green region shows the target position.
  • Figure 2: The expert-designed curriculum with the trajectories obtained with a policy optimized on this curriculum.
  • Figure 3: Progression (right plots) and final distribution (left plots) of the performance of the best policy $s^\star$ in the population measured on the train arenas (top plots) and on the test arenas (bottom plots), for the three interactive modalities and the three baselines. In the progression plots, the line corresponds to the median value across the 10 repetitions and the shaded area to the interquartile range. In the distribution plots, stars above the boxes show significant differences: e.g., an orange star over the red box indicates that N is significantly different from Random.
  • Figure 4: Progression (right plots) and final distribution (left plots) of the performance of the policy $s^\star$ with the best performance on the test arenas, for the three interactive modalities and the three baselines.
  • Figure 5: Final distribution of the performance of the best policy $s^\star$ in the population measured on the test arenas for each modality with progressive and batch administration. An arch over a pair of boxes denotes a statistically significant difference.
  • ...and 1 more figures
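
The progression plots in Figures 3 and 4 summarize repeated runs with a median line and an interquartile band. A minimal sketch of that aggregation follows, using only the Python standard library. The function name and the input numbers are placeholders invented for illustration, not results from the paper, and the quartile method is an assumption.

```python
import statistics


def median_and_iqr(runs):
    """Aggregate repeated runs into per-generation (q1, median, q3) triples.

    `runs` is a list of equally long lists: one performance series per repetition.
    Quartiles use the 'inclusive' method; the paper does not state which one it uses.
    """
    summary = []
    for values in zip(*runs):  # iterate over generations
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary.append((q1, statistics.median(values), q3))
    return summary


if __name__ == "__main__":
    # Placeholder data: 3 repetitions, 2 generations each.
    placeholder_runs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for generation, (q1, med, q3) in enumerate(median_and_iqr(placeholder_runs)):
        print(generation, q1, med, q3)
```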