Parental Guidance: Efficient Lifelong Learning through Evolutionary Distillation

Octi Zhang, Quanquan Peng, Rosario Scalise, Byron Boots

TL;DR

This paper addresses the challenge of generalization for robotic agents across diverse environments while sustaining continual learning. It introduces Parental Guidance, an evolution-inspired framework that distributes the learning process and merges imitation learning with reinforcement learning to inherit and refine behaviors across generations. A central DAG-based orchestrator coordinates distributed training, while offspring undergo behavioral distillation via DAgger followed by PPO-based RL refinement, enabling IL-to-RL transitions. Preliminary experiments show improved exploration efficiency and open-ended learning in a multi-terrain setting, suggesting a scalable path to lifelong adaptation without manual reward shaping.
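The orchestration described above can be illustrated with a toy sketch. It is not the authors' implementation: `Agent`, `PhylogeneticDAG`, `distill`, `refine`, and `breed` are hypothetical names, the per-terrain skill scores are made-up numbers rather than results, and `distill` and `refine` are simple arithmetic stand-ins for DAgger-style distillation and PPO refinement. The only thing the sketch aims to show is the bookkeeping: a child is distilled from existing parents, refined, and attached to the graph, which stays acyclic because a node can only name parents that already exist.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Agent:
    """A node in the phylogenetic DAG (hypothetical structure)."""
    name: str
    parents: Tuple[str, ...] = ()
    # Toy per-terrain skill score in [0, 1]; illustrative only.
    skills: Dict[str, float] = field(default_factory=dict)


class PhylogeneticDAG:
    """Stores agents; acyclic because parents must exist before a child is added."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Agent] = {}

    def add(self, agent: Agent) -> None:
        if agent.name in self.nodes:
            raise ValueError(f"duplicate agent: {agent.name}")
        missing = [p for p in agent.parents if p not in self.nodes]
        if missing:
            raise KeyError(f"unknown parents: {missing}")
        self.nodes[agent.name] = agent

    def ancestors(self, name: str) -> Set[str]:
        """Return the names of all ancestors of `name` via an iterative walk."""
        seen: Set[str] = set()
        stack: List[str] = list(self.nodes[name].parents)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.nodes[current].parents)
        return seen


def distill(parents: List[Agent], child_name: str, retention: float = 0.9) -> Agent:
    """Stand-in for distillation: the child keeps a fraction of the best parent's skill per terrain."""
    terrains = {t for p in parents for t in p.skills}
    skills = {
        t: retention * max(p.skills.get(t, 0.0) for p in parents)
        for t in terrains
    }
    return Agent(child_name, tuple(p.name for p in parents), skills)


def refine(agent: Agent, gain: float = 0.15) -> Agent:
    """Stand-in for RL refinement: nudge every skill upward, capped at 1.0."""
    skills = {t: min(1.0, s + gain) for t, s in agent.skills.items()}
    return Agent(agent.name, agent.parents, skills)


def breed(dag: PhylogeneticDAG, parent_names: List[str], child_name: str) -> Agent:
    """One training cycle: distill from parents, refine, attach the child to the DAG."""
    parents = [dag.nodes[n] for n in parent_names]
    child = refine(distill(parents, child_name))
    dag.add(child)
    return child


dag = PhylogeneticDAG()
dag.add(Agent("flat_specialist", skills={"flat": 0.9, "stairs": 0.2}))
dag.add(Agent("stairs_specialist", skills={"flat": 0.3, "stairs": 0.8}))
child = breed(dag, ["flat_specialist", "stairs_specialist"], "gen1")
print(sorted(dag.ancestors("gen1")), child.skills)
```

Because `breed` only reads existing nodes and writes one new node, each call could in principle be dispatched as an independent job, which is the property the TL;DR attributes to the central orchestrator.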

Abstract

Developing robotic agents that can perform well in diverse environments while showing a variety of behaviors is a key challenge in AI and robotics. Traditional reinforcement learning (RL) methods often create agents that specialize in narrow tasks, limiting their adaptability and diversity. To overcome this, we propose a preliminary, evolution-inspired framework that includes a reproduction module, similar to natural species reproduction, balancing diversity and specialization. By integrating RL, imitation learning (IL), and a coevolutionary agent-terrain curriculum, our system evolves agents continuously through complex tasks. This approach promotes adaptability, inheritance of useful traits, and continual learning. Agents not only refine inherited skills but also surpass their predecessors. Our initial experiments show that this method improves exploration efficiency and supports open-ended learning, offering a scalable solution where sparse reward coupled with diverse terrain environments induces a multi-task setting.

Paper Structure

This paper contains 10 sections, 4 equations, 12 figures, 1 algorithm.

Figures (12)

  • Figure 1: Left: Setup of the Holistic Evolutionary Framework, an evolution scheduler that maintains a phylogenetic tree (a directed acyclic graph (DAG)). Each training cycle can be modularized and dispatched as a standalone process to an arbitrarily scalable number of compute nodes. Each node performs BC and RL and attaches the new child (the future parent) back to the phylogenetic tree. Right: Effect of the BC-RL process. Each parent is a specialist in its own niche terrain and performs worse on other terrains. Each green dot represents a treat successfully fetched by the agent. We let the agent run until its distance stops increasing, indicating that the agent's skill has saturated. Distillation enables agents to do almost as well as each parent in both terrains, while BC-RL exceeds the performance of both parents on the union of tasks.
  • Figure 3: Relationship between BC-to-RL transitions (same data as in Figure 2) and their performance. Again, the gradient from black to white ranks transitions from best to worst performance. The purple curve represents pure BC, and the blue curve represents pure RL.
  • ...and 10 more figures