Practice Makes Perfect: Planning to Learn Skill Parameter Policies

Nishanth Kumar; Tom Silver; Willie McClinton; Linfeng Zhao; Stephen Proulx; Tomás Lozano-Pérez; Leslie Pack Kaelbling; Jennifer Barry

TL;DR

This work tackles rapid skill specialization for long-horizon robotic tasks by planning to practice parameterized skills. It introduces Estimate-Extrapolate-Situate (EES), a lifelong-learning framework that selects which skill to practice by estimating each skill's current competence, extrapolating how much that competence would improve with practice, and situating that improvement within the task distribution to maximize the overall task success probability $J_{\text{tasks}}(\Pi)$. Competence is modeled with a Beta-Bernoulli time series, extrapolated via a learned function $f_\phi$, and integrated into planning through a skeleton-based decomposition with AI planning operators; parameter policies are learned as energy-based models that efficiently map a context $x$ to parameter samples. Experiments in simulation and on real Spot robots show that EES is more sample-efficient than several baselines and robust to perception and control noise, enabling autonomous improvement on two long-horizon mobile-manipulation tasks after a few hours of reset-free practice. The results highlight the value of planning-guided active practice and competence-aware planning for scalable, autonomous skill specialization in the real world, and lay groundwork for integrating planning, lifelong learning, and task and motion planning (TAMP) to advance autonomous robot decision making in dynamic environments.
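
As a rough illustration of the competence-estimation idea, the sketch below tracks one skill's success probability with a Beta prior updated by Bernoulli practice outcomes. It is a simplified stand-in, not the paper's model: the class name, the uniform Beta(1, 1) prior, and the sliding window (a crude substitute for the time-series structure, which the paper fits with EM) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BetaBernoulliCompetence:
    """Hypothetical competence tracker for one skill: a Beta prior over the
    skill's success probability, updated with Bernoulli practice outcomes."""

    alpha: float = 1.0  # assumed prior pseudo-count of successes
    beta: float = 1.0   # assumed prior pseudo-count of failures
    outcomes: List[bool] = field(default_factory=list)

    def update(self, success: bool) -> None:
        """Record the outcome of one practice attempt."""
        self.outcomes.append(success)

    def posterior(self, window: int = 0) -> Tuple[float, float]:
        """Beta posterior parameters from all outcomes, or only the most
        recent `window` outcomes when window > 0 (a crude way to let the
        estimate follow a competence that changes with practice)."""
        recent = self.outcomes[-window:] if window > 0 else self.outcomes
        successes = sum(recent)
        return self.alpha + successes, self.beta + len(recent) - successes

    def estimate(self, window: int = 0) -> float:
        """Posterior mean of the success probability."""
        a, b = self.posterior(window)
        return a / (a + b)


if __name__ == "__main__":
    skill = BetaBernoulliCompetence()
    for success in [False, False, True, True, True]:
        skill.update(success)
    print(skill.estimate(), skill.estimate(window=3))
```

Here, five practice attempts (two failures, then three successes) give a posterior mean of $(1+3)/(2+5) = 4/7 \approx 0.57$ over all outcomes and $0.8$ over the last three, which reflects the recent improvement.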

Abstract

One promising approach towards effective robot decision making in complex, long-horizon tasks is to sequence together parameterized skills. We consider a setting where a robot is initially equipped with (1) a library of parameterized skills, (2) an AI planner for sequencing together the skills given a goal, and (3) a very general prior distribution for selecting skill parameters. Once deployed, the robot should rapidly and autonomously learn to improve its performance by specializing its skill parameter selection policy to the particular objects, goals, and constraints in its environment. In this work, we focus on the active learning problem of choosing which skills to practice to maximize expected future task success. We propose that the robot should estimate the competence of each skill, extrapolate the competence (asking: "how much would the competence improve through practice?"), and situate the skill in the task distribution through competence-aware planning. This approach is implemented within a fully autonomous system where the robot repeatedly plans, practices, and learns without any environment resets. Through experiments in simulation, we find that our approach learns effective parameter policies more sample-efficiently than several baselines. Real-world experiments demonstrate our approach's ability to handle noise from perception and control and to improve the robot's ability to solve two long-horizon mobile-manipulation tasks after a few hours of autonomous practice. Project website: http://ees.csail.mit.edu
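
The estimate-extrapolate-situate selection rule can be sketched as follows, under strong simplifying assumptions: each task is given as an explicit list of candidate plan skeletons, skill outcomes are independent, a task's success probability is that of its best skeleton, and the learned extrapolation $f_\phi$ is replaced by a hypothetical fixed per-skill gain. The function names, skill names, and numbers are hypothetical and loosely follow the Ball-Ring running example; the paper's competence-aware planning is more general than this enumeration.

```python
from math import prod
from typing import Dict, List, Sequence

# Hypothetical task encoding: a task is a list of candidate plan skeletons,
# and each skeleton is a sequence of skill names.
Task = List[Sequence[str]]


def task_success(task: Task, competence: Dict[str, float]) -> float:
    """Situate: a task succeeds with the probability of its best skeleton,
    assuming independent skill outcomes."""
    return max(prod(competence[s] for s in skeleton) for skeleton in task)


def j_tasks(tasks: List[Task], competence: Dict[str, float]) -> float:
    """Mean task success probability under a uniform task distribution."""
    return sum(task_success(t, competence) for t in tasks) / len(tasks)


def extrapolate(c: float, gain: float) -> float:
    """Hypothetical stand-in for the learned f_phi: practice closes a fixed
    fraction `gain` of the gap between current competence and 1."""
    return c + gain * (1.0 - c)


def select_skill_to_practice(tasks: List[Task],
                             competence: Dict[str, float],
                             gains: Dict[str, float]) -> str:
    """Return the skill whose extrapolated competence most increases J_tasks."""
    best_skill, best_j = "", float("-inf")
    for skill, c in competence.items():
        imagined = dict(competence)                      # estimate: current values
        imagined[skill] = extrapolate(c, gains[skill])   # extrapolate one skill
        j = j_tasks(tasks, imagined)                     # situate in task distribution
        if j > best_j:
            best_skill, best_j = skill, j
    return best_skill


if __name__ == "__main__":
    # Hypothetical competences loosely following the Ball-Ring example.
    competence = {"Place(ball, table)": 0.05,
                  "Place(ring, table)": 0.2,
                  "Place(ball, ring)": 0.9}
    # The ball rolls off the slanted table, so practicing that skill cannot help.
    gains = {"Place(ball, table)": 0.0,
             "Place(ring, table)": 0.3,
             "Place(ball, ring)": 0.3}
    tasks = [[("Place(ball, table)",), ("Place(ring, table)", "Place(ball, ring)")]]
    print(select_skill_to_practice(tasks, competence, gains))
```

Run as a script, this prints Place(ring, table): imagining practice on that skill raises $J_{\text{tasks}}$ from 0.18 to about 0.40, while imagined practice on direct ball placement leaves it at 0.18 and practice on Place(ball, ring) raises it only to about 0.19.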

Paper Structure (30 sections, 6 equations, 7 figures, 4 tables, 4 algorithms)

Introduction
Problem Setting
Modelling the World
Planning to Solve Tasks
Online Learning Paradigm
Planning to Learn
Selecting Skills to Practice
Estimating Skill Competence
Extrapolating Skill Competence
Situating Skill Competence
Explore Parameter Policies
Learning to Improve Parameter Policies
Experiments
Related Work
Exploration in Reinforcement Learning
...and 15 more sections

Figures (7)

Figure 1: Running example: Ball-Ring environment. The goal is to put the ball on the table. The robot should learn that (1) the ball cannot be placed directly because it will roll off the slanted table; (2) the ring can only be placed on the left side because the right side is smooth (shown in the top-right corner); (3) placing the ring on the table and then placing the ball inside the ring is the best way to accomplish the goal.
Figure 2: Pipeline overview. (1) During free time, the robot repeatedly selects skills to practice. Here, Place(ring, table, $\circ$) is selected because it maximizes $J_{\text{skill}}$ (see the Estimate-Extrapolate-Situate algorithm). (2) The robot plans to satisfy the initiation condition of the skill and then selects a continuous parameter to practice (see the planning-to-practice algorithm). (3) The resulting success or failure of the skill is used to improve the parameter policy (see Learning to Improve Parameter Policies).
Figure 3: Simulation results. Percentage of evaluation tasks solved vs. number of online transitions collected for all approaches in all simulated environments. Solid lines represent means and shading represents standard error across 10 seeds. Note that all approaches used the same parameter priors and feature engineering, discussed in detail in the appendix sections on experiments and on parameter policy details. Additional ablation experiments on these choices are reported in the appendix section on additional experiments.
Figure 4: Skill competence graphical model.
Figure 5: Fitting competence models with EM. For the competence models, solid lines are modes and dashed lines are variances.
...and 2 more figures
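
Step (2) of Figure 2, selecting a continuous parameter for a given context, can be illustrated with a generic energy-based sampling scheme: draw candidates from the skill's general prior, score each with an energy conditioned on the context $x$, and resample one with probability proportional to $\exp(-E/T)$. This is a hypothetical sketch rather than the paper's implementation: in the paper the energy-based policy is learned from practice outcomes (step 3), whereas here the energy is a hand-written function, and the function names, candidate count, and temperature are assumed values.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Params = List[float]


def sample_parameter(
    context: Sequence[float],
    energy: Callable[[Sequence[float], Params], float],
    prior_sample: Callable[[random.Random], Params],
    num_candidates: int = 100,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Params:
    """Sample skill parameters for `context` from an energy-based policy.

    Candidates come from the (general) prior; one is resampled with
    probability proportional to exp(-energy / temperature), so lower-energy
    parameters are preferred.
    """
    if rng is None:
        rng = random.Random()
    candidates = [prior_sample(rng) for _ in range(num_candidates)]
    energies = [energy(context, p) for p in candidates]
    lowest = min(energies)  # shift energies so the largest weight is exactly 1.0
    weights = [math.exp(-(e - lowest) / temperature) for e in energies]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical 1-D placement offset on a table spanning [-1, 1].
    def uniform_prior(rng: random.Random) -> Params:
        return [rng.uniform(-1.0, 1.0)]

    # Hypothetical energy preferring offsets near -0.5 (the rough left side).
    def left_side_energy(context: Sequence[float], params: Params) -> float:
        return 50.0 * (params[0] + 0.5) ** 2

    sample = sample_parameter([0.0], left_side_energy, uniform_prior,
                              temperature=0.01, rng=random.Random(0))
    print(sample)
```

Lowering the temperature concentrates samples on low-energy parameters, while a high temperature falls back toward the prior; this is one common way to keep some exploration while the policy is still being specialized.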

Theorems & Definitions (1)

Definition 1: Skill Competence
