Policy Compatible Skill Incremental Learning via Lazy Learning Interface
Daehee Lee, Dongsu Lee, TaeYoon Kwack, Wonje Choi, Honguk Woo
TL;DR
SIL-C addresses a core problem in Skill Incremental Learning (SIL): keeping downstream hierarchical policies compatible with a skill library that keeps evolving. It introduces a bilateral lazy learning interface that aligns the subtask space of high-level policies with the skill space of low-level decoders by matching trajectory distributions, which provides forward and backward compatibility without policy re-training. The interface uses append-only prototype memories and a two-stage instance-based matching process (validation followed by hooking) to map subtasks to executable skills at inference time, improving sample efficiency and modularity across diverse SIL scenarios. Experiments in Franka Kitchen and Meta-World show superior skill-policy compatibility (higher AUC) and robust performance under noise and limited supervision. Overall, SIL-C supports compositional learning in which newly learned skills improve existing policies without re-training them, a step toward safer, more scalable lifelong learning for embodied agents.
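The append-only prototype memories and the two-stage matching can be pictured with a small sketch. Everything here is an illustrative assumption, not SIL-C's published implementation: the class `AppendOnlyMemory`, the function `resolve`, the shared embedding space, Euclidean distance, the `threshold` parameter, and this particular reading of the two stages (validation checks that the current state lies near a prototype of the subtask the policy chose; hooking binds that subtask prototype to the nearest skill prototype) are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass
class AppendOnlyMemory:
    """Hypothetical prototype store: entries are appended, never edited or removed."""
    entries: List[Tuple[str, Vector]] = field(default_factory=list)

    def append(self, label: str, prototype: Sequence[float]) -> None:
        self.entries.append((label, tuple(prototype)))

    def nearest(self, query: Sequence[float],
                label: Optional[str] = None) -> Optional[Tuple[str, Vector, float]]:
        """Closest entry to `query` (Euclidean), optionally restricted to one label."""
        candidates = [(lab, vec) for lab, vec in self.entries
                      if label is None or lab == label]
        if not candidates:
            return None
        lab, vec = min(candidates, key=lambda e: math.dist(query, e[1]))
        return lab, vec, math.dist(query, vec)


def resolve(subtask: str, state: Sequence[float],
            subtasks: AppendOnlyMemory, skills: AppendOnlyMemory,
            threshold: float) -> Optional[str]:
    # Stage 1 (validation, as assumed here): the policy's chosen subtask must
    # have a prototype within `threshold` of the current state embedding.
    hit = subtasks.nearest(state, label=subtask)
    if hit is None or hit[2] > threshold:
        return None
    # Stage 2 (hooking, as assumed here): bind the validated subtask prototype
    # to whichever skill prototype currently lies closest to it.
    skill = skills.nearest(hit[1])
    return None if skill is None else skill[0]


# Usage: the policy's output ("open_microwave") stays fixed throughout.
subtasks = AppendOnlyMemory()
subtasks.append("open_microwave", (1.0, 0.0))
skills = AppendOnlyMemory()
skills.append("reach_v1", (0.0, 0.0))
print(resolve("open_microwave", (0.9, 0.1), subtasks, skills, threshold=0.5))  # reach_v1

skills.append("open_door_v2", (1.1, 0.0))  # a later skill phase appends a closer match
print(resolve("open_microwave", (0.9, 0.1), subtasks, skills, threshold=0.5))  # open_door_v2
print(resolve("open_microwave", (5.0, 5.0), subtasks, skills, threshold=0.5))  # None
```

Because nothing in the memories is overwritten, the high-level policy never changes; appending a skill only changes which entry the lookup resolves to, which mirrors in miniature how new skills can improve an existing policy without re-training it.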
Abstract
Skill Incremental Learning (SIL) is the process by which an embodied agent expands and refines its skill set over time, either by leveraging experience gained through interaction with its environment or by integrating additional data. SIL facilitates efficient acquisition of hierarchical policies grounded in reusable skills for downstream tasks. However, as the skill repertoire evolves, this evolution can disrupt compatibility with existing skill-based policies, limiting their reusability and generalization. In this work, we propose SIL-C, a novel framework that ensures skill-policy compatibility, allowing improvements in incrementally learned skills to enhance the performance of downstream policies without requiring policy re-training or structural adaptation. SIL-C employs a bilateral lazy learning-based mapping technique to dynamically align the subtask space referenced by policies with the skill space decoded into agent behaviors. This enables each subtask, derived from the policy's decomposition of a complex task, to be executed by selecting an appropriate skill based on trajectory distribution similarity. We evaluate SIL-C across diverse SIL scenarios and demonstrate that it maintains compatibility between evolving skills and downstream policies while ensuring efficiency throughout the learning process.
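Selecting a skill by trajectory distribution similarity, rather than by distance between single points, can be sketched as follows. The abstract does not specify the similarity measure, so this is a stand-in under stated assumptions: each subtask and skill is summarized as a diagonal Gaussian over hypothetical trajectory embeddings, and candidates are compared with the Bhattacharyya distance. The helpers `fit_diag_gaussian`, `bhattacharyya`, and `select_skill`, as well as the sample data, are hypothetical.

```python
import math
from statistics import fmean, pvariance
from typing import Dict, List, Sequence, Tuple

Gaussian = Tuple[List[float], List[float]]  # per-dimension (means, variances)


def fit_diag_gaussian(samples: Sequence[Sequence[float]], eps: float = 1e-6) -> Gaussian:
    """Summarize trajectory embeddings as a diagonal Gaussian (eps keeps variances > 0)."""
    dims = list(zip(*samples))
    means = [fmean(d) for d in dims]
    variances = [pvariance(d) + eps for d in dims]
    return means, variances


def bhattacharyya(p: Gaussian, q: Gaussian) -> float:
    """Bhattacharyya distance between two diagonal Gaussians (0 when identical)."""
    total = 0.0
    for m1, v1, m2, v2 in zip(p[0], p[1], q[0], q[1]):
        v = (v1 + v2) / 2
        # Mean-separation term plus variance-mismatch term, summed per dimension.
        total += (m1 - m2) ** 2 / (8 * v) + 0.5 * math.log(v / math.sqrt(v1 * v2))
    return total


def select_skill(subtask_samples: Sequence[Sequence[float]],
                 skill_samples: Dict[str, Sequence[Sequence[float]]]) -> str:
    """Return the skill whose trajectory distribution is closest to the subtask's."""
    target = fit_diag_gaussian(subtask_samples)
    return min(skill_samples,
               key=lambda name: bhattacharyya(target, fit_diag_gaussian(skill_samples[name])))


subtask = [(1.0, 0.0), (1.2, 0.1), (0.8, -0.1)]
skills = {
    "reach": [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0)],
    "open_door": [(1.0, 0.0), (1.1, 0.1), (0.9, -0.1)],
}
print(select_skill(subtask, skills))  # open_door
```

Comparing distributions means a skill is preferred only when both the location and the spread of its trajectories resemble the subtask's, which a single-prototype distance cannot capture.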
