Focused Skill Discovery: Learning to Control Specific State Variables while Minimizing Side Effects
Jonathan Colaço Carr, Qinyi Sun, Cameron Allen
TL;DR
This paper addresses a limitation of unsupervised skill discovery methods: they overlook control over individual state variables, which leads to inefficient exploration and potential side effects when downstream goals are underspecified. It introduces focused skill discovery, a general method that modifies existing skill-discovery rewards so that skills target specific state variables while side effects on non-target variables are penalized; the method is compatible with VIC, DIAYN, and LSD. Empirical results across three environments show that focused skills achieve up to about three times greater state coverage, enable faster downstream learning, and automatically avoid undesirable changes under underspecified goals, matching or outperforming recent baselines such as DUSDi. The approach promises safer, more efficient pretraining of hierarchical policies and extends to a broad class of skill-discovery algorithms. Future work includes scaling to continuous domains and exploring theoretical guarantees about reward hacking and focus.
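To make the idea of "modifying an existing skill reward" concrete, here is a minimal sketch of what a focused variant of a DIAYN-style reward could look like. It assumes a discrete, factored state, a uniform prior over skills, a discriminator that sees only the target variable of the next state, and a simple count-based penalty on non-target variables that changed. The function name focused_diayn_reward, the toy discriminator, the count-based penalty, and penalty_weight are all hypothetical illustrations, not the paper's exact formulation.

```python
import math
from typing import Callable, Sequence


def focused_diayn_reward(
    s: Sequence[int],
    s_next: Sequence[int],
    z: int,
    target: int,
    discriminator: Callable[[int], Sequence[float]],
    num_skills: int,
    penalty_weight: float = 1.0,
) -> float:
    """Hypothetical focused reward: a DIAYN-style term on the target
    variable minus a penalty for changing any non-target variable."""
    if len(s) != len(s_next):
        raise ValueError("s and s_next must have the same number of variables")
    # DIAYN-style term log q(z | s'_target) - log p(z), with a uniform prior
    # p(z) = 1 / num_skills. The discriminator only sees the target variable.
    probs = discriminator(s_next[target])
    skill_term = math.log(probs[z]) - math.log(1.0 / num_skills)
    # Assumed side-effect penalty: count non-target variables that changed.
    side_effects = sum(
        1 for j, (a, b) in enumerate(zip(s, s_next)) if j != target and a != b
    )
    return skill_term - penalty_weight * side_effects


# Toy discriminator over 2 skills (an assumption for illustration):
# target value 0 suggests skill 0, any other value suggests skill 1.
def toy_discriminator(target_value: int) -> Sequence[float]:
    return [0.8, 0.2] if target_value == 0 else [0.2, 0.8]


s = (0, 0, 0)
# Skill 1 changes only the target variable (index 0): no side effects.
r_clean = focused_diayn_reward(s, (1, 0, 0), z=1, target=0,
                               discriminator=toy_discriminator, num_skills=2)
# Same target change, but variable 1 also changes: one side effect.
r_side = focused_diayn_reward(s, (1, 1, 0), z=1, target=0,
                              discriminator=toy_discriminator, num_skills=2)
print(r_clean, r_side)
```

In this sketch the two transitions earn the same skill term, log(0.8 / 0.5), so the only difference between r_clean and r_side is the penalty for the extra changed variable. This is the intuition behind focus: a skill is rewarded for controlling its target variable and discouraged from disturbing the rest of the state.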
Abstract
Skills are essential for unlocking higher levels of problem solving. A common approach to discovering skills is to learn ones that reliably reach different states, thereby empowering the agent to control its environment. However, existing skill discovery algorithms often overlook the natural state variables present in many reinforcement learning problems, so the discovered skills lack control over specific state variables. This can significantly hamper exploration efficiency, make the skills harder to use for downstream learning, and lead to negative side effects in downstream tasks when the goal is under-specified. We introduce a general method that enables these skill discovery algorithms to learn focused skills: skills that target and control specific state variables. Our approach improves state-space coverage by a factor of three, unlocks new learning capabilities, and automatically avoids negative side effects in downstream tasks.
