Human-Aligned Skill Discovery: Balancing Behaviour Exploration and Alignment

Maxence Hussonnois, Thommen George Karimpanal, Santu Rana

TL;DR

This work tackles the safety and utility gaps in unsupervised skill discovery by introducing Human-aligned Skill Discovery (HaSD), which jointly optimizes a diversity objective and a human-alignment objective. The approach combines Distance-Maximising Skill Discovery (DSD) with a reward model learned from human preferences under the Bradley-Terry model, steering entire skill trajectories toward human values while still promoting diverse, dynamic behaviors. Configurable HaSD (α-HaSD) extends this by conditioning skills on α, which yields a continuum of diversity-alignment trade-offs. Evaluations in Nav2D and Safety Gymnasium show that HaSD produces diverse, safer skills that improve downstream task performance and remain robust across different human-feedback budgets, while α-HaSD offers a practical way to tailor behavior to application needs.

Abstract

Unsupervised skill discovery in Reinforcement Learning aims to mimic humans' ability to autonomously discover diverse behaviors. However, existing methods are often unconstrained, making it difficult to find useful skills, especially in complex environments, where discovered skills are frequently unsafe or impractical. We address this issue by proposing Human-aligned Skill Discovery (HaSD), a framework that incorporates human feedback to discover safer, more aligned skills. HaSD simultaneously optimises skill diversity and alignment with human values. This approach ensures that alignment is maintained throughout the skill discovery process, eliminating the inefficiencies associated with exploring unaligned skills. We demonstrate its effectiveness in both 2D navigation and Safety Gymnasium environments, showing that HaSD discovers diverse, human-aligned skills that are safe and useful for downstream tasks. Finally, we extend HaSD by learning a range of configurable skills with varying degrees of diversity-alignment trade-offs that could be useful in practical scenarios.

Paper Structure

This paper contains 49 sections, 15 equations, 21 figures, 5 tables, and 1 algorithm.

Figures (21)

  • Figure 1: Without an alignment signal, discovering desirable skills in complex environments is like searching for a needle in a haystack, often leading to skills that achieve tasks in undesirable ways, such as carrying a glass of water awkwardly (red robots). Aligning skills during discovery ensures they meet human preferences (blue robots).
  • Figure 2: Illustration of the HaSD reward components. Skill-discovery rewards are computed with the Distance-Maximising Skill Discovery (DSD) objective from data collected through interaction with the environment; this reward encourages skills to be more dynamic and diverse. We then add a human-aligned reward $r_{Ha}$, learned via preference learning from environment interaction data and human preferences; this reward encourages skills to align with human preferences.
  • Figure 3: Skill sets learned by all methods in the 2D navigation environment. LSD covers a larger region of the environment than DIAYN, and HaSD avoids hazardous areas while maintaining good coverage of safe regions. SMERL methods are well aligned, but their coverage is lower.
  • Figure 4: Visual comparison of the skill sets obtained with $\alpha$-HaSD for different values of $\alpha$ in the 2D navigation environment. When $\alpha = 0$, the skill set is similar to that of LSD; the higher $\alpha$, the lower the diversity and coverage.
  • Figure 5: Approximated Pareto front in the 2D navigation environment. LSD solutions sit on the left side, with high coverage but low alignment. HaSD solutions lie further to the right, offering high alignment while maintaining coverage. $\alpha$-HaSD spans a wider portion of the front, covering a range of diversity-alignment trade-offs. SMERL methods attain high alignment but lower coverage than HaSD.
  • ...and 16 more figures
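
Figure 5 compares methods through an approximated Pareto front over coverage and alignment. A common way to extract such a front from a set of evaluated runs is to keep only the non-dominated points, i.e. those that no other run beats on both metrics at once. The Python sketch below shows that generic filter; it is not necessarily the procedure used to produce Figure 5, and since this summary does not define how coverage and alignment are measured, the scores in the usage example are made-up placeholders rather than results from the paper.

```python
from typing import List, Tuple

# (coverage, alignment): both metrics are to be maximised.
Point = Tuple[float, float]


def dominates(q: Point, p: Point) -> bool:
    """True if q is at least as good as p on both metrics and strictly better on one."""
    return q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the distinct non-dominated points, sorted by coverage.

    Quadratic in the number of points, which is fine for a handful of runs.
    """
    front = {p for p in points if not any(dominates(q, p) for q in points)}
    return sorted(front)


# Hypothetical (coverage, alignment) scores for five runs, e.g. different
# alpha values; these numbers are illustrative, not taken from the paper.
runs = [(0.9, 0.2), (0.7, 0.8), (0.6, 0.6), (0.5, 0.9), (0.8, 0.5)]
front = pareto_front(runs)
```

Here the run at (0.6, 0.6) is dropped because (0.7, 0.8) is better on both axes; every remaining run represents a distinct trade-off, which is the kind of spread Figure 5 attributes to $\alpha$-HaSD.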