
SkillMimic: Learning Basketball Interaction Skills from Demonstrations

Yinhuai Wang, Qihan Zhao, Runyi Yu, Hok Wai Tsui, Ailing Zeng, Jing Lin, Zhengyi Luo, Jiwen Yu, Xiu Li, Qifeng Chen, Jian Zhang, Lei Zhang, Ping Tan

TL;DR

SkillMimic enables physically simulated humanoids to learn multiple basketball interaction skills from human-object interaction (HOI) demonstrations without skill-specific rewards. It introduces a unified HOI imitation reward and a Contact Graph that captures precise contacts, and uses them to train a single Interaction Skill policy. A High-Level Controller can then switch between and reuse the learned skills to perform long-horizon tasks. The BallPlay-V and BallPlay-M datasets provide HOI data for diverse skills. Experiments show that learning is more efficient and scalable than with baselines, and that generalization improves as the dataset grows. The approach points toward scalable, generalizable HOI skill learning, with potential for real-world humanoid basketball applications.

Abstract

Traditional reinforcement learning methods for human-object interaction (HOI) rely on labor-intensive, manually designed skill rewards that do not generalize well across different interactions. We introduce SkillMimic, a unified data-driven framework that fundamentally changes how agents learn interaction skills by eliminating the need for skill-specific rewards. Our key insight is that a unified HOI imitation reward can effectively capture the essence of diverse interaction patterns from HOI datasets. This enables SkillMimic to learn a single policy that not only masters multiple interaction skills but also facilitates skill transitions, with both diversity and generalization improving as the HOI dataset grows. For evaluation, we collect and introduce two basketball datasets containing approximately 35 minutes of diverse basketball skills. Extensive experiments show that SkillMimic successfully masters a wide range of basketball skills including stylistic variations in dribbling, layup, and shooting. Moreover, these learned skills can be effectively composed by a high-level controller to accomplish complex and long-horizon tasks such as consecutive scoring, opening new possibilities for scalable and generalizable interaction skill learning. Project page: https://ingrid789.github.io/SkillMimic/

Paper Structure

This paper contains 46 sections, 19 equations, 13 figures, 9 tables.

Figures (13)

  • Figure 1: Concept of SkillMimic. We define an interaction skill as a set of Human-Object Interaction (HOI) state transitions that align with the intended skill semantics. These state transitions can be derived from captured HOI motion clips. If a simulated humanoid can manipulate objects such that the resulting HOI state transitions closely match those of the reference, we consider the humanoid to have successfully learned the interaction skill.
  • Figure 2: Our system consists of three parts. (a) First, we capture real-world basketball skills to create a large Human-Object Interaction (HOI) motion dataset. (b) Second, we train an Interaction Skill (IS) policy to learn interaction skills by imitating the corresponding HOI data through reinforcement learning. Specifically, the IS policy takes as input the HOI state $\boldsymbol{s}_{t}$ and skill label $\boldsymbol{c}_{j}$ and predicts the action $\boldsymbol{a}_{t}$. The new state $\boldsymbol{s}_{t+1}$ is calculated by the simulator. A unified HOI imitation reward is designed to imitate diverse HOI state transitions. (c) The third part involves training a High-Level Controller (HLC) to reuse the learned interaction skills for complex tasks. The HLC takes as input $\boldsymbol{s}_{t}$ and extra task observations $\boldsymbol{h}_{t}$, e.g., the basket position, and predicts the skill label $\boldsymbol{c}_{t}$ to drive a pre-trained IS policy.
  • Figure 3: We propose the Contact Graph (CG) to model general contacts within an explicitly defined scene. Each node stores a binary value denoting whether it is in contact with any other node. Each edge stores a binary value indicating whether the two nodes it connects are in contact. The node definition is fixed for a given scene and shared across diverse interaction skills. For example, we define three nodes (hands, hands-exclusive body, and ball) to form a simple CG that models contacts for diverse basketball skills.
  • Figure 4: Without the Contact Graph Reward (CGR), HOI imitation falls into kinematic local optima: (b) using the head to help control the ball; (e) contacting the ball with the wrist; (h) failing to catch the object; (k) leaning on the table to keep balance. In comparison, guidance from the CGR yields precise interactions, as shown in (c, f, i, l).
  • Figure 5: Simulated humanoids exhibit comprehensive basketball skills. SkillMimic can teach humanoids a wide range of basketball skills using the same configuration in a purely data-driven manner, covering almost all fundamental basketball skills. Keyframes are placed in chronological order from left to right.
  • ...and 8 more figures
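
The Contact Graph described in the Figure 3 caption can be illustrated with a short Python sketch. It uses the three nodes from the caption's example and derives each node's binary value from the edge values. The identifiers (`NODES`, `build_contact_graph`, `edge_mismatches`), the input format, and the mismatch count are assumptions made for illustration; in particular, the mismatch count is not the paper's Contact Graph Reward, whose formula is not given here.

```python
from itertools import combinations

# Node names follow the Figure 3 example; the short identifiers are our own.
# "body" stands for the hands-exclusive body.
NODES = ("hands", "body", "ball")


def build_contact_graph(contacts):
    """Build a contact graph from contacting node pairs (hypothetical input format).

    Returns (edges, nodes):
      edges maps each unordered node pair to 1 if the pair is in contact, else 0;
      nodes maps each node to 1 if it touches any other node, else 0.
    """
    touching = {frozenset(pair) for pair in contacts}
    edges = {
        (a, b): int(frozenset((a, b)) in touching)
        for a, b in combinations(NODES, 2)
    }
    nodes = {
        n: int(any(value for pair, value in edges.items() if n in pair))
        for n in NODES
    }
    return edges, nodes


def edge_mismatches(sim_edges, ref_edges):
    """Count edges whose contact state differs (illustrative measure only)."""
    return sum(sim_edges[key] != ref_edges[key] for key in ref_edges)


# Reference frame: the hands hold the ball.
ref_edges, ref_nodes = build_contact_graph([("ball", "hands")])
# Simulated frame: the ball touches the body instead (cf. Figure 4b).
sim_edges, sim_nodes = build_contact_graph([("body", "ball")])

print(ref_nodes)
print(edge_mismatches(sim_edges, ref_edges))
```

In this toy case two edges disagree (hands-ball and body-ball), which is the kind of discrepancy a contact-based reward term could penalize even when the body poses look kinematically similar.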