Phase-Aware Policy Learning for Skateboard Riding of Quadruped Robots via Feature-wise Linear Modulation

Minsung Yoon; Jeil Jeong; Sung-Eui Yoon

TL;DR

The paper addresses skateboard riding with quadruped robots, a task that is cyclic, has different control objectives in each riding phase, and depends on perception of the board. It introduces Phase-Aware Policy Learning (PAPL), a phase-conditioned reinforcement learning framework that uses Feature-wise Linear Modulation (FiLM) to encode phase-specific behaviors within a single shared policy, and it uses asymmetric privileged learning with distillation to bridge the sim-to-real gap. The method combines a phase clock, a phase-conditioned reward design, and exteroceptive sensing to achieve robust, steerable, and energy-efficient riding. The policy is demonstrated in simulation and transferred to a real robot without fine-tuning.

Abstract

Skateboards are a compact and efficient personal mobility device. However, controlling them with legged robots poses several challenges for policy learning because of perception-driven interactions and multi-modal control objectives across distinct skateboarding phases. To address these challenges, we introduce Phase-Aware Policy Learning (PAPL), a reinforcement-learning framework tailored for skateboarding with quadruped robots. PAPL leverages the cyclic nature of skateboarding by integrating phase-conditioned Feature-wise Linear Modulation (FiLM) layers into the actor and critic networks, enabling a unified policy that captures phase-dependent behaviors while sharing robot-specific knowledge across phases. In simulation, we validate command-tracking accuracy and conduct ablation studies that quantify each component's contribution. We also compare locomotion efficiency against legged and wheel-legged baselines and show real-world transferability.
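To make the phase-conditioned FiLM idea concrete, the following is a minimal NumPy sketch of a single modulated layer: a shared linear map produces features, and a small generator turns a phase code into a per-feature scale (gamma) and shift (beta). The class name `FiLMLayer`, the helper `one_hot_phase`, the one-hot phase encoding, and all dimensions are assumptions made for illustration; the paper's actual network sizes and phase representation are not given in this summary.

```python
import numpy as np

# Phase names follow the paper's phase clock; the one-hot encoding is an assumption.
PHASES = ("pushing", "transition", "carving")


def one_hot_phase(name):
    """Encode a phase name as a one-hot vector (hypothetical helper)."""
    vec = np.zeros(len(PHASES))
    vec[PHASES.index(name)] = 1.0
    return vec


class FiLMLayer:
    """Shared linear layer whose output features are scaled and shifted per phase."""

    def __init__(self, in_dim, out_dim, phase_dim, rng):
        # Shared weights: reused in every phase.
        self.W = rng.normal(0.0, 0.1, (in_dim, out_dim))
        self.b = np.zeros(out_dim)
        # FiLM generator: maps the phase code to [gamma, beta].
        self.W_film = rng.normal(0.0, 0.1, (phase_dim, 2 * out_dim))
        # Bias initialised so that gamma starts near 1 and beta near 0.
        self.b_film = np.concatenate([np.ones(out_dim), np.zeros(out_dim)])

    def __call__(self, x, phase):
        h = x @ self.W + self.b                          # shared features
        film = phase @ self.W_film + self.b_film
        gamma, beta = np.split(film, 2, axis=-1)         # per-feature scale and shift
        return gamma * h + beta                          # modulated pre-activation


rng = np.random.default_rng(0)
layer = FiLMLayer(in_dim=4, out_dim=8, phase_dim=len(PHASES), rng=rng)
obs = np.ones(4)  # placeholder observation
out_push = layer(obs, one_hot_phase("pushing"))
out_carve = layer(obs, one_hot_phase("carving"))
```

The same observation yields different features in the pushing and carving phases, while `W` and `b` stay shared. A nonlinearity would normally follow the modulated output when layers are stacked.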

Paper Structure (15 sections, 11 equations, 10 figures, 3 tables)

Introduction
Variable Notation
Skateboard Dynamics Modeling
Steering Dynamics
Propulsion Dynamics
Skateboard-Riding Policy Learning
Formulation of Skateboarding Policy Learning
Phase-Aware Policy Composition
Implementation Details
Experimental Results
Multi-Modality of Skateboarding Motions
Evaluation of Skateboarding Performance
Power Consumption Analysis
Real-World Experiments
Conclusion

Figures (10)

Figure 1: Belly-mounted RGB camera setup on the Unitree Go1 robot [Unitree_Go1]. The camera observes the skateboard deck surface and supplies visual feedback for localization and control, enabling resilient skateboarding maneuvers.
Figure 2: The center panel shows the reference frames: robot body $\mathcal{B}$, skateboard $\mathcal{S}$, deck $\mathcal{D}$, and world $\mathcal{W}$, along with the physical robot body $B$ and deck $D$ objects. The skateboard frame $\mathcal{S}$ is fixed to the board, and the deck $\mathcal{D}$ rotates about its roll axis relative to $\mathcal{S}$; the joint variables are the deck roll ($\psi_D$) and the yaw angles of the front and rear wheel axles ($\delta_F, \delta_R$). The left and right panels show the exteroceptive inputs for the training and deployment stages.
Figure 3: Illustration of the phase clock concept that manages the cyclic nature of skateboarding, along with representative motion snapshots of each phase.
Figure 4: Phase-Aware Policy Learning (PAPL) framework for skateboard riding. (1) Simulation environments modeling skateboard–robot interaction. (2) Command scheduling that procedurally increases riding difficulty for broad command-space coverage [margolis2024rapid]. (3) Phase-clock representation that alternates over time between pushing, transition, and carving modes. (4) An asymmetric actor–critic architecture: the critic leverages full privileged information for effective policy guidance with clear situational awareness, while the actor relies solely on features that can be inferred or directly observed. (5) Proximal Policy Optimization (PPO) [schulman2017proximal] trains policy networks parameterized by $\theta$ (red networks) to maximize Eq. (\ref{eq:objective_function}). The converged policy is then distilled via Dataset Aggregation (DAgger) [ross2011reduction] for the estimators parameterized by $\phi$ (mint networks) using Eq. (\ref{eq:dagger}), replacing information that is inaccessible during deployment.
Figure 5: Multilayer perceptron (MLP) network variants. (a) Standard MLP, (b) FiLM-modulated MLP with phase-conditioned feature-wise modulation, and (c) Mixture-of-Experts MLP with phase-based expert weight blending.
...and 5 more figures
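
The phase clock described in the captions for Figures 3 and 4 can be sketched as a periodic function of time that reports the active mode together with a smooth cyclic encoding. The schedule below (phase order, cycle fractions, and a 4-second period) and the sine/cosine encoding are hypothetical placeholders chosen for illustration, not values or design choices confirmed by the paper.

```python
import math

# Hypothetical schedule: (phase name, fraction of one cycle). Not from the paper.
SCHEDULE = (
    ("pushing", 0.4),
    ("transition", 0.1),
    ("carving", 0.4),
    ("transition", 0.1),
)


def phase_clock(t, period=4.0):
    """Return (phase name, sin, cos) for time t in seconds.

    The sin/cos pair gives a continuous encoding of the position in the cycle,
    so the encoding has no jump when the cycle wraps around.
    """
    frac = (t % period) / period          # position in the cycle, in [0, 1)
    angle = 2.0 * math.pi * frac
    encoding = (math.sin(angle), math.cos(angle))
    edge = 0.0
    for name, width in SCHEDULE:
        edge += width
        if frac < edge:
            return (name, *encoding)
    # Guard against floating-point rounding in the accumulated edges.
    return (SCHEDULE[-1][0], *encoding)


# Sample the clock over one cycle.
timeline = [phase_clock(t)[0] for t in (0.0, 1.8, 2.4, 3.8)]
```

In a phase-conditioned policy, a signal like this would be the conditioning input to the modulation layers and to the phase-dependent reward terms.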