Enforcing Task-Specified Compliance Bounds for Humanoids via Anisotropic Lipschitz-Constrained Policies

Zewen He; Yoshihiko Nakamura

Abstract

Reinforcement learning (RL) has demonstrated substantial potential for humanoid bipedal locomotion and the control of complex motions. To cope with oscillations and impacts induced by environmental interactions, compliant control is widely regarded as an effective remedy. However, the model-free nature of RL makes it difficult to impose task-specified and quantitatively verifiable compliance objectives, and classical model-based stiffness designs are not directly applicable. Lipschitz-Constrained Policies (LCP), which regularize the local sensitivity of a policy via gradient penalties, have recently been used to smooth humanoid motions. Nevertheless, existing LCP-based methods typically employ a single scalar Lipschitz budget and lack an explicit connection to physically meaningful compliance specifications in real-world systems. In this study, we propose an anisotropic Lipschitz-constrained policy (ALCP) that maps a task-space stiffness upper bound to a state-dependent Lipschitz-style constraint on the policy Jacobian. The resulting constraint is enforced during RL training via a hinge-squared spectral-norm penalty, preserving physical interpretability while enabling direction-dependent compliance. Experiments on humanoid robots show that ALCP improves locomotion stability and impact robustness, while reducing oscillations and energy usage.
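As a rough illustration of the hinge-squared spectral-norm penalty mentioned in the abstract, the sketch below penalizes only the amount by which the largest singular value of a scaled policy Jacobian exceeds 1. The scaling by a positive-definite `bound` matrix, the function name, and the toy Jacobian are assumptions made for illustration and are not taken from the paper; in training, the Jacobian would come from automatic differentiation rather than being written by hand.

```python
import numpy as np

def hinge_squared_spectral_penalty(jacobian, bound):
    """Return max(0, sigma_max(bound^{-1} @ jacobian) - 1)^2.

    jacobian: (m, n) policy Jacobian d(action)/d(observation).
    bound:    (m, m) symmetric positive-definite matrix acting as a
              hypothetical anisotropic budget; bound = L * I gives a
              single scalar budget L.
    """
    scaled = np.linalg.solve(bound, jacobian)   # bound^{-1} @ jacobian
    sigma_max = np.linalg.norm(scaled, ord=2)   # largest singular value
    violation = max(0.0, sigma_max - 1.0)       # hinge: zero inside the bound
    return violation ** 2

# Toy Jacobian (assumed values): sensitive in the first output, not the second.
J = np.array([[2.0, 0.0],
              [0.0, 0.5]])

# Isotropic budget of 1 in every direction: the first direction violates it.
penalty_iso = hinge_squared_spectral_penalty(J, np.eye(2))

# Anisotropic budget that allows more sensitivity in the first direction.
penalty_aniso = hinge_squared_spectral_penalty(J, np.diag([4.0, 1.0]))
```

With the isotropic budget the penalty is positive, while the anisotropic budget admits the same Jacobian without penalty, which is the kind of direction-dependent behaviour a single scalar budget cannot express.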

Paper Structure (29 sections, 1 theorem, 29 equations, 6 figures, 4 tables)

Introduction
Background
Learning-based Humanoid Control
Compliant Control
Lipschitz Continuity and Lipschitz-Constrained Policies
Lipschitz continuity
Lipschitz-Constrained policies (LCP)
Problem Statement
Methodology
Anisotropic Lipschitz-Constrained Policies
Anisotropic LCP upper-bound
Spectral (max-eigenvalue) violation
Joint-space Stiffness Upper-bound Ellipsoid
Anisotropic Lipschitz-Constrained Policies Based on Target Stiffness
Anisotropic upper-bound via an ellipsoidal metric induced by $\bm{K}_q^{\max}$
...and 14 more sections

Key Result

Proposition 1

Assume that the observation $\bm o$ contains the joint positions $\bm q$ and that $\pi_{\theta}$ is differentiable with respect to $\bm q$ in a neighborhood of interest. Then, for small perturbations around a nominal state, the incremental torque response to joint-position perturbations satisfies where the policy-induced equivalent joint stiffness is given by where $\alpha$ is the action scale o
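The equations of the proposition are not reproduced in this excerpt, so the following is only a sketch of the general idea under a stated assumption: the policy output is scaled by $\alpha$, added to a nominal posture, and tracked by a joint-space PD law with gain matrix `kp`. Under that assumption, linearizing the torque with respect to $\bm q$ gives an equivalent stiffness of the form $\bm K_p(\bm I - \alpha\,\partial\pi_\theta/\partial\bm q)$, which the code checks numerically with finite differences. The PD structure, the linear toy policy, and all numerical values are hypothetical and may differ from the paper's formulation.

```python
import numpy as np

def pd_torque(q, policy, kp, alpha, q_nominal):
    """Torque of a PD law tracking q_des = q_nominal + alpha * policy(q).

    The velocity damping term is omitted because it does not depend on q.
    """
    q_des = q_nominal + alpha * policy(q)
    return kp @ (q_des - q)

def equivalent_stiffness_fd(q, policy, kp, alpha, q_nominal, eps=1e-6):
    """Central finite-difference estimate of K_eq = -d(tau)/d(q) at q."""
    n = q.size
    k_eq = np.zeros((n, n))
    for j in range(n):
        dq = np.zeros(n)
        dq[j] = eps
        tau_plus = pd_torque(q + dq, policy, kp, alpha, q_nominal)
        tau_minus = pd_torque(q - dq, policy, kp, alpha, q_nominal)
        k_eq[:, j] = -(tau_plus - tau_minus) / (2.0 * eps)
    return k_eq

# Toy two-joint example with a linear policy a = A @ q (assumed values).
A = np.array([[0.2, 0.0],
              [0.1, -0.4]])
kp = np.diag([100.0, 50.0])
alpha = 0.5
q0 = np.array([0.1, -0.2])

k_fd = equivalent_stiffness_fd(q0, lambda q: A @ q, kp, alpha, np.zeros(2))

# Closed form under the same assumptions: K_eq = Kp (I - alpha * d(pi)/d(q)).
k_closed = kp @ (np.eye(2) - alpha * A)
```

The two matrices agree up to rounding, which shows how the policy Jacobian with respect to $\bm q$ directly shifts the stiffness seen at the joints in this simplified setting.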

Figures (6)

Figure 1: Experimental video screenshots. (a) Hand-pulling experiment under the SILC method. (b) Hand impact experiment. A 2.5 kg weight is attached to the robot's right hand via a rope and released from a certain height to generate an impact. (c) Stepping motion experiment. One of the robot's feet intermittently steps onto a foam pad placed on the ground, introducing disturbances from uneven terrain.
Figure 2: Control framework in reinforcement learning pipeline.
Figure 3: Stiffness upper-bound derivation diagram. $\bm K_x^{\max}$ denotes the task-space stiffness upper-bound ellipsoid, illustrated as the pink ellipsoid; $\bm K_q^{\max}$ denotes the joint-space stiffness upper-bound super-ellipsoid, where individual joint stiffnesses are depicted as torsional springs; $\bm K_{LCP}$ represents the Lipschitz upper-bound constraint.
Figure 4: CoM trajectory along the $x$ direction after an impact in the $-x$ direction (150 N in 0.05 s) on the pelvis link. The red triangle marks the time instant at which the impact is applied. In the high-stiffness mode, the CoM returns to its original state via a damped oscillatory response. The compliance mode is visibly more compliant, as evidenced by larger oscillation amplitudes and the lack of convergence back to the original state. Scalar LCP exhibits stiffness similar to that of the high-stiffness mode.
Figure 5: Hand displacement under different task-space compliance settings of SILC when a constant load (-40 N) is applied to the left hand. In both cases, the hand displacement responds within a very short time ($<$ 0.1 s). After the response settles, comparing the steady-state position with the initial position, together with the applied external force, gives an approximate equivalent task-space stiffness of 400 N/m (soft) and 1000 N/m (hard), respectively.
...and 1 more figure
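The Figure 3 caption describes passing from a task-space stiffness bound $\bm K_x^{\max}$ to a joint-space bound $\bm K_q^{\max}$. One standard way to relate the two is the congruence transform $\bm K_q = \bm J^\top \bm K_x \bm J$, which neglects the configuration-dependent term that arises from the changing Jacobian under load. Whether the paper uses exactly this mapping is not stated in this excerpt, so the sketch below is an assumption. The planar two-link arm, its link lengths, and the configuration are hypothetical; the stiffness values 400 and 1000 N/m are borrowed from the Figure 5 caption purely as example magnitudes.

```python
import numpy as np

def planar_two_link_jacobian(q, l1=0.3, l2=0.25):
    """Position Jacobian of a planar two-link arm (assumed link lengths, m)."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ])

def joint_stiffness_bound(jac, k_x_max):
    """Congruence transform K_q = J^T K_x J (N/m to N*m/rad)."""
    return jac.T @ k_x_max @ jac

# Example configuration (rad) and an anisotropic task-space bound (N/m).
q = np.array([0.4, 0.9])
jac = planar_two_link_jacobian(q)
k_x_max = np.diag([400.0, 1000.0])
k_q_max = joint_stiffness_bound(jac, k_x_max)

# Elastic energy is preserved across the two spaces for small displacements.
dq = np.array([0.01, -0.02])
dx = jac @ dq
energy_joint = 0.5 * dq @ k_q_max @ dq
energy_task = 0.5 * dx @ k_x_max @ dx
```

The resulting matrix is symmetric and, away from singular configurations, positive definite, so it defines an ellipsoid in joint space that can serve as a direction-dependent bound.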

Theorems & Definitions (2)

Proposition 1: Policy-induced equivalent joint stiffness
Definition 1: Anisotropic Lipschitz constraint
