Tune to Learn: How Controller Gains Shape Robot Policy Learning

Antonia Bronars, Younghyo Park, Pulkit Agrawal

Abstract

Position controllers have become the dominant interface for executing learned manipulation policies. Yet a critical design decision remains understudied: how should we choose controller gains for policy learning? The conventional wisdom is to select gains based on desired task compliance or stiffness. However, this logic breaks down when controllers are paired with state-conditioned policies: effective stiffness emerges from the interplay between learned reactions and control dynamics, not from gains alone. We argue that gain selection should instead be guided by learnability: how amenable different gain settings are to the learning algorithm in use. In this work, we systematically investigate how position controller gains affect three core components of modern robot learning pipelines: behavior cloning, reinforcement learning from scratch, and sim-to-real transfer. Through extensive experiments across multiple tasks and robot embodiments, we find that: (1) behavior cloning benefits from compliant and overdamped gain regimes, (2) reinforcement learning can succeed across all gain regimes given compatible hyperparameter tuning, and (3) sim-to-real transfer is harmed by stiff and overdamped gain regimes. These findings reveal that optimal gain selection depends not on the desired task behavior, but on the learning paradigm employed. Project website: https://younghyopark.me/tune-to-learn
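To make the interface concrete, here is a minimal sketch of the PD position-control loop this setting assumes, and of how a state-conditioned policy can decouple effective task stiffness from the low-level gains. All gains and the reaction coefficient below are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Minimal sketch of the standard PD position-control interface: the policy
# emits a position command `a`, and the low-level controller turns it into
# torque. All numbers below are illustrative, not the paper's actual gains.

def pd_torque(q, qd, a, Kp, Kd):
    """Torque from a PD position controller tracking command `a`."""
    return Kp * (a - q) - Kd * qd

# A state-conditioned policy reacts to the measured position. If its command
# moves with the state, a = a0 + J * q, then
#   tau = Kp*(a0 + J*q - q) - Kd*qd = Kp*a0 - Kp*(1 - J)*q - Kd*qd,
# so the *effective* stiffness is Kp*(1 - J), not Kp alone: a stiff low-level
# gain can still yield compliant closed-loop behavior (J -> 1), and a
# compliant gain can be stiffened (J < 0).
Kp, Kd = 400.0, 40.0   # stiff low-level gains (illustrative)
J = 0.9                # hypothetical learned reaction to position
K_eff = Kp * (1.0 - J)
print(f"low-level stiffness Kp = {Kp}, effective stiffness = {K_eff}")
```

Once the commanded position itself reacts to the measured state, closed-loop stiffness is no longer set by $\mathbf{K}_p$ alone, which is why the paper argues for choosing gains by learnability rather than by desired compliance.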

Paper Structure

This paper contains 42 sections, 3 theorems, 32 equations, 30 figures, and 8 tables.

Key Result

Theorem 1

Consider the system eq:dynamics with $\mathbf{K}_p, \mathbf{K}_d > 0$. Suppose the policy $\pi$ produces i.i.d. action errors $\delta a(t)$ with variance $\sigma^2$ around the expert action. Then the steady-state position error variance scales as $$\sigma_e^2 \;\propto\; \sigma^2\,\frac{\mathbf{K}_p}{\mathbf{K}_d}.$$ In particular, the variance is independent of the mass $m$, and is minimized when $\mathbf{K}_p / \mathbf{K}_d$ is small. $\blacktriangleleft$
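A sketch of where this scaling comes from, under the reading that eq:dynamics is the single-joint PD loop $m\ddot{q} + \mathbf{K}_d\dot{q} + \mathbf{K}_p q = \mathbf{K}_p a(t)$ (the equation itself is not reproduced on this page) and that the i.i.d. errors act as white noise with constant spectral density $S_0 \propto \sigma^2$. Writing $e = q - q^*$ for the deviation from the expert rollout, linearity gives

$$m\ddot{e} + \mathbf{K}_d\dot{e} + \mathbf{K}_p e = \mathbf{K}_p\,\delta a(t),$$

a second-order system forced by white noise of spectral density $\mathbf{K}_p^2 S_0$. The classical mean-square response of a second-order system (Theorem 2) then yields

$$\sigma_e^2 \;=\; \frac{\pi\,\mathbf{K}_p^2 S_0}{\mathbf{K}_p\,\mathbf{K}_d} \;=\; \frac{\pi S_0\,\mathbf{K}_p}{\mathbf{K}_d} \;\propto\; \sigma^2\,\frac{\mathbf{K}_p}{\mathbf{K}_d},$$

in which $m$ cancels and only the ratio $\mathbf{K}_p/\mathbf{K}_d$ sets the noise floor.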

Figures (30)

  • Figure 1: Different robot learning paradigms prefer different controller gain interfaces. Colored regions indicate gain regimes where each paradigm succeeds. Contrary to the conventional wisdom of tuning gains for desired task compliance, optimal gains depend on the learning paradigm. Based on our experimental findings, heatmaps illustrate representative gain preferences for (a) behavior cloning, which favors compliant, overdamped gains, (b) reinforcement learning, which adapts to nearly any setting, and (c) sim-to-real transfer, which is degraded by stiff and overdamped gains.
  • Figure 2: Controller gains induce diverse action–response dynamics. We evaluate a broad range of representative gain configurations and their resulting dynamic responses to assess their impact on learnability.
  • Figure 3: Tracking response curves from existing robot datasets reveal tight command-following behavior, suggesting stiff controller gains are prevalent in existing data collection pipelines.
  • Figure 4: Task-level impedance can be decoupled from low-level controller gains with learned policies. A learned policy can achieve (a) compliant behavior despite stiff low-level gains, and (b) stiff behavior despite compliant gains.
  • Figure 5: Behavior cloning prefers compliant and overdamped controller gains. Closed-loop rollout success rates across a grid of proportional ($\mathbf{K}_p$) and derivative ($\mathbf{K}_d$) gains for diverse manipulation tasks and robot embodiments. Each heatmap reports success averaged over evaluation rollouts. Across tasks, higher success rates (darker red) consistently concentrate in the compliant, overdamped regime (upper-left), while stiff or weakly damped controllers yield degraded performance; the sketch after this list shows how $\mathbf{K}_p$ and $\mathbf{K}_d$ map onto these regimes.
  • ...and 25 more figures
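The regime labels used above (compliant vs. stiff, overdamped vs. underdamped) follow from standard second-order analysis of a PD-controlled joint. Below is a minimal sketch, with an illustrative stiffness threshold and a hypothetical helper name (neither is from the paper), that maps a $(\mathbf{K}_p, \mathbf{K}_d)$ pair to its regime via the damping ratio $\zeta = \mathbf{K}_d / (2\sqrt{m\,\mathbf{K}_p})$.

```python
import numpy as np

# Hypothetical helper (not from the paper) relating a (Kp, Kd) pair to the
# second-order regimes named in the figures. For a joint of mass m under PD
# control, m*qdd + Kd*qd + Kp*q = Kp*a, the damping ratio is
# zeta = Kd / (2*sqrt(m*Kp)): zeta > 1 is overdamped, zeta < 1 underdamped.

def gain_regime(Kp, Kd, m=1.0):
    zeta = Kd / (2.0 * np.sqrt(m * Kp))
    stiffness = "stiff" if Kp > 100.0 else "compliant"  # illustrative threshold
    if np.isclose(zeta, 1.0):
        damping = "critically damped"
    else:
        damping = "overdamped" if zeta > 1.0 else "underdamped"
    return stiffness, damping, zeta

# The four corners of a gain grid like the ones shown in the heatmaps:
for Kp, Kd in [(25.0, 20.0), (25.0, 2.0), (400.0, 80.0), (400.0, 10.0)]:
    s, d, z = gain_regime(Kp, Kd)
    print(f"Kp={Kp:6.1f} Kd={Kd:5.1f} -> {s}, {d} (zeta={z:.2f})")
```

Note that $\zeta$ depends on the mass, so the same gain pair can land in different damping regimes on different joints or embodiments.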

Theorems & Definitions (5)

  • Theorem 1: Gain-Dependent Error Variance
  • Proof of Theorem 1
  • Theorem 2: Mean-Square Response of a Second-Order System [crandall2014random]
  • Corollary 1: State-Space Impact of Policy Errors
  • Remark 1: The Attenuation–Difficulty Tradeoff