Deep Reinforcement Learning for Robotic Manipulation under Distribution Shift with Bounded Extremum Seeking

Shaifalee Saxena, Rafael Fierro, Alexander Scheinker

Abstract

Reinforcement learning (RL) has shown strong performance in robotic manipulation, but learned policies often degrade when test conditions differ from the training distribution. This limitation is especially important in contact-rich tasks such as pushing and pick-and-place, where changes in goals, contact conditions, or robot dynamics can drive the system out-of-distribution at inference time. In this paper, we investigate a hybrid controller that combines RL with bounded extremum seeking (ES) to improve robustness under such conditions. In the proposed approach, deep deterministic policy gradient (DDPG) policies are trained under standard conditions on robotic pushing and pick-and-place tasks and are then combined with bounded ES during deployment. The RL policy provides fast manipulation behavior, while bounded ES keeps the overall controller robust to time variations when operating conditions depart from those seen during training. The resulting controller is evaluated under several out-of-distribution settings, including time-varying goals and spatially varying friction patches.
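
A minimal sketch of a discretized bounded ES update, following the standard bounded extremum-seeking form $\dot{x}_i = \sqrt{\alpha\omega_i}\,\cos(\omega_i t + k\,C(x,t))$, whose key property is that each component's rate is hard-bounded by $\sqrt{\alpha\omega_i}$ regardless of the cost magnitude. The function name, its parameters, and the toy cost below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def bounded_es_step(x, t, cost, omega, dt, alpha=0.5, k=2.0):
    """One Euler step of bounded extremum seeking.

    Implements dx_i/dt = sqrt(alpha * omega_i) * cos(omega_i * t + k * C),
    so |dx_i/dt| <= sqrt(alpha * omega_i) no matter how large the
    measured scalar cost C becomes (hence "bounded" ES).
    """
    omega = np.asarray(omega, dtype=float)
    return x + dt * np.sqrt(alpha * omega) * np.cos(omega * t + k * cost)

# Toy usage: minimize C(x) = ||x - g||^2 for an unknown goal g,
# using only scalar cost measurements (no gradient access).
g = np.array([1.0, -0.5])
x = np.zeros(2)
omega = np.array([60.0, 75.0])   # distinct dither frequencies per dimension
dt = 1e-3
for step in range(20_000):
    t = step * dt
    cost = float(np.sum((x - g) ** 2))
    x = bounded_es_step(x, t, cost, omega, dt)
```

On average these dynamics descend the cost with effective gain $k\alpha/2$, which is why the same scalar-feedback loop can adapt actions online without a model of the shifted environment.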

Paper Structure

This paper contains 19 sections, 1 theorem, 31 equations, 4 figures, and 1 table.

Key Result

Proposition 1

Consider the fixed-goal planar pushing phase after the gripper establishes contact with the block at time $t_c$. Let $p_t^{\mathrm{obj}} \in \mathbb{R}^2$ denote the object position, let $g \in \mathbb{R}^2$ be a fixed goal, and assume that contact is maintained for all $t>t_c$. Then, for any desire
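
Only the setup of Proposition 1 survives in this summary; the conclusion is truncated above. For reference, the scalar feedback that the setup's notation points to is the object-to-goal distance, written below as an assumption consistent with the stated definitions rather than as the proposition's actual claim.

```latex
% Assumed distance-to-goal cost over the contact phase t > t_c;
% the proposition's conclusion itself is truncated in this summary.
C_t \;=\; \bigl\lVert p_t^{\mathrm{obj}} - g \bigr\rVert^2, \qquad t > t_c.
```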

Figures (4)

  • Figure 3: Training success rates for the Fetch push and pick-and-place tasks under DDPG. The dotted gray curves denote the raw success rates, while the solid black curves denote the smoothed training trends. Each training epoch consists of 100 episodes.
  • Figure 4: Architecture of the ES-DRL controller. A supervisor selects a binary gate $\beta_t \in \{0,1\}$ based on the contact flag and combines the actions as $a_t = \beta_t a^{\mathrm{RL}}_t + (1-\beta_t) a^{\mathrm{ES}}_t$. ES is initialized from DRL (dotted). A gating sketch follows this list.
  • Figure 5: Robotic manipulation over spatially varying frictional surfaces. Left: Environment with three friction patches, $\mu = 0.8, 1.2, 1.5$. Middle: For a fixed goal, ES lacks a good initial pushing direction, RL fails in the high-friction region, and ES-DRL drives the block toward the goal. Right: For a time-varying goal, RL fails early, whereas ES-DRL tracks the goal.
  • Figure 6: 3D tracking of a time-varying goal. The RL-only controller (top) fails to track the goal, whereas ES-DRL (bottom) maintains substantially closer tracking.
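
Figure 4's gating rule is compact enough to state as code. A sketch under stated assumptions: the gate polarity (RL active while the contact flag is set) and every identifier below, including `policy` and the `bounded_es_step` helper from the sketch above, are hypothetical rather than taken from the paper.

```python
def es_drl_action(obs, t, cost, in_contact, policy, a_es, omega, dt):
    """Combine actions as a_t = beta_t * a_RL + (1 - beta_t) * a_ES (Figure 4).

    Assumptions (not from the paper): beta_t = 1 while the contact
    flag is set, and the ES action state a_es is initialized from the
    DRL action (the dotted DRL -> ES path in Figure 4).
    """
    a_rl = policy(obs)                                 # fast learned behavior
    a_es = bounded_es_step(a_es, t, cost, omega, dt)   # bounded online adaptation
    beta = 1.0 if in_contact else 0.0                  # binary supervisor gate
    return beta * a_rl + (1.0 - beta) * a_es, a_es
```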

Theorems & Definitions (1)

  • Proposition 1