Task-Aware Morphology Optimization of Planar Manipulators via Reinforcement Learning

Arvind Kumar Mishra, Sohom Chakrabarty

TL;DR

This study uses Yoshikawa's manipulability index as the objective for morphology optimization of planar 2R manipulators within a reinforcement-learning (RL) framework. It first validates that RL (SAC, DDPG, PPO) can recover the analytic optimum for a centered circular task without access to the Jacobian. That optimum is $L_1=L_2=\tfrac{R}{\sqrt{2}}$ with $\theta_2=90^\circ$, which gives $w_{\max}=\tfrac{R^2}{2}$. The framework is then extended to tasks with no analytic solution (ellipse and rectangle) by optimizing the full morphology vector $(L_1,L_2,\theta_2)$, with a hybrid reward that balances dexterity against geometric feasibility. Across all tasks, RL matches or surpasses grid-search and black-box baselines under the same evaluation budgets: SAC converges stably, DDPG improves quickly but with more variability, and PPO is less sample-efficient. The results illustrate a scalable morphology-optimization approach that can adapt to changing workspaces and constraints, and they point toward extensions to higher-DOF manipulators, dynamics, and multi-objective design on real hardware.
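
As a brief worked check of the stated optimum, the standard planar 2R Jacobian gives the manipulability in closed form. The sketch below assumes the elbow angle is held at $\theta_2=90^\circ$ and that the single parameter $\phi$ maps to the link lengths as $L_1=R\cos\phi$, $L_2=R\sin\phi$. This mapping is an assumption chosen to agree with the curve $w_{\text{norm}}=\sin(2\phi)$ quoted in the Figure 4 caption; the paper's exact parametrization is not shown in this excerpt.

$$
\begin{aligned}
J &= \begin{bmatrix} -L_1\sin\theta_1 - L_2\sin(\theta_1+\theta_2) & -L_2\sin(\theta_1+\theta_2) \\ L_1\cos\theta_1 + L_2\cos(\theta_1+\theta_2) & L_2\cos(\theta_1+\theta_2) \end{bmatrix}, \\
w &= \sqrt{\det\!\left(JJ^{\top}\right)} = |\det J| = L_1 L_2\,|\sin\theta_2|, \\
r^2 &= L_1^2 + L_2^2 + 2L_1L_2\cos\theta_2 \;\;\Rightarrow\;\; L_1^2 + L_2^2 = R^2 \quad (\theta_2 = 90^\circ,\ r = R), \\
w(\phi) &= R^2\sin\phi\cos\phi = \tfrac{R^2}{2}\sin(2\phi), \qquad w_{\text{norm}} = \frac{w}{R^2/2} = \sin(2\phi), \\
\phi^\star &= 45^\circ \;\;\Rightarrow\;\; L_1 = L_2 = \tfrac{R}{\sqrt{2}}, \qquad w_{\max} = \tfrac{R^2}{2}.
\end{aligned}
$$

Note that $w$ does not depend on $\theta_1$, so under these assumptions the value is the same at every point of the centered circular path.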

Abstract

In this work, Yoshikawa's manipulability index is used to investigate reinforcement learning (RL) as a framework for morphology optimization in planar robotic manipulators. A 2R manipulator tracking a circular end-effector path is first examined because this case has a known analytical optimum: equal link lengths and the second joint orthogonal to the first. This serves as a validation step to test whether RL can rediscover the optimum using reward feedback alone, without access to the manipulability expression or the Jacobian. Three RL algorithms (SAC, DDPG, and PPO) are compared with grid search and black-box optimizers, with morphology represented by a single action parameter $\phi$ that maps to the link lengths. All methods converge to the analytical solution, showing that numerical recovery of the optimum is possible without supplying analytical structure. Most morphology design tasks have no closed-form solutions, and grid or heuristic search becomes expensive as dimensionality increases. RL is therefore explored as a scalable alternative. The formulation used for the circular path is extended to elliptical and rectangular paths by expanding the action space to the full morphology vector $(L_1, L_2, \theta_2)$. In these non-analytical settings, RL continues to converge reliably, whereas grid and black-box methods require far larger evaluation budgets. These results indicate that RL is effective for both recovering known optima and solving morphology optimization problems without analytical solutions.
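
To make the validation case concrete, the following is a minimal numerical sketch of a one-parameter scan over $\phi$, in the spirit of the grid-search baseline. It is not the authors' implementation. The function names, the mapping $L_1=R\cos\phi$, $L_2=R\sin\phi$, and the choice to fix $\theta_2=90^\circ$ are all assumptions made for illustration; only the Jacobian and the definition $w=\sqrt{\det(JJ^\top)}$ are standard.

```python
import numpy as np


def jacobian_2r(l1, l2, theta1, theta2):
    """Standard Jacobian of a planar 2R arm mapping joint rates to (x, y) velocity."""
    s1, c1 = np.sin(theta1), np.cos(theta1)
    s12, c12 = np.sin(theta1 + theta2), np.cos(theta1 + theta2)
    return np.array([
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ])


def manipulability(l1, l2, theta1, theta2):
    """Yoshikawa's index w = sqrt(det(J J^T)); clipped at 0 to absorb rounding."""
    J = jacobian_2r(l1, l2, theta1, theta2)
    return float(np.sqrt(max(np.linalg.det(J @ J.T), 0.0)))


def phi_to_links(phi, radius):
    """Hypothetical mapping from the action parameter phi to link lengths.

    Assumes theta2 = 90 degrees, so that L1^2 + L2^2 = radius^2.
    """
    return radius * np.cos(phi), radius * np.sin(phi)


def scan_phi(radius=1.0, n=89):
    """Evaluate w on a grid of phi in [1, 89] degrees and return the best point."""
    phis = np.linspace(np.deg2rad(1.0), np.deg2rad(89.0), n)
    ws = [manipulability(*phi_to_links(p, radius), 0.0, np.pi / 2) for p in phis]
    best = int(np.argmax(ws))
    return float(np.rad2deg(phis[best])), ws[best]
```

Calling `scan_phi(radius=2.0)` should return a best angle close to 45 degrees and a manipulability close to $R^2/2 = 2$, in line with the analytical optimum. An RL agent in the validation setting would see only the scalar reward produced by something like `manipulability`, not the Jacobian itself.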

Paper Structure

This paper contains 25 sections, 26 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: Planar 2R manipulator schematic.
  • Figure 2: Maximum-manipulability configuration of the planar 2R manipulator for a centered circular end-effector path ($L_1 = L_2 = R/\sqrt{2}$, $\theta_2 = 90^\circ$).
  • Figure 3: Reachable annulus of the planar 2R manipulator showing link lengths $L_1$, $L_2$, joint angles $\theta_1$, $\theta_2$ (elbow-up), and the inner/outer radii $[r_{\min}, r_{\max}]$. The shaded band corresponds to the geometric annulus that is reachable in principle for the chosen $(L_1,L_2)$.
  • Figure 4: Circle (analytical reward): method endpoints vs. analytical curve $w_{\text{norm}}=\sin(2\phi)$. Dashed: $\phi=45^\circ$.
  • Figure 5: Circle (analytical reward): absolute deviation $|\phi^\star-45^\circ|$ (deg). Top: BEST; Bottom: GREEDY.
  • ...and 6 more figures
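
The reachable annulus described in the Figure 3 caption can be illustrated with a short feasibility check. This is a rough sketch rather than the paper's feasibility term: it assumes the path is expressed in a frame whose origin is the manipulator base, ignores joint limits (matching the caption's "reachable in principle"), and uses the standard bounds $r_{\min}=|L_1-L_2|$ and $r_{\max}=L_1+L_2$. The function names and the ellipse sampler are hypothetical.

```python
import numpy as np


def reachable_annulus(l1, l2):
    """Inner and outer radii of the 2R workspace, ignoring joint limits."""
    return abs(l1 - l2), l1 + l2


def ellipse_path(a, b, n=360):
    """Sample n points on an ellipse centered at the base (an assumption)."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.column_stack((a * np.cos(t), b * np.sin(t)))


def path_is_reachable(l1, l2, points):
    """True if every path point lies inside the annulus [r_min, r_max]."""
    r_min, r_max = reachable_annulus(l1, l2)
    radii = np.linalg.norm(points, axis=1)
    return bool(np.all((radii >= r_min) & (radii <= r_max)))
```

For an ellipse with semi-axes 1.2 and 0.8, links of 0.9 and 0.6 cover the whole path, whereas links of 0.5 and 0.5 cannot reach the points beyond radius 1.0. A check of this kind is one plausible ingredient of a geometric-feasibility term, though the form of the paper's hybrid reward is not given in this excerpt.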