Task-Aware Morphology Optimization of Planar Manipulators via Reinforcement Learning
Arvind Kumar Mishra, Sohom Chakrabarty
TL;DR
This study applies a reinforcement-learning (RL) framework to morphology optimization of planar 2R manipulators, using Yoshikawa's manipulability index as the dexterity measure. It first validates that RL (SAC, DDPG, PPO) can recover the analytic optimum for a centered circular task without access to the Jacobian; that optimum is $L_1=L_2=\tfrac{R}{\sqrt{2}}$ with $\theta_2=90^\circ$, which gives $w_{\max}=\tfrac{R^2}{2}$. The framework is then extended to non-analytic tasks (ellipse and rectangle) by using the full morphology vector $(L_1,L_2,\theta_2)$ as the action, with a hybrid reward that balances dexterity against geometric feasibility. Across all tasks, RL matches or surpasses grid-search and black-box baselines under the same evaluation budgets: SAC converges stably, DDPG improves quickly but with more variance, and PPO is less sample-efficient. The results illustrate a scalable morphology-optimization approach that can adapt to changing workspaces and constraints, and they point toward extensions to higher-DOF arms, dynamics, and multi-objective design on real hardware.
Abstract
This work investigates reinforcement learning (RL) as a framework for morphology optimization in planar robotic manipulators, with Yoshikawa's manipulability index as the objective. A 2R manipulator tracking a circular end-effector path is examined first because this case has a known analytical optimum: equal link lengths with the second link perpendicular to the first ($\theta_2 = 90^\circ$). This serves as a validation step to test whether RL can rediscover the optimum from reward feedback alone, without access to the manipulability expression or the Jacobian. Three RL algorithms (SAC, DDPG, and PPO) are compared with grid search and black-box optimizers, with morphology represented by a single action parameter $\phi$ that maps to the link lengths. All methods converge to the analytical solution, showing that the optimum can be recovered numerically without supplying analytical structure. Most morphology design tasks, however, have no closed-form solution, and grid or heuristic search becomes expensive as dimensionality increases. RL is therefore explored as a scalable alternative. The formulation used for the circular path is extended to elliptical and rectangular paths by expanding the action space to the full morphology vector $(L_1, L_2, \theta_2)$. In these non-analytical settings, RL continues to converge reliably, whereas grid and black-box methods require far larger evaluation budgets. These results indicate that RL is effective both for recovering known optima and for solving morphology optimization problems that lack analytical solutions.
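The circular validation case can be checked numerically with a small sketch. For a planar 2R arm, Yoshikawa's index reduces to $w = |\det J| = L_1 L_2 |\sin\theta_2|$. The code below is a plain grid search, not the RL method, and it rests on assumptions that are not stated in the text: $\theta_2$ is held at $90^\circ$, the reach constraint is taken as $L_1^2 + L_2^2 = R^2$, and the mapping from $\phi$ to link lengths is assumed to be $L_1 = R\cos\phi$, $L_2 = R\sin\phi$. The function names are hypothetical.

```python
import numpy as np


def manipulability_2r(l1, l2, theta2):
    """Yoshikawa manipulability of a planar 2R arm: |det J| = L1 * L2 * |sin(theta2)|."""
    return l1 * l2 * abs(np.sin(theta2))


def links_from_phi(phi, radius):
    """Assumed mapping (not from the source): L1 = R cos(phi), L2 = R sin(phi).

    With theta2 = 90 degrees this keeps the tip on a circle of radius R,
    since the reach is sqrt(L1^2 + L2^2) = R.
    """
    return radius * np.cos(phi), radius * np.sin(phi)


def grid_search_phi(radius, n=2001):
    """Scan phi over [0, pi/2] and return (L1, L2, w) at the best grid point."""
    theta2 = np.pi / 2  # assumption: elbow fixed at a right angle
    phis = np.linspace(0.0, np.pi / 2, n)
    scores = [manipulability_2r(*links_from_phi(p, radius), theta2) for p in phis]
    best = int(np.argmax(scores))
    l1, l2 = links_from_phi(phis[best], radius)
    return l1, l2, scores[best]


if __name__ == "__main__":
    R = 1.0
    l1, l2, w = grid_search_phi(R)
    print(f"L1={l1:.4f}, L2={l2:.4f}, w={w:.4f}")
    print(f"analytic: L={R / np.sqrt(2):.4f}, w_max={R**2 / 2:.4f}")
```

Under these assumptions the scan should land close to $L_1 = L_2 = R/\sqrt{2}$ with $w \approx R^2/2$, which is the target an RL agent would have to rediscover from reward feedback alone.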
