Imitation Learning-based Direct Visual Servoing using the Large Projection Formulation
Sayantan Auddy, Antonio Paolillo, Justus Piater, Matteo Saveriano
TL;DR
This work tackles robust direct visual servoing in unstructured environments by combining off-the-shelf deep learning (DL) perception with imitation-learned trajectories within a large projection control framework. The approach, called ildvs, uses a frozen DL detector (YOLO) to extract visual features and a Neural Ordinary Differential Equation (NODE) to generate corrective velocities learned from demonstrations, ensuring convergence to a target while enabling complex motions. Real-robot experiments on a Franka Panda with mouse and cup tasks show that ildvs outperforms purely DL-based and purely imitation-based baselines, handles novel object positions and clutter, and achieves a high success rate when dropping objects into a cup. The method offers a modular integration of perception and control with stability guarantees, and its implementation is available as open source.
Abstract
Today, robots must be safe, versatile, and user-friendly to operate in unstructured and human-populated environments. Dynamical system-based imitation learning enables robots to perform complex tasks stably and without explicit programming, greatly simplifying their real-world deployment. To exploit the full potential of these systems, it is crucial to close the control loop with visual feedback. Vision makes it possible to cope with environmental changes, but it is difficult to handle because of the high dimensionality of the image space. This study introduces a dynamical system-based imitation learning approach for direct visual servoing. It leverages off-the-shelf deep learning-based perception modules to extract robust features from the raw input image, and an imitation learning strategy to execute sophisticated robot motions. The learning blocks are integrated using the large projection task priority formulation. As demonstrated through extensive experimental analysis, the proposed method realizes complex tasks with a robotic manipulator.
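The idea of integrating two velocity sources through a large projection can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes the norm-based projector commonly associated with the large projection formulation in the task-priority literature, P = I - (J^T e e^T J) / (e^T J J^T e), which constrains only the norm of the visual error e. The function names, the random Jacobian `J`, and the stand-in vector `v_il` for an imitation-learned velocity are all hypothetical.

```python
import numpy as np

def large_projector(J, e, eps=1e-9):
    """Projector onto the null space of the error-norm task (assumed form)."""
    g = J.T @ e                      # direction along which ||e|| changes
    denom = g @ g                    # equals e^T J J^T e
    n = J.shape[1]
    if denom < eps:
        # Simplification for this sketch: no constraint when the error vanishes.
        return np.eye(n)
    return np.eye(n) - np.outer(g, g) / denom

def combined_velocity(J, e, v_secondary, gain=1.0):
    """Classical servoing law plus a projected secondary velocity."""
    v_primary = -gain * np.linalg.pinv(J) @ e
    return v_primary + large_projector(J, e) @ v_secondary

rng = np.random.default_rng(0)
J = rng.standard_normal((4, 6))      # hypothetical feature Jacobian (4 features, 6 DoF)
e = rng.standard_normal(4)           # hypothetical visual error
v_il = rng.standard_normal(6)        # stand-in for a learned (e.g. NODE) velocity

P = large_projector(J, e)
# Contribution of the projected secondary velocity to d/dt (0.5 * ||e||^2).
residual = e @ J @ P @ v_il
# Rate of change of 0.5 * ||e||^2 under the combined command.
norm_rate = e @ J @ combined_velocity(J, e, v_il)
print(abs(residual) < 1e-9, np.linalg.matrix_rank(P))
```

Because the projector removes a single direction, it leaves five of the six velocity degrees of freedom available to the secondary motion in this example, while the projected motion does not change the rate at which the error norm decreases.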
