GOMP: Grasped Object Manifold Projection for Multimodal Imitation Learning of Manipulation
William van den Bogert, Gregory Linkowski, Nima Fazeli
TL;DR
This paper tackles compounding errors in imitation learning (IL) for high-precision manipulation by introducing Grasped Object Manifold Projection (GOMP), which constrains a non-rigidly grasped object to a low-dimensional task manifold learned from expert demonstrations. GOMP couples diffusion-based IL with an interactive 7-arm bandit that selects how many dimensions of the task manifold to project onto. This reduces error accumulation and improves robustness across four precise assembly tasks that use tactile feedback. The approach relies on three components: a task space derived from the demonstrations with principal geodesic analysis (PGA), a generalization of PCA to curved spaces; an encoding of tactile and proprioceptive observations; and a demonstration-processing pipeline. Results show consistent improvements over vanilla diffusion-based IL in nut threading, peg insertion, USB insertion, and battery cover placement. The method is modality-agnostic and aims to enable fixtureless, high-precision robotic assembly in practical settings by imposing geometry-driven constraints on the grasped object.
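The combination of a demonstration-derived subspace and a bandit that picks its dimensionality can be illustrated with a minimal sketch. It makes several assumptions not taken from the paper: poses are treated as flat Euclidean vectors and the subspace is fit with ordinary PCA (via SVD) instead of PGA; the bandit strategy is UCB1, since the summary does not say which algorithm GOMP uses; and the names fit_task_subspace, project, and UCB1, the synthetic data, and the success-probability reward are all hypothetical.

```python
import numpy as np


def fit_task_subspace(demo_poses):
    """Fit a linear (PCA) subspace to demonstration poses.

    demo_poses: (N, D) array of flattened grasped-object poses, a
    hypothetical Euclidean stand-in for GOMP's PGA-based task space.
    Returns the mean pose (D,) and a (D, r) matrix whose orthonormal
    columns are principal directions, sorted by explained variance.
    """
    mean = demo_poses.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(demo_poses - mean, full_matrices=False)
    return mean, vt.T


def project(pose, mean, components, k):
    """Orthogonally project a pose onto the top-k principal directions."""
    basis = components[:, :k]             # (D, k), orthonormal columns
    coords = basis.T @ (pose - mean)      # coordinates inside the subspace
    return mean + basis @ coords          # back to the ambient D-dim space


class UCB1:
    """UCB1 bandit (an assumed strategy, not necessarily GOMP's).

    Each arm is one candidate projection dimensionality k.
    """

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = np.zeros(len(self.arms))
        self.values = np.zeros(len(self.arms))  # running mean reward per arm

    def select(self):
        # Try every arm once, then pick the highest upper confidence bound.
        untried = np.flatnonzero(self.counts == 0)
        if untried.size > 0:
            return int(untried[0])
        bonus = np.sqrt(2.0 * np.log(self.counts.sum()) / self.counts)
        return int(np.argmax(self.values + bonus))

    def update(self, idx, reward):
        # Incremental update of the arm's mean reward.
        self.counts[idx] += 1
        self.values[idx] += (reward - self.values[idx]) / self.counts[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 6-D "poses" that mostly vary along 2 latent directions.
    demos = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6))
    demos += 0.01 * rng.normal(size=demos.shape)
    mean, comps = fit_task_subspace(demos)

    # Hypothetical reward: success of a simulated trial rollout using k dims.
    success_prob = {1: 0.3, 2: 0.9, 3: 0.6, 4: 0.4, 5: 0.3, 6: 0.2}
    bandit = UCB1(arms=range(1, 7))
    for _ in range(500):
        i = bandit.select()
        k = bandit.arms[i]
        reward = float(rng.random() < success_prob[k])
        bandit.update(i, reward)
    best_k = bandit.arms[int(np.argmax(bandit.counts))]
    print("most-selected k:", best_k)
    print("projected pose:", project(demos[0], mean, comps, best_k))
```

In this sketch, trial success is the bandit's reward, so the arm pulled most often approximates the dimensionality that works best in practice. A smaller k constrains the object more tightly to the demonstrated motion, while a larger k leaves the policy more freedom.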
Abstract
Imitation Learning (IL) holds great potential for learning repetitive manipulation tasks, such as those in industrial assembly. However, its effectiveness is often limited by insufficient trajectory precision due to compounding errors. In this paper, we introduce Grasped Object Manifold Projection (GOMP), an interactive method that mitigates these errors by constraining a non-rigidly grasped object to a lower-dimensional manifold. GOMP assumes a precise task in which a manipulator holds an object that may shift within the grasp in an observable manner and must be mated with a grounded part. Crucially, all GOMP enhancements are learned from the same expert dataset used to train the base IL policy and are adjusted by an interactive component based on an n-armed bandit. We propose a theoretical basis for GOMP's improvement upon the well-known compounding error bound in the IL literature. We demonstrate the framework on four precise assembly tasks using tactile feedback and note that the approach remains modality-agnostic. Data and videos are available at williamvdb.github.io/GOMPsite.
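For reference, the compounding error bound mentioned above is commonly stated for behavior cloning following Ross and Bagnell (2010). Suppose the learned policy disagrees with the expert with probability at most epsilon per step under the expert's state distribution, and per-step cost lies in [0, 1]. Then over a horizon of T steps the expected total cost satisfies:

```latex
% J(\pi): expected total cost of policy \pi over a horizon of T steps,
%         with per-step cost in [0, 1]
% \varepsilon: per-step probability that \hat{\pi} disagrees with \pi^*
%              under the expert's state distribution
J(\hat{\pi}) \le J(\pi^*) + T^2 \varepsilon
```

The quadratic dependence on T reflects how a single mistake can push the agent into states the expert never visited, where further mistakes become likely. The abstract frames GOMP's theoretical contribution as an improvement on this bound.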
