Persistent Object Gaussian Splat (POGS) for Tracking Human and Robot Manipulation of Irregularly Shaped Objects

Justin Yu; Kush Hari; Karim El-Refai; Arnav Dalal; Justin Kerr; Chung Min Kim; Richard Cheng; Muhammad Zubair Irshad; Ken Goldberg

TL;DR

POGS introduces a persistent object representation for previously unseen, irregularly shaped objects that updates online, combining language-grounded, grouping, and self-supervised features embedded in a 3D Gaussian field. It is trained from a multi-view scene capture and then tracked with a single stereo camera, eliminating the need for CAD models or full re-scans. The approach supports open-vocabulary queries for grasping and manipulation and updates pose estimates as objects move, including under human perturbations and during tool servoing. Experiments show an average pose error of $2.92$ cm, up to $12$ consecutive successful object resets, and tool perturbation recovery rates of up to $80\%$ for perturbations of up to $30^{\circ}$.

Abstract

Tracking and manipulating irregularly shaped, previously unseen objects in dynamic environments is important for robotic applications in manufacturing, assembly, and logistics. Recently introduced Gaussian Splats efficiently model object geometry but lack persistent state estimation for task-oriented manipulation. We present Persistent Object Gaussian Splat (POGS), a system that embeds semantics, self-supervised visual features, and object grouping features into a compact representation that can be continuously updated to estimate the pose of scanned objects. POGS updates object states without requiring expensive rescanning or prior CAD models of objects. After an initial multi-view scene capture and training phase, POGS uses a single stereo camera to integrate depth estimates along with self-supervised vision encoder features for object pose estimation. POGS supports grasping, reorientation, and natural language-driven manipulation by refining object pose estimates. This enables sequential object reset operations under human-induced object perturbations, as well as tool servoing, in which the robot recovers the tool pose despite tool perturbations of up to 30°. POGS achieves up to 12 consecutive successful object resets and recovers from 80% of in-grasp tool perturbations.
