PICO: Reconstructing 3D People In Contact with Objects
Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi, Arjun Lakshmipathy, Agniv Chatterjee, Michael J. Black, Dimitrios Tzionas
TL;DR
PICO tackles 3D human–object interaction (HOI) reconstruction from a single image. It introduces PICO-db, a dataset that provides dense, bijective 3D contact annotations on both humans and objects, and PICO-fit, an optimization-based fitting pipeline that uses these contacts to recover coherent 3D body and object meshes. The framework retrieves likely object shapes via OpenShape, transfers body contact patches to objects with an axis-based two-click method, and uses render-and-compare optimization to align and refine both meshes while enforcing contact and penetration constraints. Evaluations on out-of-domain in-lab datasets and in-the-wild images show that PICO-fit performs competitively with the state of the art. Perceptual studies indicate higher realism and generalization to object classes that previous methods could not handle. The work demonstrates that dense contacts annotated on both body and object can serve as a scalable foundation for HOI understanding in the wild, and it points to future directions in direct contact regression and vision–language model integration.
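The shape-retrieval step can be pictured as nearest-neighbour search in a shared embedding space. The sketch below is a minimal illustration under stated assumptions, not PICO's code. It assumes that an image query and the database shapes have already been embedded in the same space (as with OpenShape-style encoders, whose actual API is not shown here), and it ranks a hypothetical shape database by cosine similarity. The names `retrieve`, `cosine_similarity`, `shape_db`, and the toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def retrieve(query: Sequence[float], shape_db: Dict[str, Sequence[float]],
             k: int = 3) -> List[Tuple[str, float]]:
    """Return the k shapes whose (hypothetical) embeddings best match the query."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in shape_db.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical 3-D "embeddings"; real ones would be high-dimensional encoder outputs.
    shape_db = {
        "chair_01": [1.0, 0.0, 0.0],
        "bicycle_07": [0.0, 1.0, 0.0],
        "chair_02": [0.9, 0.1, 0.0],
    }
    query = [1.0, 0.05, 0.0]  # stands in for the embedding of an image crop
    for name, score in retrieve(query, shape_db, k=2):
        print(f"{name}: {score:.3f}")
```

Cosine similarity ignores embedding magnitude, which suits encoders trained to align directions across modalities. A large shape collection would likely call for an approximate nearest-neighbour index instead of the linear scan shown here.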
Abstract
Recovering 3D Human-Object Interaction (HOI) from single color images is challenging due to depth ambiguities, occlusions, and the huge variation in object shape and appearance. Thus, past work requires controlled settings such as known object shapes and contacts, and tackles only a limited set of object classes. Instead, we need methods that generalize to natural images and novel object classes. We tackle this in two main ways: (1) We collect PICO-db, a new dataset of natural images uniquely paired with dense 3D contact on both body and object meshes. To this end, we use images from the recent DAMON dataset that are paired with contacts, but these contacts are only annotated on a canonical 3D body. In contrast, we seek contact labels on both the body and the object. To infer these for a given image, we retrieve an appropriate 3D object mesh from a database by leveraging vision foundation models. Then, we project DAMON's body contact patches onto the object via a novel method needing only 2 clicks per patch. This minimal human input establishes rich contact correspondences between bodies and objects. (2) We exploit our new dataset of contact correspondences in a novel render-and-compare fitting method, called PICO-fit, to recover 3D body and object meshes in interaction. PICO-fit infers contact for the SMPL-X body, retrieves a likely 3D object mesh and contact from PICO-db for that object, and uses the contact to iteratively fit the 3D body and object meshes to image evidence via optimization. Uniquely, PICO-fit works well for many object categories that no existing method can tackle. This is crucial for scaling HOI understanding to the wild. Our data and code are available at https://pico.is.tue.mpg.de.
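To make the role of contact in the fitting step concrete, here is a toy sketch under strong simplifying assumptions. The body is replaced by a sphere whose signed distance supplies a penetration penalty, the object moves only by translation, contact correspondences are given as paired points, and gradients come from central finite differences rather than autodiff. It omits the image (render-and-compare) term entirely and is not PICO-fit's actual objective. The names `contact_term`, `penetration_term`, `fit_translation`, and the weight `w_pen` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def translate(points: Sequence[Vec3], t: Sequence[float]) -> List[Vec3]:
    """Rigidly translate points by t (rotation and scale omitted for brevity)."""
    return [(p[0] + t[0], p[1] + t[1], p[2] + t[2]) for p in points]


def contact_term(body_contacts: Sequence[Vec3], obj_contacts: Sequence[Vec3]) -> float:
    """Mean squared distance between corresponding body/object contact points."""
    pairs = list(zip(body_contacts, obj_contacts))
    return sum(math.dist(b, o) ** 2 for b, o in pairs) / len(pairs)


def penetration_term(obj_verts: Sequence[Vec3], center: Vec3, radius: float) -> float:
    """Sum of squared penetration depths of object vertices into a sphere body proxy.

    The sphere's signed distance is sdf(p) = |p - center| - radius (negative
    inside); only the inside part, max(0, -sdf), is penalized.
    """
    return sum(max(0.0, radius - math.dist(v, center)) ** 2 for v in obj_verts)


def fit_translation(body_contacts: Sequence[Vec3], obj_contacts: Sequence[Vec3],
                    obj_verts: Sequence[Vec3], center: Vec3, radius: float,
                    w_pen: float = 10.0, steps: int = 200, lr: float = 0.1,
                    h: float = 1e-6) -> Vec3:
    """Gradient descent on the object translation with central-difference gradients."""
    def loss(t: Sequence[float]) -> float:
        return (contact_term(body_contacts, translate(obj_contacts, t))
                + w_pen * penetration_term(translate(obj_verts, t), center, radius))

    t = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = []
        for i in range(3):
            t_plus, t_minus = list(t), list(t)
            t_plus[i] += h
            t_minus[i] -= h
            grad.append((loss(t_plus) - loss(t_minus)) / (2.0 * h))
        t = [t_i - lr * g_i for t_i, g_i in zip(t, grad)]
    return (t[0], t[1], t[2])


if __name__ == "__main__":
    # Toy scene: unit-sphere "body" at the origin; two body contact points on its
    # surface, paired with object contact points offset by +2 along x.
    body = [(1.0, 0.0, 0.0), (0.8, 0.6, 0.0)]
    obj_contacts = [(3.0, 0.0, 0.0), (2.8, 0.6, 0.0)]
    obj_verts = obj_contacts + [(3.5, 0.3, 0.0)]
    t = fit_translation(body, obj_contacts, obj_verts, (0.0, 0.0, 0.0), 1.0)
    print("fitted translation:", t)  # close to (-2, 0, 0)
```

The contact term pulls corresponding points together, while the penetration term pushes object vertices back out of the body proxy. A fuller version would presumably add an image-alignment term and optimize body pose together with object rotation, translation, and scale.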
