BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects
Wanyue Zhang, Rishabh Dabral, Vladislav Golyanik, Vasileios Choutas, Eduardo Alvarado, Thabo Beeler, Marc Habermann, Christian Theobalt
TL;DR
BimArt generates realistic 3D bimanual hand interactions with articulated objects, given object trajectories. It uses a three-stage pipeline: (i) an articulation-aware canonical object representation based on part-based Basis Point Sets, (ii) a diffusion-based Bimanual Contact Generation Model that produces left- and right-hand contact maps, and (iii) a diffusion-based Bimanual Hand Motion Model guided by these contacts, followed by MANO-based optimization for physical plausibility. The key contributions are a unified, category-agnostic object representation, a generative contact prior for articulated objects, and a contact-guided motion synthesis framework. Validated on the ARCTIC and HOI4D datasets, the approach achieves state-of-the-art interaction plausibility and diversity, letting artists and researchers synthesize controllable, realistic hand-object animations for articulated objects. The method is demonstrated on a fixed set of object categories; zero-shot generalization and faster sampling are promising future directions.
Abstract
We present BimArt, a novel generative approach for synthesizing 3D bimanual hand interactions with articulated objects. Unlike prior works, we do not rely on a reference grasp, a coarse hand trajectory, or separate modes for grasping and articulating. To achieve this, we first generate distance-based contact maps conditioned on the object trajectory with an articulation-aware feature representation, revealing rich bimanual patterns for manipulation. The learned contact prior is then used to guide our hand motion generator, producing diverse and realistic bimanual motions for object movement and articulation. Our work offers key insights into feature representation and contact prior for articulated objects, demonstrating their effectiveness in taming the complex, high-dimensional space of bimanual hand-object interactions. Through comprehensive quantitative experiments, we demonstrate a clear step towards simplified and high-quality hand-object animations that surpass the state of the art in motion quality and diversity. Project page: https://vcai.mpi-inf.mpg.de/projects/bimart/.
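To make the two ingredients named above more concrete, the following is a minimal sketch, not the authors' implementation, of a part-based Basis Point Set encoding and a distance-based contact map. It assumes that each object part is a small list of 3D surface points, that a Basis Point Set feature is the distance from each fixed basis point to the nearest point of a part, and that contact decays exponentially with distance. The function names, the `sigma` falloff, and the toy "base" and "lid" parts are all hypothetical and chosen only for illustration.

```python
import math
import random


def nearest_distance(query, points):
    """Euclidean distance from `query` to its closest point in `points`."""
    return min(math.dist(query, p) for p in points)


def part_based_bps(parts, basis):
    """Encode every part as distances from shared basis points to that part.

    parts: dict mapping a part name to a list of (x, y, z) surface points.
    basis: list of (x, y, z) basis points, identical for all parts.
    Returns a dict mapping each part name to len(basis) distances.
    """
    return {
        name: [nearest_distance(b, pts) for b in basis]
        for name, pts in parts.items()
    }


def contact_map(object_points, hand_points, sigma=0.02):
    """Contact value in (0, 1] per object point: 1 at touch, decaying with distance.

    The exp(-d / sigma) falloff and the sigma value are illustrative assumptions.
    """
    return [
        math.exp(-nearest_distance(p, hand_points) / sigma)
        for p in object_points
    ]


# Toy articulated object with two parts (hypothetical data, in meters).
rng = random.Random(0)
basis = [tuple(rng.uniform(-0.1, 0.1) for _ in range(3)) for _ in range(8)]
parts = {
    "base": [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0)],
    "lid": [(0.0, 0.05, 0.0), (0.05, 0.05, 0.0)],
}
features = part_based_bps(parts, basis)

# One contact map per hand, mirroring the left/right split described above.
object_points = parts["base"] + parts["lid"]
left_hand = [(0.0, 0.0, 0.0)]       # touches the first base point
right_hand = [(0.05, 0.05, 0.1)]    # hovers above the lid
left_contacts = contact_map(object_points, left_hand)
right_contacts = contact_map(object_points, right_hand)
```

Keeping one feature vector per part and one contact map per hand is what lets a downstream model see which part moves and which hand is responsible for it; in the sketch these are plain lists, whereas a learned pipeline would consume them as tensors over a full trajectory.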
