REACT3D: Recovering Articulations for Interactive Physical 3D Scenes

Zhao Huang, Boyang Sun, Alexandros Delitzas, Jiaqi Chen, Marc Pollefeys

TL;DR

REACT3D addresses the lack of scalable, interactive 3D assets by converting static scenes into simulation-ready digital twins with articulated objects. It combines open-vocabulary openable-object detection, multi-view 3D segmentation, articulation estimation with refinement, and hidden-geometry completion, followed by clean scene integration and export to URDF/USD for diverse simulators. The approach achieves state-of-the-art performance on openable-object detection and articulation-estimation metrics across indoor scenes, while delivering high-fidelity interactive scenes with textures and consistent geometry. This framework enables large-scale, zero-shot generation of articulated environments, accelerating research in embodied AI, robotics perception, and interactive simulation.

Abstract

Interactive 3D scenes are increasingly vital for embodied intelligence, yet existing datasets remain limited due to the labor-intensive process of annotating part segmentation, kinematic types, and motion trajectories. We present REACT3D, a scalable zero-shot framework that converts static 3D scenes into simulation-ready interactive replicas with consistent geometry, enabling direct use in diverse downstream tasks. Our contributions include: (i) openable-object detection and segmentation to extract candidate movable parts from static scenes, (ii) articulation estimation that infers joint types and motion parameters, (iii) hidden-geometry completion followed by interactive object assembly, and (iv) interactive scene integration in widely supported formats to ensure compatibility with standard simulation platforms. We achieve state-of-the-art performance on detection/segmentation and articulation metrics across diverse indoor scenes, demonstrating the effectiveness of our framework and providing a practical foundation for scalable interactive scene generation, thereby lowering the barrier to large-scale research on articulated scene understanding. Our project page is https://react3d.github.io/
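As a rough illustration of what contributions (ii) and (iv) produce together, the sketch below writes a two-link URDF description from an estimated joint type, axis, origin, and motion range. It is not the authors' exporter: the function name `make_articulated_urdf`, the link names, and the placeholder effort and velocity limits are all assumptions made for this example, and geometry and textures are omitted.

```python
import xml.etree.ElementTree as ET


def make_articulated_urdf(name, joint_type, axis, origin, lower, upper):
    """Build a minimal URDF string: one static base link and one movable part.

    joint_type: "revolute" (e.g. a hinged door) or "prismatic" (e.g. a drawer).
    axis, origin: 3-tuples for the joint axis and its position in the base frame.
    lower, upper: motion range (radians for revolute, metres for prismatic).
    """
    if joint_type not in ("revolute", "prismatic"):
        raise ValueError(f"unsupported joint type: {joint_type}")

    def fmt(vec):
        # URDF expects space-separated numbers in attribute values.
        return " ".join(f"{v:g}" for v in vec)

    robot = ET.Element("robot", name=name)
    ET.SubElement(robot, "link", name="base")          # hypothetical link name
    ET.SubElement(robot, "link", name="movable_part")  # hypothetical link name

    joint = ET.SubElement(robot, "joint", name="articulation", type=joint_type)
    ET.SubElement(joint, "parent", link="base")
    ET.SubElement(joint, "child", link="movable_part")
    ET.SubElement(joint, "origin", xyz=fmt(origin), rpy="0 0 0")
    ET.SubElement(joint, "axis", xyz=fmt(axis))
    # effort and velocity are required by URDF for these joint types;
    # the values here are placeholders, not estimates from the paper.
    ET.SubElement(joint, "limit", lower=str(lower), upper=str(upper),
                  effort="10", velocity="1")

    return ET.tostring(robot, encoding="unicode")


# Example: a cabinet door hinged about the vertical axis, opening about 90 degrees.
urdf_text = make_articulated_urdf(
    "cabinet", "revolute", axis=(0, 0, 1), origin=(0.3, 0.0, 0.0),
    lower=0.0, upper=1.57,
)
```

A full export would also attach visual and collision meshes to each link; the sketch keeps only the kinematic structure that the articulation estimate determines.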

Paper Structure

This paper contains 32 sections, 7 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: REACT3D transforms static 3D scenes into interactive scenes in a zero-shot manner. The generated interactive scenes are spatially aligned with the static input and preserve the original geometry and appearance. Our results are readily compatible with multiple simulation platforms, supporting diverse downstream tasks such as robotic perception, interaction, and embodied intelligence.
  • Figure 2: Overview of REACT3D. Given a static 3D scene, our method first applies open-vocabulary detection to identify openable objects and segmentation to extract their movable parts. We then estimate articulations and generate hidden geometry to obtain interactive objects. Finally, they are integrated with the static background to produce a simulation-ready interactive scene.
  • Figure 3: Pipeline for interactive object generation. From left to right, the figure shows key intermediate results of interactive object generation. In the last column, the thin red line highlights the contour of the base part.
  • Figure 4: Qualitative results of REACT3D. Static input scenes from ScanNet++ and the interactive outputs generated by REACT3D, visualized in Isaac Sim.
  • Figure 5: Manipulation GUIs. Interfaces in ROS and Isaac Sim enabling per-object articulation control and benchmarking.
  • ...and 2 more figures