RoomEditor++: A Parameter-Sharing Diffusion Architecture for High-Fidelity Furniture Synthesis
Qilong Wang, Xiaofan Ming, Zhenyi Lin, Jinwen Li, Dongwei Ren, Wangmeng Zuo, Qinghua Hu
TL;DR
This work addresses the lack of public benchmarks and feature misalignment in furniture synthesis by releasing RoomBench++, a large open dataset with realistic-scene and real-scene subsets, and introducing RoomEditor++, a parameter-sharing diffusion architecture that unifies reference and background processing. The shared-backbone design improves feature consistency and enables precise geometric and textural integration across U-Net and DiT backbones, achieving state-of-the-art results on RoomBench++ and strong generalization to unseen indoor scenes and related domains. Comprehensive experiments, including quantitative metrics, human studies, and cross-dataset evaluations (3D-FUTURE and DreamBooth), validate the method’s superiority and robustness, with ablations underscoring the value of dataset scale and architectural sharing. The work advances practical furniture synthesis for home design and e-commerce by providing an open benchmark and a scalable, generalizable diffusion-based solution.
Abstract
Virtual furniture synthesis, which seamlessly integrates reference objects into indoor scenes while maintaining geometric coherence and visual realism, holds substantial promise for home design and e-commerce applications. However, the field remains underexplored because reproducible benchmarks are scarce and existing image composition methods struggle to achieve high-fidelity furniture synthesis while preserving background integrity. To overcome these challenges, we first present RoomBench++, a comprehensive and publicly available benchmark dataset tailored for this task. It consists of 112,851 training pairs and 1,832 testing pairs drawn from both real-world indoor videos and realistic home design renderings, thereby supporting robust training and evaluation under practical conditions. We then propose RoomEditor++, a versatile diffusion-based architecture featuring a parameter-sharing dual diffusion backbone that is compatible with both U-Net and DiT architectures. This design unifies the feature extraction and inpainting processes for reference and background images. Our in-depth analysis reveals that the parameter-sharing mechanism enforces aligned feature representations, facilitating precise geometric transformations, texture preservation, and seamless integration. Extensive experiments show that RoomEditor++ outperforms state-of-the-art approaches in quantitative metrics, qualitative assessments, and human preference studies, and that it generalizes well to unseen indoor scenes and general scenes without task-specific fine-tuning. The dataset and source code are available at https://github.com/stonecutter-21/roomeditor.
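As a toy illustration of why parameter sharing can keep reference and background features aligned, the sketch below runs the same input through a two-branch model with shared weights and with separate weights. It is not the RoomEditor++ implementation: the `DualBranch` class, the `make_weights` and `linear` helpers, and the single matrix-vector "layer" are hypothetical stand-ins for a real U-Net or DiT backbone, chosen only so the example runs with the Python standard library.

```python
import random


def make_weights(rows, cols, seed):
    # Hypothetical helper: a small random weight matrix with a fixed seed.
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    # One toy "backbone layer": a plain matrix-vector product.
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class DualBranch:
    """Toy two-branch model (assumption, not the paper's architecture).

    With shared=True both branches reuse one weight matrix; with
    shared=False each branch gets its own independently drawn weights.
    """

    def __init__(self, dim, shared):
        self.ref_weights = make_weights(dim, dim, seed=0)
        self.bg_weights = (
            self.ref_weights if shared else make_weights(dim, dim, seed=1)
        )

    def forward(self, reference, background):
        ref_features = linear(self.ref_weights, reference)
        bg_features = linear(self.bg_weights, background)
        return ref_features, bg_features


# The same content appears in the reference and in the background.
patch = [0.5, -0.2, 0.1, 0.9]

shared_model = DualBranch(4, shared=True)
separate_model = DualBranch(4, shared=False)

shared_ref, shared_bg = shared_model.forward(patch, patch)
separate_ref, separate_bg = separate_model.forward(patch, patch)

# Largest per-dimension disagreement between the two branches.
shared_gap = max(abs(a - b) for a, b in zip(shared_ref, shared_bg))
separate_gap = max(abs(a - b) for a, b in zip(separate_ref, separate_bg))

print("feature gap with shared weights:  ", shared_gap)
print("feature gap with separate weights:", separate_gap)
```

With shared weights, identical content maps to identical features in both branches, so the gap is zero by construction; with separate weights the two branches place the same content at different points in feature space. A real model would have to learn that alignment during training, which is the kind of mismatch the shared backbone is described as avoiding.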
