Windowed-FourierMixer: Enhancing Clutter-Free Room Modeling with Fourier Transform
Bruno Henriques, Benjamin Allaert, Jean-Philippe Vandeborre
TL;DR
The paper tackles clutter-free room modeling from a single indoor panorama without relying on layout-estimation cues. It introduces a single-stage, U-Former-like generator built around a novel Windowed-FourierMixer (W-FourierMixer) block, in which Fourier Units operate separately along the height and width axes with gating, and a windowed approach captures semi-local symmetry. The model is trained with a composite loss $L_{Final} = \lambda_{Rec} L_{Rec} + \lambda_{Perc} L_{Perc} + \lambda_{Adv} L_{Adv} + \lambda_{GP} L_{GP} + \lambda_{FM} L_{FM}$, together with a High-Receptive-Field perceptual loss, to preserve texture and structure faithfully. On Structured3D, it outperforms PanoDR, LGPN, and LaMa across multiple mask types. The approach serves as a preprocessing step for clutter-free room modeling and downstream 3D reconstruction pipelines, with potential extensions to depth and 3D-aware supervision.
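To make the idea of windowed, per-axis Fourier mixing concrete, the following is a minimal NumPy sketch and not the authors' implementation. The function names (`windowed_fourier_1d`, `gated_mix`), the non-overlapping windows, the per-frequency `weights`, and the sigmoid gating form are all assumptions chosen for illustration; the actual W-FourierMixer block may differ in each of these respects.

```python
import numpy as np

def windowed_fourier_1d(x, window, axis, weights=None):
    """Filter a feature map in the frequency domain, window by window, along one axis.

    Hypothetical helper: splits `axis` into non-overlapping windows of length
    `window`, applies a real FFT inside each window, scales each frequency by
    `weights`, and transforms back.
    """
    x = np.moveaxis(x, axis, -1)                  # put the target axis last
    n = x.shape[-1]
    if n % window != 0:
        raise ValueError("axis length must be divisible by the window size")
    xw = x.reshape(*x.shape[:-1], n // window, window)
    spec = np.fft.rfft(xw, axis=-1)               # window // 2 + 1 frequency bins
    if weights is None:                           # identity filter by default
        weights = np.ones(spec.shape[-1], dtype=complex)
    out = np.fft.irfft(spec * weights, n=window, axis=-1)
    out = out.reshape(*x.shape[:-1], n)
    return np.moveaxis(out, -1, axis)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_mix(x, window, w_height=None, w_width=None):
    """Mix along height and width separately, then gate one branch with the other.

    The multiplicative sigmoid gate is an assumed form, used only as an example.
    """
    h_branch = windowed_fourier_1d(x, window, axis=0, weights=w_height)
    w_branch = windowed_fourier_1d(x, window, axis=1, weights=w_width)
    return h_branch * sigmoid(w_branch)

# Usage on a toy (height, width, channels) feature map.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 3))
mixed = gated_mix(features, window=4)
print(mixed.shape)
```

In this sketch, keeping only the zero-frequency bin replaces each window with its mean along the chosen axis, which illustrates why the filtering is semi-local: each output value depends on its own window rather than on the whole panorama.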
Abstract
With the growing demand for immersive digital applications, the need to understand and reconstruct 3D scenes has significantly increased. In this context, inpainting indoor environments from a single image plays a crucial role in modeling the internal structure of interior spaces, as it enables the creation of textured and clutter-free reconstructions. While recent methods have shown significant progress in room modeling, they rely on constraining layout estimators to guide the reconstruction process. These methods are highly dependent on the performance of the structure estimator and its generative ability in heavily occluded environments. In response to these issues, we propose an innovative approach based on a U-Former architecture and a new Windowed-FourierMixer block, resulting in a unified, single-phase network capable of effectively handling human-made periodic structures such as indoor spaces. This new architecture proves advantageous for tasks involving indoor scenes where symmetry is prevalent, allowing the model to effectively capture features such as horizon/ceiling height lines and cuboid-shaped rooms. Experiments show that the proposed approach outperforms current state-of-the-art methods on the Structured3D dataset, demonstrating superior performance in both quantitative metrics and qualitative results. Code and models will be made publicly available.
