Windowed-FourierMixer: Enhancing Clutter-Free Room Modeling with Fourier Transform

Bruno Henriques, Benjamin Allaert, Jean-Philippe Vandeborre

TL;DR

The paper tackles clutter-free room modeling from a single indoor panorama without relying on layout-estimation cues. It introduces a single-stage, U-former–like generator that integrates a novel Windowed-FourierMixer (W-FourierMixer) block, where Fourier Units operate separately across height and width with gating and a windowed approach to capture semi-local symmetry. The model is trained with a composite loss $L_{Final}= \lambda_{Rec} L_{Rec} + \lambda_{Perc} L_{Perc} + \lambda_{Adv} L_{Adv} + \lambda_{GP} L_{GP} + \lambda_{FM} L_{FM}$, plus a High-Receptive-Field perceptual loss, enabling faithful texture and structural preservation on Structured3D, outperforming PanoDR, LGPN, and LaMa across multiple mask types. This approach provides a strong preprocessing step for clutter-free room modeling and downstream 3D reconstruction pipelines, with potential extensions to depth and 3D-aware supervision.
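The composite objective above is a plain weighted sum of scalar loss terms. A minimal sketch, assuming each term has already been computed as a float; the function name `final_loss` and the weight values in the usage below are illustrative placeholders, not values reported in the paper:

```python
def final_loss(terms, weights):
    """Return L_Final = sum_k lambda_k * L_k over the named loss terms.

    `terms` maps a term name (e.g. "Rec") to its scalar loss value,
    `weights` maps the same names to their lambda coefficients.
    Both dictionaries must have exactly the same keys.
    """
    mismatched = set(terms) ^ set(weights)
    if mismatched:
        raise KeyError(f"terms and weights disagree on: {sorted(mismatched)}")
    return sum(weights[name] * terms[name] for name in terms)


# Hypothetical loss values and coefficients, chosen only for illustration.
example_terms = {"Rec": 1.0, "Perc": 2.0, "Adv": 3.0, "GP": 4.0, "FM": 5.0}
example_weights = {"Rec": 1.0, "Perc": 0.5, "Adv": 2.0, "GP": 0.25, "FM": 0.1}
total = final_loss(example_terms, example_weights)
```

Keeping the coefficients in a dictionary keyed by term name makes it harder to silently drop or misalign one of the five terms when the weights are tuned.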

Abstract

With the growing demand for immersive digital applications, the need to understand and reconstruct 3D scenes has significantly increased. In this context, inpainting indoor environments from a single image plays a crucial role in modeling the internal structure of interior spaces as it enables the creation of textured and clutter-free reconstructions. While recent methods have shown significant progress in room modeling, they rely on constraining layout estimators to guide the reconstruction process. These methods are highly dependent on the performance of the structure estimator and its generative ability in heavily occluded environments. In response to these issues, we propose an innovative approach based on a U-Former architecture and a new Windowed-FourierMixer block, resulting in a unified, single-phase network capable of effectively handling human-made periodic structures such as indoor spaces. This new architecture proves advantageous for tasks involving indoor scenes where symmetry is prevalent, allowing the model to effectively capture features such as horizon/ceiling height lines and cuboid-shaped rooms. Experiments show the proposed approach outperforms current state-of-the-art methods on the Structured3D dataset, demonstrating superior performance in both quantitative metrics and qualitative results. Code and models will be made publicly available.

Paper Structure (19 sections, 10 equations, 9 figures, 4 tables)


Figures (9)

  • Figure 1: Clutter-Free Room Modeling in Indoor Spherical Panorama Images. A) Semantic mask representing clutter, B) Synthetic empty scene (ground truth), C) PanoDR [gkitsas2021panodr] and D) LGPN [gao2022layout] approaches, which rely on constraining layout estimators to guide the reconstruction process, E) LaMa [suvorov2022resolution] approach, a conditionless image inpainting method based on FFC [chi2020fast], F) the proposed approach based on combining the innovative W-FourierMixer block with a Uformer-like architecture without any structure conditioning, demonstrating superior results in both texture generation and layout preservation.
  • Figure 2: Architecture overview. Left: Given a furnished panoramic image of the interior scene as input and the mask of the objects to be removed, our approach generates a plausible empty scene. The proposed network architecture is based on a Uformer-like adversarial framework supervised by low- and high-level loss functions and a discriminator. Right: The architecture of the proposed W-FourierMixer block consists of Fourier Units applied separately across the height and width dimensions, enabling a large receptive field, and gated convolutions that provide a learnable gating mechanism. Window operations split the image in half before feeding it to the corresponding Fourier Unit, allowing the capture of semi-local information. Fourier Units across the same dimension share weights.
  • Figure 3: Qualitative comparison with state-of-the-art methods for transforming a cluttered indoor environment into a clutter-free room. The columns show different masks with varying coverage ratios. The rows show the compared approaches. The results show the effectiveness of our approach regardless of mask and room properties.
  • Figure 4: Fourier unit outputs across different dimensions. The top-left image is the input, the top-right shows the output of the Fourier unit applied across both height and width, the bottom-left across width alone, and the bottom-right across height alone. Only the first channel of the output is displayed for better visualization; no activation or normalization was applied after the convolution.
  • Figure 5: Features from Fourier units in the first block of the first stage of the encoder. The top-left image is the network input, the top-right shows the output of the Fourier unit across width, the bottom-left across height, and the bottom-right the corresponding features across width with a window operation. The symmetry effect is visible in the feature maps, allowing the network to capture the global structure from the early stages.
  • ...and 4 more figures
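
The W-FourierMixer description in the Figure 2 caption (Fourier Units applied along a single dimension, a window operation that splits the input in half, shared weights, and a learnable gate) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the pointwise channel mixing in the spectral domain, the choice to split along the transformed axis, and the way the two branches and the gate are combined are all assumptions made for illustration.

```python
import numpy as np


def fourier_unit_1d(x, weight, axis):
    """Hypothetical Fourier Unit along one spatial axis.

    x: real array of shape (C, H, W); weight: (2C, 2C) channel-mixing matrix.
    Steps: real FFT along `axis`, stack real/imag parts as channels, mix
    channels pointwise, ReLU, rebuild the complex spectrum, inverse FFT.
    """
    c = x.shape[0]
    n = x.shape[axis]
    spec = np.fft.rfft(x, axis=axis, norm="ortho")
    feats = np.concatenate([spec.real, spec.imag], axis=0)   # (2C, ., .)
    mixed = np.einsum("oc,chw->ohw", weight, feats)          # per-frequency channel mixing
    mixed = np.maximum(mixed, 0.0)                           # ReLU in the spectral domain
    spec_out = mixed[:c] + 1j * mixed[c:]
    return np.fft.irfft(spec_out, n=n, axis=axis, norm="ortho")


def windowed_fourier_unit(x, weight, axis, windows=2):
    """Split x into equal windows along `axis` and apply the same unit
    (shared weights) to each window, then stitch the results back together."""
    parts = np.split(x, windows, axis=axis)
    outs = [fourier_unit_1d(p, weight, axis) for p in parts]
    return np.concatenate(outs, axis=axis)


def w_fourier_mixer(x, w_height, w_width, w_gate):
    """Assumed combination: height branch plus windowed width branch,
    modulated by a sigmoid gate computed from the input, with a residual."""
    h_branch = fourier_unit_1d(x, w_height, axis=1)
    w_branch = windowed_fourier_unit(x, w_width, axis=2)
    gate = 1.0 / (1.0 + np.exp(-np.einsum("oc,chw->ohw", w_gate, x)))
    return x + gate * (h_branch + w_branch)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 16))            # (channels, height, width)
w_h = rng.standard_normal((8, 8)) * 0.1
w_w = rng.standard_normal((8, 8)) * 0.1
w_g = rng.standard_normal((4, 4)) * 0.1
y = w_fourier_mixer(x, w_h, w_w, w_g)
```

Because each spectral coefficient depends on every position along the transformed axis, a single unit sees the full height or width at once. The windowed variant restricts that span to one half of the input, which is one way to read the "semi-local" behaviour the caption describes.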