Enhancing Floor Plan Recognition: A Hybrid Mix-Transformer and U-Net Approach for Precise Wall Segmentation
Dmitriy Parashchuk, Alexey Kapshitskiy, Yuriy Karyakin
TL;DR
This work addresses precise wall segmentation in floor-plan images as a prerequisite for reliable 3D reconstruction. It introduces MitUNet, a hybrid architecture that pairs a Mix-Transformer encoder with a U-Net decoder and scSE attention, trained with an asymmetric Tversky loss to balance boundary precision and recall. On CubiCasa5k and a regional dataset, MitUNet achieves state-of-the-art boundary accuracy with efficient memory usage. A two-stage transfer learning strategy adapts the model to the complex hatching patterns found in regional drawings. The authors release the code and the regional dataset publicly to promote reproducibility and further development of Scan-to-BIM pipelines.
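scSE (concurrent spatial and channel squeeze and excitation) recalibrates a feature map with two gates. The channel gate is computed from globally pooled channel statistics passed through a small bottleneck. The spatial gate is computed from a 1x1 convolution across channels. The following is a minimal, dependency-free sketch on a feature map stored as a `[channel][row][col]` nested list. It rests on several assumptions, and it is not MitUNet's implementation:

- Biases are omitted, and the weights are caller-supplied placeholders.
- The two branches are combined by addition, which is one common variant.
- `scse` is a hypothetical helper.
- Where MitUNet places these blocks inside its decoder is not stated in this section.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # feature map indexed [channel][row][col]
Matrix = List[List[float]]


def _sigmoid(v: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def scse(x: Tensor3, w1: Matrix, w2: Matrix, w_s: List[float]) -> Tensor3:
    """Hypothetical scSE sketch (biases omitted, branches summed).

    w1:  (C // r) x C weights of the channel-squeeze fully connected layer
    w2:  C x (C // r) weights of the channel-excitation fully connected layer
    w_s: length-C weights of the 1x1 convolution producing the spatial gate
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    # cSE squeeze: global average pool reduces each channel to one scalar.
    z = [sum(sum(row) for row in x[k]) / (h * w) for k in range(c)]
    # cSE excitation: FC -> ReLU -> FC -> sigmoid yields one gate per channel.
    hidden = [max(0.0, sum(a * b for a, b in zip(row, z))) for row in w1]
    ch_gate = [_sigmoid(sum(a * b for a, b in zip(row, hidden))) for row in w2]
    # sSE: 1x1 convolution across channels, then sigmoid, yields one gate per pixel.
    sp_gate = [
        [_sigmoid(sum(w_s[k] * x[k][i][j] for k in range(c))) for j in range(w)]
        for i in range(h)
    ]
    # Combine: x * channel_gate + x * spatial_gate, element by element.
    return [
        [[x[k][i][j] * (ch_gate[k] + sp_gate[i][j]) for j in range(w)] for i in range(h)]
        for k in range(c)
    ]
```

With all weights at zero, both gates equal 0.5 and the block returns its input unchanged. Learned weights push each gate toward 0 or 1, which suppresses or emphasizes individual channels and pixels, such as thin wall strokes.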
Abstract
Automatic 3D reconstruction of indoor spaces from 2D floor plans requires high-precision semantic segmentation of structural elements, particularly walls. However, existing methods often struggle to detect thin structures and to preserve geometric precision. This study introduces MitUNet, a hybrid neural network that combines a Mix-Transformer encoder with a U-Net decoder enhanced with spatial and channel attention blocks. Our approach is trained with the Tversky loss, which weights false positives and false negatives separately. This lets it balance precision and recall and recover wall boundaries accurately. Experiments on the CubiCasa5k dataset and our own regional dataset show that MitUNet produces structurally correct masks with high boundary accuracy, outperforming standard segmentation models. MitUNet therefore provides a robust foundation for automated 3D reconstruction pipelines. To ensure reproducibility and facilitate future research, the source code and the regional dataset are publicly available at https://github.com/aliasstudio/mitunet and https://doi.org/10.5281/zenodo.17871079, respectively.
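The Tversky loss is built on the soft Tversky index, computed from predicted wall probabilities p and binary ground truth g:

- TP = Σ p·g
- FP = Σ p·(1 − g)
- FN = Σ (1 − p)·g
- Index = TP / (TP + α·FP + β·FN)

The loss is one minus this index. Below is a minimal sketch over flattened pixel lists. The α and β values used for MitUNet are not given in this section, so the defaults here are placeholders: α = β = 0.5, which reduces the loss to the soft Dice loss. `tversky_loss` is a hypothetical helper, not the authors' code.

```python
from typing import Sequence


def tversky_loss(probs: Sequence[float], target: Sequence[float],
                 alpha: float = 0.5, beta: float = 0.5, eps: float = 1e-7) -> float:
    """Soft Tversky loss 1 - TP / (TP + alpha*FP + beta*FN) over flattened pixels.

    alpha weights false positives and beta weights false negatives; the
    defaults are placeholders (alpha = beta = 0.5 gives the soft Dice loss).
    eps keeps the ratio defined when the prediction and target are both empty.
    """
    if len(probs) != len(target):
        raise ValueError("probs and target must have the same length")
    tp = sum(p * g for p, g in zip(probs, target))
    fp = sum(p * (1.0 - g) for p, g in zip(probs, target))
    fn = sum((1.0 - p) * g for p, g in zip(probs, target))
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


if __name__ == "__main__":
    pred = [1.0, 1.0, 1.0, 0.0]   # TP = 1, FP = 2, FN = 1 against gt below
    gt = [1.0, 0.0, 0.0, 1.0]
    print(tversky_loss(pred, gt, alpha=0.3, beta=0.7))  # ≈ 0.565
    print(tversky_loss(pred, gt, alpha=0.7, beta=0.3))  # ≈ 0.630
```

Setting β above α penalizes missed wall pixels (false negatives) more heavily than spurious ones, pushing the model toward recall on thin walls. Setting α above β favors precision instead. This asymmetry is what distinguishes the Tversky loss from Dice.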
