Multi-Unit Floor Plan Recognition and Reconstruction Using Improved Semantic Segmentation of Raster-Wise Floor Plans

Lukas Kratochvila; Gijs de Jong; Monique Arkesteijn; Simon Bilik; Tomas Zemcik; Karel Horak; Jan S. Rellermeyer

TL;DR

This work tackles the challenge of generating 3D building representations from raster 2D floor plans to enable scalable digital twins for emergency planning. It introduces two end-to-end recognition-reconstruction pipelines, CAB1 and CAB2, built on MDA-Unet and MACU-Net and extended with asymmetric convolution, a channel and spatial attention mechanism, and multi-scale skip connections, together with a multi-task training objective and heatmap-based regression of openings. The reconstruction stage converts the segmentation masks into vector polygons and refined 3D models. On CubiCasa, the methods achieve a mean F1 score of $0.86$ and an IoU of $0.76$, outperforming the state-of-the-art baselines tested, and the pipeline is also applied to several other datasets (R3D, CVC-FP, MLSTRUCT-FP, MURF). The approach provides a practical, publicly available pipeline for generating 3D floor-plan representations from raster data, supporting safer and more efficient emergency planning and urban simulations.
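Asymmetric convolution is the first architectural addition listed above, and Figure 4 describes the AC block as a group-normalized block whose branches are merged by element-wise addition. The following is a minimal pure-Python sketch, assuming an ACNet-style layout of parallel 3×3, 1×3, and 3×1 branches summed before a single group normalization; the paper's exact branch layout and normalization placement may differ, and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d_same(image: Matrix, kernel: Matrix) -> Matrix:
    """Zero-padded 2D cross-correlation (what CNN libraries call convolution).
    Kernel sides must be odd so the output keeps the input size."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    y, x = i + u - ph, j + v - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[u][v] * image[y][x]
            out[i][j] = acc
    return out


def asymmetric_conv(image: Matrix, k3x3: Matrix, k1x3: Matrix, k3x1: Matrix) -> Matrix:
    """Square, horizontal and vertical branches merged by element-wise addition."""
    branches = [conv2d_same(image, k) for k in (k3x3, k1x3, k3x1)]
    return [[sum(b[i][j] for b in branches) for j in range(len(image[0]))]
            for i in range(len(image))]


def fuse_kernels(k3x3: Matrix, k1x3: Matrix, k3x1: Matrix) -> Matrix:
    """Add the 1x3 kernel to the centre row and the 3x1 kernel to the centre
    column of the 3x3 kernel; by linearity, one convolution with this kernel
    reproduces asymmetric_conv (before normalization)."""
    fused = [row[:] for row in k3x3]
    for v in range(3):
        fused[1][v] += k1x3[0][v]
    for u in range(3):
        fused[u][1] += k3x1[u][0]
    return fused


def group_norm(channels: List[Matrix], num_groups: int, eps: float = 1e-5) -> List[Matrix]:
    """Normalize each group of channels to zero mean and unit variance
    (the learnable scale and shift are omitted for brevity)."""
    if len(channels) % num_groups:
        raise ValueError("number of channels must be divisible by num_groups")
    size = len(channels) // num_groups
    out: List[Matrix] = []
    for g in range(num_groups):
        group = channels[g * size:(g + 1) * size]
        vals = [x for ch in group for row in ch for x in row]
        mean = sum(vals) / len(vals)
        var = sum((x - mean) ** 2 for x in vals) / len(vals)
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(x - mean) * scale for x in row] for row in ch] for ch in group)
    return out
```

Because the three branches are linear and share the same zero padding, a trained AC block can be folded into a single 3×3 kernel for inference; this is the usual motivation for the design, not a claim about the authors' deployment.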

Abstract

Digital twins have major potential to become a significant part of urban management in emergency planning, as they allow more efficient design of escape routes, better orientation in exceptional situations, and faster rescue interventions. Nevertheless, creating digital twins remains a largely manual effort due to a lack of 3D representations, which are available only for a limited number of new buildings. In this paper, we therefore aim to synthesize 3D information from commonly available 2D architectural floor plans. We propose two novel pixel-wise segmentation methods based on the MDA-Unet and MACU-Net architectures, with improved skip connections, an attention mechanism, and a multi-task training objective, together with a reconstruction stage that vectorizes the segmented plans to create a 3D model. The proposed methods are compared with two other state-of-the-art techniques on several benchmark datasets. On the commonly used CubiCasa benchmark dataset, our methods achieve a mean F1 score of 0.86 over the five examined classes, outperforming the other pixel-wise approaches tested. Our code is publicly available to support research in the field.
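The headline results are a mean F1 score over five classes and an IoU. The exact evaluation protocol is not stated in this section, so the sketch below assumes the common convention: count pixels over flattened label masks, compute F1 and IoU per class, and average them without class weights. Function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def per_class_scores(pred: Sequence[int], target: Sequence[int],
                     classes: Sequence[int]) -> Dict[int, Dict[str, float]]:
    """Pixel-wise F1 and IoU for each class from flattened label masks.
    Classes absent from both prediction and ground truth are skipped."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    scores: Dict[int, Dict[str, float]] = {}
    for c in classes:
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        if tp + fp + fn == 0:
            continue
        scores[c] = {
            "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
            "iou": tp / (tp + fp + fn),
        }
    return scores


def macro_average(scores: Dict[int, Dict[str, float]]) -> Tuple[float, float]:
    """Unweighted mean over classes, i.e. a 'mean F1' and a 'mean IoU'."""
    if not scores:
        raise ValueError("no classes to average")
    n = len(scores)
    return (sum(s["f1"] for s in scores.values()) / n,
            sum(s["iou"] for s in scores.values()) / n)
```

For a single class, F1 = 2·IoU / (1 + IoU), but this identity does not hold between the class-averaged values, which is one reason both metrics are usually reported.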

Paper Structure (38 sections, 7 equations, 27 figures, 12 tables, 2 algorithms)

Introduction
Related Work
Proposed Methods
Recognition
Asymmetric Convolution Block
Attention Mechanism
Channel Attention Module
Spatial Attention Module
Multi-Scale Feature Skip Connections
Training Objective
Reconstruction
Approximate Polygons
Refined Polygons
Experimental Description
Model Modifications
...and 23 more sections

Figures (27)

Figure 1: The recognition part, shown on the left-hand side, produces a segmentation mask of the floor plan using a custom convolutional neural network (CNN). The reconstruction step, shown on the right-hand side, then refines this mask through post-processing, vectorization, and Blender-based visualization.
Figure 2: Two model architectures: (a) regular multi-scale feature skip connections and (b) fully connected multi-scale feature skip connections. Here, $X^i_{En}$ denotes an encoder block and $X^i_{De}$ a decoder block.
Figure 3: An overview of the attention module, which consists of a channel attention module (CAM) and a spatial attention module (SAM). The symbols $\bigoplus$, $\bigotimes$ and Ⓢ denote element-wise addition, multiplication, and the sigmoid operation.
Figure 4: An asymmetric convolution (AC) block with group normalization (GN), which enhances the skeleton features. The symbol $\oplus$ represents element-wise addition.
Figure 5: An example of an intermediate feature map $F^3$ of the third decoder layer $X^3_{De}$. The abbreviations AC and AM denote an asymmetric convolution block and an attention mechanism, respectively; $X^i_{En}$ denotes blocks of convolutions.
...and 22 more figures
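Figure 3 describes the attention module as a channel attention module (CAM) followed by a spatial attention module (SAM), built from element-wise addition, multiplication, and sigmoid operations, which resembles CBAM (Woo et al., ECCV 2018). The following is a minimal pure-Python sketch, assuming the CBAM design (average- and max-pooled descriptors, a shared MLP, channel attention applied before spatial attention) and simplifying CBAM's 7×7 spatial convolution to a 1×1 one. The paper's module may differ in these details, and all names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]
FeatureMap = List[Matrix]  # channels x height x width


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vec: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Two-layer MLP C -> C/r -> C with ReLU; the same weights serve both pooled descriptors."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(fmap: FeatureMap, w1: Matrix, w2: Matrix) -> FeatureMap:
    n = len(fmap[0]) * len(fmap[0][0])
    avg = [sum(sum(row) for row in ch) / n for ch in fmap]
    mx = [max(max(row) for row in ch) for ch in fmap]
    # Add the two MLP outputs, squash with a sigmoid, then rescale each channel.
    weights = [sigmoid(a + m) for a, m in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    return [[[x * wc for x in row] for row in ch] for ch, wc in zip(fmap, weights)]


def spatial_attention(fmap: FeatureMap, w_avg: float, w_max: float,
                      bias: float = 0.0) -> FeatureMap:
    """Per-pixel weights from the channel-wise mean and max. CBAM convolves these two
    maps with a 7x7 kernel; a 1x1 kernel (two scalars) keeps this sketch short."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for i in range(h):
        for j in range(w):
            column = [fmap[k][i][j] for k in range(c)]
            s = sigmoid(w_avg * sum(column) / c + w_max * max(column) + bias)
            for k in range(c):
                out[k][i][j] = fmap[k][i][j] * s
    return out


def attention_module(fmap: FeatureMap, w1: Matrix, w2: Matrix,
                     w_avg: float, w_max: float) -> FeatureMap:
    """Channel attention followed by spatial attention (CBAM ordering, assumed here)."""
    return spatial_attention(channel_attention(fmap, w1, w2), w_avg, w_max)
```

Both attention weights lie strictly between 0 and 1, so the module can only attenuate or preserve each activation. With all-zero weights, every value is scaled by sigmoid(0) · sigmoid(0) = 0.25.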
