EchoScan: Scanning Complex Room Geometries via Acoustic Echoes

Inmo Yeon; Iljoo Jeong; Seungchul Lee; Jung-Woo Choi

TL;DR

EchoScan is a deep neural network model that infers room geometry from acoustic echoes. Unlike conventional sound-based techniques, it directly predicts room floorplan maps and height maps, which lets it handle rooms with complex shapes, including curved walls.

Abstract

Accurate estimation of indoor space geometries is vital for constructing precise digital twins, whose broad industrial applications include navigation in unfamiliar environments and efficient evacuation planning, particularly in low-light conditions. This study introduces EchoScan, a deep neural network model that utilizes acoustic echoes to perform room geometry inference. Conventional sound-based techniques rely on estimating geometry-related room parameters such as wall position and room size, which limits the diversity of room geometries they can infer. In contrast, EchoScan directly infers room floorplan maps and height maps, enabling it to handle rooms with complex shapes, including curved walls. Framing the prediction of floorplan and height maps as a segmentation task enables the model to leverage both low- and high-order reflections. The use of high-order reflections further allows EchoScan to infer complex room shapes when some walls of the room are unobservable from the position of the audio device. EchoScan was trained and evaluated using room impulse responses (RIRs) synthesized from complex environments, including the Manhattan and Atlanta layouts, employing a practical audio device configuration compatible with commercial, off-the-shelf devices.
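
To make the idea of a floorplan map and a height map concrete, the following is a minimal sketch of how such target maps could be rasterized from a polygonal room, with the audio device placed at the map centre. It is an illustration under stated assumptions, not the authors' data pipeline: the function names, the grid size, the spatial extent, and the choice of a single constant ceiling height are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Even-odd ray-casting test for a point against a simple polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_room(
    walls: List[Point],
    device: Point,
    ceiling_height: float,
    size: int = 16,
    extent: float = 8.0,
):
    """Return (floorplan, height_map) as size x size nested lists.

    The maps cover [-extent, extent] metres in x and y around the device,
    so the device always sits at the map centre (0, 0).
    Assumption: the ceiling is flat, so the height map is constant inside the room.
    """
    dx, dy = device
    # Shift the room so that the device becomes the origin.
    shifted = [(x - dx, y - dy) for x, y in walls]
    cell = 2.0 * extent / size
    floorplan, height_map = [], []
    for row in range(size):
        y = -extent + (row + 0.5) * cell  # cell-centre y coordinate
        fp_row = []
        for col in range(size):
            x = -extent + (col + 0.5) * cell  # cell-centre x coordinate
            fp_row.append(1 if point_in_polygon(x, y, shifted) else 0)
        floorplan.append(fp_row)
        height_map.append([ceiling_height * v for v in fp_row])
    return floorplan, height_map


# Hypothetical example: a 4 m x 2 m rectangular room with the device at (1, 1).
floorplan, height_map = rasterize_room(
    [(0, 0), (4, 0), (4, 2), (0, 2)], device=(1, 1), ceiling_height=2.5,
    size=8, extent=4.0,
)
```

Because the target is a per-pixel map rather than a list of wall parameters, the same representation covers rectangles, L-shapes, and curved walls (approximated by many short edges) without changing the output format.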

Paper Structure (22 sections, 8 equations, 15 figures, 2 tables)

Introduction
Problem Statement
Methodology
Encoder-Decoder Architecture
Loss Function
Experiment Setup
Audio Device Configuration
Acoustic Simulation
Room Geometry Dataset
Basic Room Dataset
Manhattan-Atlanta Room Dataset
Training Configuration
Evaluation Metrics
Experimental Results
Ablation Studies on the MA Module
...and 7 more sections

Figures (15)

Figure 1: Conceptual illustration of the RGI task using an audio device positioned in the NLOS region.
Figure 2: Encoder-decoder architecture of the proposed EchoScan. The encoder extracts latent features and the MA module aggregates them in time. The decoder generates two segmented images for the floorplan and height maps using its dual-head structure. The encoder consists of convolution blocks (CBs) and the decoder comprises upsampling and convolution blocks (UCBs). The dimensions shown for each encoder block or layer are its output dimensions (channel, time), while those for the decoder are the output dimensions (channel, width, height). The symbol $C$ denotes the channel dimension of the input, and $D$ is the time or space dimension for the 1D or 2D convolution block. For UCBs, the input is a 3D tensor with dimensions (channels $C$, width $D$, height $D$), and the output has size (channels $C_{out}$, width $2D$, height $2D$). Convolution strides are 1 unless otherwise noted.
Figure 3: Examples of Manhattan and Atlanta layout rooms (left) and their floorplan maps (right). The red dots indicate the position of the audio device. Since EchoScan predicts the room geometry from the location of the audio device, the audio device is always at the center $(0,0)$ of the floorplan map. (a) Manhattan layout room containing only right-angled walls, and (b) Atlanta layout room including curved walls. Floorplan maps are magnified for better visibility.
Figure 4: Construction of the Manhattan-Atlanta room dataset. The room geometry dataset for both layouts follows the one used for AtlantaNet [pintore_atlantanet]. For each of the Manhattan and Atlanta layouts, we simulated 50,000 RIRs for training and 1,000 RIRs for testing.
Figure 5: Estimated floorplan and height maps for the basic room dataset containing five types of simple-shaped rooms: quadrilateral, pentagonal, hexagonal, L-type, and T-type. For each room type, two examples with IoU close to that type's average IoU are shown. The red dot indicates the position of the audio device, and the thick orange line marks the boundary of the ground-truth (GT) room. (a) Quadrilateral rooms, (b) Pentagonal rooms, (c) Hexagonal rooms, (d) L-LOS rooms, (e) L-NLOS rooms, (f) T-LOS rooms, and (g) T-NLOS rooms.
...and 10 more figures
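
The Figure 5 caption reports results in terms of IoU. As a reading aid, here is a minimal sketch of the standard intersection-over-union between a predicted and a ground-truth binary floorplan map. The paper's exact metric definition is not reproduced in this outline, so the function name, the nested-list mask format, and the handling of two empty masks are assumptions.

```python
from typing import List


def floorplan_iou(pred: List[List[int]], target: List[List[int]]) -> float:
    """Intersection over union of two binary masks of equal shape.

    Assumption: two empty masks count as a perfect match (IoU = 1.0).
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0  # pixel is floor in both maps
            union += 1 if (p or t) else 0          # pixel is floor in either map
    return intersection / union if union else 1.0


# Hypothetical 2 x 2 masks: one shared pixel, three pixels covered in total.
pred = [[1, 1], [0, 0]]
target = [[1, 0], [1, 0]]
score = floorplan_iou(pred, target)  # 1 / 3
```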
