Ev-Layout: A Large-scale Event-based Multi-modal Dataset for Indoor Layout Estimation and Tracking
Xucheng Guo, Yiran Shen, Xiaofang Xiao, Yuanfeng Zhou, Lin Wang
TL;DR
This work tackles indoor layout estimation and tracking under dynamic motion and challenging lighting by introducing Ev-Layout, a large-scale, multi-modal dataset that combines RGB, event streams, IMU, and illumination data collected via a head-mounted VR platform. It also presents an event-based layout estimation pipeline with two key innovations: the Event Temporal Distribution Feature (ETDF), which models per-patch event distributions with an inhomogeneous Poisson process and uses the Kullback-Leibler divergence $D_{KL}$ to measure patch similarity, and the Spatio-Temporal Feature Fusion Module (SFFM), which fuses ETDF with transformer-style features for robust edge and corner prediction. Experimental results show substantial improvements over prior event-based methods in dynamic indoor layout estimation, with strong robustness to high-speed motion and lighting variations. Together, the dataset and methods advance practical layout understanding for VR/MR, robotics, and spatial computing under challenging real-world conditions.
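To make the ETDF idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a piecewise-constant approximation of each patch's event rate over a fixed time window (a normalized temporal histogram) and compares two patches with the discrete $D_{KL}$. The function names, the bin count, the smoothing constant, and the synthetic timestamps are all hypothetical choices made for illustration.

```python
import numpy as np


def temporal_histogram(timestamps, t_start, t_end, num_bins=8, eps=1e-6):
    """Normalized temporal histogram of the event timestamps in one patch.

    Assumption: the inhomogeneous Poisson rate is approximated as constant
    within each of `num_bins` equal-width bins; `eps` avoids empty bins.
    """
    counts, _ = np.histogram(timestamps, bins=num_bins, range=(t_start, t_end))
    smoothed = counts.astype(np.float64) + eps
    return smoothed / smoothed.sum()


def kl_divergence(p, q):
    """Discrete KL divergence D_KL(p || q) for strictly positive p and q."""
    return float(np.sum(p * np.log(p / q)))


# Synthetic event timestamps (seconds) for three hypothetical patches.
rng = np.random.default_rng(0)
patch_a = rng.uniform(0.0, 1.0, size=500)  # events spread evenly in time
patch_b = rng.uniform(0.0, 1.0, size=500)  # same temporal pattern as patch_a
patch_c = rng.beta(5.0, 1.0, size=500)     # events concentrated late in the window

p_a = temporal_histogram(patch_a, 0.0, 1.0)
p_b = temporal_histogram(patch_b, 0.0, 1.0)
p_c = temporal_histogram(patch_c, 0.0, 1.0)

d_ab = kl_divergence(p_a, p_b)  # small: similar temporal distributions
d_ac = kl_divergence(p_a, p_c)  # larger: different temporal distributions
print(f"D_KL(a || b) = {d_ab:.4f}, D_KL(a || c) = {d_ac:.4f}")
```

In this sketch a small divergence marks two patches whose events arrive with a similar temporal pattern, which is the sense in which $D_{KL}$ can serve as a patch similarity measure. Note that $D_{KL}$ is asymmetric, so a symmetric variant may be preferable when the order of the two patches should not matter.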
Abstract
This paper presents Ev-Layout, a novel large-scale event-based multi-modal dataset designed for indoor layout estimation and tracking. Ev-Layout makes two key contributions to the community: (1) it uses a hybrid data collection platform (with a head-mounted display and VR interface) that integrates both RGB and bio-inspired event cameras to capture indoor layouts in motion; and (2) it incorporates time-series data from inertial measurement units (IMUs) and ambient lighting conditions recorded during data collection, highlighting the potential impact of motion speed and lighting on layout estimation accuracy. The dataset consists of 2.5K sequences, including over 771.3K RGB images and 10 billion event data points. Of these images, 39K are annotated with indoor layouts, enabling research in both event-based and video-based indoor layout estimation. Based on the dataset, we propose an event-based layout estimation pipeline with a novel event-temporal distribution feature module that effectively aggregates the spatio-temporal information from events. Additionally, we introduce a spatio-temporal feature fusion module that can be easily integrated into a transformer module to perform this fusion. Finally, we conduct benchmarking and extensive experiments on the Ev-Layout dataset, demonstrating that our approach significantly improves the accuracy of dynamic indoor layout estimation compared to existing event-based methods.
