Ev-Layout: A Large-scale Event-based Multi-modal Dataset for Indoor Layout Estimation and Tracking

Xucheng Guo, Yiran Shen, Xiaofang Xiao, Yuanfeng Zhou, Lin Wang

TL;DR

This work tackles indoor layout estimation and tracking under dynamic motion and challenging lighting by introducing Ev-Layout, a large-scale, multi-modal dataset that combines RGB, event streams, IMU, and illumination data collected via a head-mounted VR platform. It also presents an event-based layout estimation pipeline with two key innovations: the Event Temporal Distribution Feature (ETDF), which models per-patch event distributions with an inhomogeneous Poisson process and uses $D_{KL}$ to measure patch similarity, and the Spatio-Temporal Feature Fusion Module (SFFM), which fuses ETDF with transformer-style features for robust edge and corner prediction. Experimental results show substantial improvements over prior event-based methods in dynamic indoor layout estimation, with strong robustness to high-speed motion and lighting variations. The dataset and methods collectively advance practical layout understanding for VR/MR, robotics, and spatial computing under challenging real-world conditions.
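
As a rough illustration of the ETDF idea summarized above, the following Python sketch estimates a piecewise-constant event rate for each patch and compares two patches with the standard closed-form KL divergence between inhomogeneous Poisson processes, $D_{KL} = \int [\lambda_p \log(\lambda_p / \lambda_q) - \lambda_p + \lambda_q]\,dt$. The function names, the histogram-based rate estimate, the bin count, and the smoothing constant `eps` are all assumptions made for this example; the paper's actual ETDF formulation may differ.

```python
import numpy as np


def patch_intensity(timestamps, t_start, t_end, n_bins=8, eps=1e-6):
    """Estimate a piecewise-constant event rate (events per second) for one patch.

    Hypothetical helper: the binning scheme and `eps` are illustrative choices,
    not taken from the paper.
    """
    counts, _ = np.histogram(timestamps, bins=n_bins, range=(t_start, t_end))
    bin_width = (t_end - t_start) / n_bins
    # eps keeps every rate strictly positive so the logarithm below is defined.
    rates = counts / bin_width + eps
    return rates, bin_width


def poisson_process_kl(rate_p, rate_q, bin_width):
    """KL divergence between two Poisson processes with piecewise-constant rates.

    Discretizes  integral of [p*log(p/q) - p + q] dt  over equal-width bins.
    """
    rate_p = np.asarray(rate_p, dtype=float)
    rate_q = np.asarray(rate_q, dtype=float)
    integrand = rate_p * np.log(rate_p / rate_q) - rate_p + rate_q
    return float(np.sum(bin_width * integrand))


# Example: one patch fires uniformly over the window, the other only early on.
rng = np.random.default_rng(0)
uniform_patch = rng.uniform(0.0, 1.0, size=200)
early_patch = rng.uniform(0.0, 0.25, size=200)

rate_u, width = patch_intensity(uniform_patch, 0.0, 1.0)
rate_e, _ = patch_intensity(early_patch, 0.0, 1.0)
divergence = poisson_process_kl(rate_u, rate_e, width)
```

A value near zero indicates that two patches fire events with a similar temporal profile, while larger values indicate dissimilar profiles. The divergence is asymmetric, so a symmetric similarity score would need to combine both directions.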

Abstract

This paper presents Ev-Layout, a novel large-scale event-based multi-modal dataset designed for indoor layout estimation and tracking. Ev-Layout makes two key contributions to the community: (1) it uses a hybrid data collection platform (with a head-mounted display and VR interface) that integrates both RGB and bio-inspired event cameras to capture indoor layouts in motion; (2) it incorporates time-series data from inertial measurement units (IMUs) and ambient lighting conditions recorded during data collection to highlight the potential impact of motion speed and lighting on layout estimation accuracy. The dataset consists of 2.5K sequences, including over 771.3K RGB images and 10 billion event data points. Of these images, 39K are annotated with indoor layouts, enabling research in both event-based and video-based indoor layout estimation. Based on the dataset, we propose an event-based layout estimation pipeline with a novel event-temporal distribution feature module that aggregates the spatio-temporal information from events. Additionally, we introduce a spatio-temporal feature fusion module that can be easily integrated into a transformer module. Finally, we conduct benchmarking and extensive experiments on the Ev-Layout dataset, demonstrating that our approach significantly improves the accuracy of dynamic indoor layout estimation compared to existing event-based methods.

Paper Structure

This paper contains 16 sections, 5 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: (a) Overview of our data acquisition platform, which includes an event camera, an RGB camera, an IMU sensor, and a light sensor. When users play sports games with VR/MR devices, the head-mounted data acquisition platform simultaneously captures the indoor layout. (b) Ev-Layout is an event-based dataset for indoor layout estimation under fast motion and complex lighting conditions, potentially supporting spatial computing with event cameras for VR/MR. The dataset is collected via our multi-modal data acquisition platform and includes time-series data with varying lighting intensities, speeds, and structural configurations, accompanied by continuous layout annotations. (c) Based on the dataset, we propose a novel layout estimation pipeline that integrates the temporal distribution feature of events to achieve precise layout estimation, leveraging events' high temporal resolution and high dynamic range. We expect our work to help advance the application of event cameras in layout estimation for VR/MR.
  • Figure 2: Data collection, processing, and annotation process.
  • Figure 3: Visualization of data annotation.
  • Figure 4: Statistical histogram depicting the ambient light intensity (left) and movement speed (right).
  • Figure 5: Statistics of eight types of indoor layout (left) and different types of indoor layout (right).
  • ...and 8 more figures