X-maps: Direct Depth Lookup for Event-based Structured Light Systems

Wieland Morgenstern, Niklas Gard, Simon Baumann, Anna Hilsmann, Peter Eisert

TL;DR

The paper addresses real-time depth estimation for Spatial Augmented Reality (SAR) using event cameras and cheap laser projectors. It introduces X-maps: a rectified projection map, derived from the projector time map, that stores the x coordinate for each row and timestamp, so that disparity can be looked up directly for each event with minimal computation. A time map calibration corrects the non-linear scan timing of the projector. The key contributions are the construction of a reference X-map, a direct per-event disparity lookup, and the time map calibration. Together they yield depth accuracy comparable to time-map-based ESL methods with orders-of-magnitude faster CPU runtimes ($<3$ ms per frame) at $60$ Hz. This enables high-frame-rate, low-latency SAR experiences with low-cost hardware and straightforward calibration.

Abstract

We present a new approach to direct depth estimation for Spatial Augmented Reality (SAR) applications using event cameras. These dynamic vision sensors are a great fit to be paired with laser projectors for depth estimation in a structured light approach. Our key contributions involve a conversion of the projector time map into a rectified X-map, capturing x-axis correspondences for incoming events and enabling direct disparity lookup without any additional search. Compared to previous implementations, this significantly simplifies depth estimation, making it more efficient, while the accuracy is similar to the time map-based process. Moreover, we compensate for the non-linear temporal behavior of cheap laser projectors with a simple time map calibration, resulting in improved performance and increased depth estimation accuracy. Since depth estimation requires only two lookups, it can be executed almost instantly (less than 3 ms per frame with a Python implementation) for incoming events. This allows for real-time interactivity and responsiveness, which makes our approach especially suitable for SAR experiences where low latency, high frame rates and direct feedback are crucial. We present insights gained from data transformed into X-maps and evaluate our depth-from-disparity estimation against state-of-the-art time map-based results. Additional results and code are available on our project page: https://fraunhoferhhi.github.io/X-maps/
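The per-event lookup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the linear projector sweep, the discretization of time into bins, and the use of a precomputed disparity-to-depth table as the second lookup are all assumptions made for the example. The depth formula $z = f \cdot b / d$ is the standard rectified stereo relation.

```python
from typing import List, Optional


def build_projector_xmap(height: int, num_time_bins: int, proj_width: int) -> List[List[float]]:
    """Hypothetical idealized projector X-map: (y, t_bin) -> projector x.

    Assumes the projector sweeps its columns linearly over the frame time.
    """
    step = (proj_width - 1) / (num_time_bins - 1)
    row = [t * step for t in range(num_time_bins)]
    return [list(row) for _ in range(height)]


def build_depth_lut(max_disparity: int, focal_px: float, baseline_m: float) -> List[Optional[float]]:
    """Precompute depth for every integer disparity (index 0 is invalid)."""
    return [None] + [focal_px * baseline_m / d for d in range(1, max_disparity + 1)]


def event_depth(
    x_cam: int,
    y: int,
    t_bin: int,
    projector_xmap: List[List[float]],
    depth_lut: List[Optional[float]],
) -> Optional[float]:
    """Depth for one rectified event using two table lookups and no search."""
    x_proj = projector_xmap[y][t_bin]       # lookup 1: projector x for (y, t)
    disparity = round(x_proj - x_cam)       # sign convention is an assumption
    if disparity < 1 or disparity >= len(depth_lut):
        return None                         # outside the valid disparity range
    return depth_lut[disparity]             # lookup 2: disparity -> depth


# Toy setup: 2 rows, 5 time bins, 9 projector columns (all values made up).
xmap = build_projector_xmap(height=2, num_time_bins=5, proj_width=9)
lut = build_depth_lut(max_disparity=8, focal_px=100.0, baseline_m=0.5)
depth = event_depth(x_cam=2, y=0, t_bin=3, projector_xmap=xmap, depth_lut=lut)
```

Because both tables are built once per setup, the per-event cost in this sketch is two indexed reads and one subtraction, which is the property the abstract attributes to the X-map approach.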

Paper Structure

This paper contains 19 sections, 4 equations, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: Through the use of X-maps, we can establish real-time Spatial Augmented Reality (SAR) applications using an event camera and laser projector system. We calculate depth from the projection with minimal computational effort at high frame rates. Our demonstrator projects the color-coded depth back into the scene. Here, (a) is a static scene, (b) shows the projector and camera looking onto the scene with the projection in the background, and (c) demonstrates depth estimation on moving objects.
  • Figure 2: A time map of recorded events for a single frame projected by a laser projector onto a plane. Matching time entries of the map along epipolar lines with an idealized projector time map to compute scene disparity is computationally expensive (Muglikar et al., 2021).
  • Figure 3: When we plot incoming events of a projected frame with their y coordinate over time, we can clearly separate them into columns, grouped by their time stamps.
  • Figure 4: This figure shows all events of a single column from Figure 3. Even though we are projecting onto a plane, events of the same temporally local group show jitter of 2-3 pixels on the x-axis. The jitter leads to events overlapping with those of the neighboring columns. Thus it is not possible to clearly distinguish projected scan lines in the y/x view, while it is possible in the y/t view in Figure 3.
  • Figure 5: A rectified camera X-map for a single projected frame, showing the scene of Figure 1 (b). The X-map is the result of flattening the spatio-temporal event cuboid of dimensions $(x,y,t)$ into a 2D image $(y,t) \longmapsto x$. The time map follows a similar idea but applies it to a different face of the cuboid (mapping $(x,y) \longmapsto t$). The X-map forms the basis for our method: because the map encodes x values, subtracting a rectified camera X-map from an idealized projector X-map yields the disparity value.
  • ...and 3 more figures
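
The relationship between the time map and the X-map described in the Figure 5 caption can be illustrated with a small sketch. It flattens the same list of events onto two different faces of the $(x,y,t)$ cuboid and then subtracts a camera X-map from a projector X-map. The names `flatten_events` and `disparity_map`, the integer time bins, and the toy values are hypothetical and chosen only for illustration. Rectification and the handling of multiple events per cell are omitted.

```python
from typing import List, Optional, Tuple

Event = Tuple[int, int, int]            # (x, y, t_bin), assumed already rectified
Grid = List[List[Optional[int]]]


def flatten_events(
    events: List[Event], width: int, height: int, num_time_bins: int
) -> Tuple[Grid, Grid]:
    """Flatten the (x, y, t) event cuboid onto two different faces."""
    time_map: Grid = [[None] * width for _ in range(height)]          # (x, y) -> t
    x_map: Grid = [[None] * num_time_bins for _ in range(height)]     # (y, t) -> x
    for x, y, t in events:
        time_map[y][x] = t    # later events overwrite earlier ones in this sketch
        x_map[y][t] = x
    return time_map, x_map


def disparity_map(projector_xmap: Grid, camera_xmap: Grid) -> Grid:
    """Cell-wise projector x minus camera x; None where either map is empty."""
    return [
        [p - c if p is not None and c is not None else None
         for p, c in zip(p_row, c_row)]
        for p_row, c_row in zip(projector_xmap, camera_xmap)
    ]


# Toy data: two events in a 2-row image with 4 time bins (values made up).
events = [(3, 0, 1), (5, 1, 2)]
time_map, camera_xmap = flatten_events(events, width=8, height=2, num_time_bins=4)
projector_xmap = [[0, 4, 8, 12], [0, 4, 8, 12]]   # assumed idealized linear sweep
disparity = disparity_map(projector_xmap, camera_xmap)
```

In this toy case the disparity map is empty except at the two cells where an event was recorded, which matches the caption's point that the X-map stores x values directly, so a subtraction replaces any search along epipolar lines.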