Skeletal Data Matching and Merging from Multiple RGB-D Sensors for Room-Scale Human Behaviour Tracking

Adrien Coppens, Valérie Maquil

TL;DR

This paper discusses an approach to calibrating multiple RGB-D sensors relative to each other so that they share a common frame of reference, and to matching and merging the skeletons they report. The results enable unobtrusive and occlusion-resilient human behaviour tracking at room scale.

Abstract

A popular and affordable option to provide room-scale human behaviour tracking is to rely on commodity RGB-D sensors, as such devices offer body tracking capabilities at a reasonable price point. While their capabilities may be sufficient for applications such as entertainment systems where a person plays in front of a television, RGB-D sensors are sensitive to occlusions from objects or other persons that might be in the way in more complex room-scale setups. To alleviate the occlusion issue, and also to extend the tracking range and strengthen its accuracy, it is possible to rely on multiple RGB-D sensors and perform data fusion. Unfortunately, fusing the data in a meaningful manner raises additional challenges: the sensors must be calibrated relative to each other to provide a common frame of reference, and skeletons must be matched and merged when actually combining the data. In this paper, we discuss our approach to tackling these challenges and present the results we achieved, in the form of aligned point clouds and combined skeleton lists. These results successfully enable unobtrusive and occlusion-resilient human behaviour tracking at room scale, which may be used as input for interactive applications as well as (possibly remote) collaborative systems.

Paper Structure (9 sections, 1 equation, 3 figures)

Figures (3)

  • Figure 1: Generation of a partial point cloud (right) from a depth image (left). The blue circles mark three selected points on the depth image, replicated on the image plane in the right picture. As these points have a similar colour on the depth image, they are at a similar distance from the sensor (visualised as blue lines of similar length in the right picture). Combining that distance information with a projection of the image points using the sensor's intrinsic parameters yields the points included in the point cloud.
  • Figure 2: Calibration results using the ICP approach with filtered point clouds, with green and red (full) point clouds corresponding to two separate sensors. In the first two subfigures, the shapes of calibration objects (a standing desk and a cardboard box) may be perceived, with the single user standing near them.
  • Figure 3: The skeleton matching problem, with skeletons from two different sensors that are matched to form a merged set of skeletons. While most persons in the room are tracked by both sensors and therefore result in two overlapping skeletons, some are isolated as only one of the sensors currently sees them.
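
The point cloud generation described in the Figure 1 caption can be illustrated with a minimal sketch of pinhole back-projection. This is not the authors' implementation. It assumes that each pixel stores depth in metres along the optical axis, that a value of zero marks a missing measurement, and that lens distortion is ignored. The function name `depth_to_points` and the intrinsic parameter names `fx`, `fy`, `cx`, `cy` are chosen here for illustration only.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def depth_to_points(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project a depth image into 3D points in the sensor frame.

    Assumptions (not taken from the paper): depth values are metres along
    the optical axis, zero or negative values mean "no measurement", and
    the sensor follows an undistorted pinhole model with focal lengths
    (fx, fy) and principal point (cx, cy), all expressed in pixels.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):        # v: pixel row index
        for u, z in enumerate(row):        # u: pixel column index
            if z <= 0:
                continue                   # skip invalid pixels
            x = (u - cx) * z / fx          # horizontal offset scaled by depth
            y = (v - cy) * z / fy          # vertical offset scaled by depth
            points.append((x, y, z))
    return points


# Tiny 2x2 depth image with one invalid pixel (hypothetical values).
example_depth = [
    [0.0, 2.0],
    [2.0, 2.0],
]
example_points = depth_to_points(example_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

Pixels with the same depth value end up at the same distance along the optical axis, which matches the caption's observation that similarly coloured points lie at a similar distance from the sensor. Each sensor produces such a cloud in its own frame, so a calibration step is still needed before clouds from several sensors can be aligned.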