A Modular Pipeline for 3D Object Tracking Using RGB Cameras

Lars Bredereke, Yale Hartmann, Tanja Schultz

TL;DR

This work tackles 3D multi-object tracking with multiple time-synced RGB cameras by proposing a modular pipeline that estimates 3D trajectories even when camera poses vary between trials. It integrates a YOLO-based 2D detector, a gradient-based camera-parameter optimization to align six cameras in a common world frame, 3D object initialization via line intersections, and Extended Kalman Filter tracking to fuse detections over time, providing trajectory estimates with covariance as a confidence measure. The approach demonstrates robust performance on the Table Setting Dataset, achieving close agreement with OptiTrack for a cereal-box example and maintaining plausible trajectories across trials with missing cameras, while requiring minimal human annotation. The pipeline is scalable and adaptable to other scenes with stationary, time-synced cameras, and the accompanying data outputs and code enable reuse and broader evaluation.
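The "3D object initialization via line intersections" step can be illustrated with a minimal sketch. It assumes each camera contributes a viewing ray (camera center plus a direction through the detected pixel) and that the object is placed at the least-squares point closest to all rays. The function name `closest_point_to_rays` and the camera positions are hypothetical and chosen only for illustration; the authors' implementation may differ.

```python
import numpy as np

def closest_point_to_rays(origins, directions):
    """Return the 3D point minimizing the summed squared distance to a set of lines.

    origins:    (N, 3) array, one point on each line (e.g. a camera center)
    directions: (N, 3) array, direction of each line (need not be normalized)
    """
    origins = np.asarray(origins, dtype=float)
    directions = np.asarray(directions, dtype=float)
    directions = directions / np.linalg.norm(directions, axis=1, keepdims=True)

    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        # Projector onto the plane orthogonal to the line direction
        P = np.eye(3) - np.outer(d, d)
        A += P
        b += P @ o
    # Solvable as long as the lines are not all parallel
    return np.linalg.solve(A, b)

# Hypothetical setup: three camera centers looking at one object
target = np.array([0.2, -0.1, 0.9])
cams = np.array([[2.0, 0.0, 1.5], [0.0, 2.0, 1.5], [-2.0, -1.0, 2.0]])
rays = target - cams  # noise-free viewing directions
point = closest_point_to_rays(cams, rays)
```

With noise-free rays the recovered `point` coincides with `target`; with noisy detections the rays no longer meet exactly and the result is the point nearest to all of them.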

Abstract

Object tracking is a key challenge in computer vision, with varied applications that each require different architectures. Most tracking systems have limitations, such as constraining all movement to a 2D plane, and they often track only one object. In this paper, we present a new modular pipeline that calculates 3D trajectories of multiple objects. It is adaptable to various settings in which multiple time-synced, stationary cameras record moving objects, and it works with off-the-shelf webcams. Our pipeline was tested on the Table Setting Dataset, where participants are recorded with various sensors as they set a table with tableware objects. We need to track these manipulated objects using six RGB webcams. Challenges include detecting small objects in 9,874,699 camera frames, determining camera poses, discriminating between nearby and overlapping objects, handling temporary occlusions, and finally calculating a 3D trajectory using the right subset of an average of 11.12.456 pixel coordinates per 3-minute trial. We implement a robust pipeline that produces accurate trajectories, with the covariance of the x, y, z position as a confidence metric. It deals dynamically with appearing and disappearing objects by instantiating new Extended Kalman Filters. It scales to hundreds of table-setting trials with very little human annotation input, even when the camera poses of each trial are unknown. The code is available at https://github.com/LarsBredereke/object_tracking
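To show why a filter's covariance can serve as a confidence metric, the sketch below uses a linear constant-velocity Kalman filter that observes 3D positions directly. This is a deliberate simplification and not the authors' Extended Kalman Filter, which fuses 2D detections from several cameras. The function `kf_step` and the noise values `q` and `r` are assumptions made for illustration.

```python
import numpy as np

def kf_step(x, P, z, dt=1 / 30, q=1e-2, r=1e-3):
    """One predict/update cycle of a linear constant-velocity Kalman filter.

    x: state vector (6,) = [position (3), velocity (3)]
    P: state covariance (6, 6)
    z: measured 3D position (3,), or None if the object was not detected
    """
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)                      # position += velocity * dt
    H = np.hstack([np.eye(3), np.zeros((3, 3))])    # only position is observed
    Q = q * np.eye(6)                               # process noise (assumed)
    R = r * np.eye(3)                               # measurement noise (assumed)

    # Predict
    x = F @ x
    P = F @ P @ F.T + Q

    # Update only when a measurement is available
    if z is not None:
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(6) - K @ H) @ P
    return x, P

x, P = np.zeros(6), np.eye(6)
for _ in range(20):                                 # object visible
    x, P = kf_step(x, P, np.array([0.5, 0.2, 0.8]))
var_seen = np.trace(P[:3, :3])

for _ in range(10):                                 # object occluded
    x, P = kf_step(x, P, None)
var_occluded = np.trace(P[:3, :3])
```

The positional variance `var_occluded` is larger than `var_seen` because every prediction without a measurement adds process noise, so the covariance grows while an object is occluded and shrinks again once detections return.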

Paper Structure

This paper contains 19 sections, 11 figures.

Figures (11)

  • Figure 1: The TSD cameras used for this paper are referred to as back, ceiling, counter-top, front, table-side and table-top.
  • Figure 2: Stages of the object tracking pipeline with references to their Sections
  • Figure 3: Screenshot of annotation tool used for labeling table (cyan), counter (yellow) and world origin (magenta) pixel positions
  • Figure 4: Example of camera and table parameters before and after optimization for a trial
  • Figure 5: Performance of YOLO network on test data
  • ...and 6 more figures