Addressing Data Annotation Challenges in Multiple Sensors: A Solution for Scania Collected Datasets

Ajinkya Khoche, Aron Asefaw, Alejandro Gonzalez, Bogdan Timus, Sina Sharif Mansouri, Patric Jensfelt

TL;DR

The paper tackles data annotation challenges in multi-sensor autonomous-vehicle datasets for heavy vehicles. It models the motion of each non-ego object with a planar 2D state $\bm{x} = [d, s]^T$ along a trajectory of headings $\bm{\Theta}$, under constant-acceleration dynamics. A Moving Horizon Estimation (MHE) framework processes a track of noisy human annotations to produce robust speed estimates $s^\ast$ over a horizon $N_e$, which are then used to speed-compensate sensor points and refine 3D bounding boxes. The refinement pipeline clusters views by heading, generates $G$ pseudo bounding boxes, and assigns them to speed-compensated clusters, enabling better coverage of dynamic objects across sensors. Validation on real Scania truck/bus data shows smoother speed trajectories and improved annotation completeness, which could improve ground-truth quality for downstream perception and evaluation tasks. The approach can be extended to additional object classes and longer sequences.

Abstract

Data annotation in autonomous vehicles is a critical step in the development of Deep Neural Network (DNN) based models or the performance evaluation of the perception system. This often takes the form of adding 3D bounding boxes on time-sequential and registered series of point-sets captured from active sensors like Light Detection and Ranging (LiDAR) and Radio Detection and Ranging (RADAR). When annotating multiple active sensors, the points must be motion compensated and brought to a consistent coordinate frame and a common timestamp. However, highly dynamic objects pose a unique challenge, as they can appear at different timestamps in each sensor's data. Without knowing the speed of the objects, their positions differ across sensor outputs. Thus, even after motion compensation, highly dynamic objects are not matched from multiple sensors in the same frame, and human annotators struggle to add unique bounding boxes that capture all objects. This article focuses on addressing this challenge, primarily within the context of Scania collected datasets. The proposed solution takes a track of an annotated object as input and uses Moving Horizon Estimation (MHE) to robustly estimate its speed. The estimated speed profile is used to correct the position of the annotated box and to add boxes to object clusters missed by the original annotation.
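
To make the speed-compensation idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes the object moves at a constant speed along a fixed heading during the short accumulation window; the function name `speed_compensate` and its parameters are hypothetical. Here `t_ref` plays the role of the common reference timestamp and `timestamps` holds the per-point capture times. The numbers are chosen to match a 2 m displacement within 100 ms, i.e. a speed of 20 m/s.

```python
import math


def speed_compensate(points, timestamps, t_ref, speed, heading):
    """Shift each (x, y) point to where the object would be at t_ref.

    Assumption: the object moves at constant `speed` (m/s) along
    `heading` (radians) over the accumulation window.
    """
    ux, uy = math.cos(heading), math.sin(heading)
    shifted = []
    for (x, y), tau in zip(points, timestamps):
        dt = t_ref - tau  # positive if the point was captured before t_ref
        shifted.append((x + speed * dt * ux, y + speed * dt * uy))
    return shifted


# Same physical spot on an object moving at 20 m/s along +x,
# seen by two sensors 50 ms before and 50 ms after the reference time.
points = [(9.0, 0.0), (11.0, 0.0)]
timestamps = [-0.05, 0.05]
aligned = speed_compensate(points, timestamps, t_ref=0.0, speed=20.0, heading=0.0)
```

The point captured early is moved forward and the point captured late is moved back, so both observations coincide at the reference time. Without a speed estimate, the two observations stay 2 m apart.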

Paper Structure (13 sections, 7 equations, 6 figures)

Figures (6)

  • Figure 1: Illustration of the Scania truck with sensor placements highlighted by red circles; the $i^{th}$ sensor frame and the vehicle coordinate frame are shown as $\mathbf{L_i}$ and $\mathbf{V}$, respectively.
  • Figure 2: Point cloud from three sensors after compensating for ego motion (red, violet and blue points are from LiDARs $L_1$, $L_2$ and $L_3$, respectively). The dynamic object in the scene is observed by different sensors at different time instances, with a displacement of 2 m within 100 ms. The manually annotated 3D bounding box is shown in green.
  • Figure 3: The annotation process for the Scania collected dataset. Scans from multiple LiDARs are motion compensated and accumulated in a superframe. Thereafter, a time series of superframes is post-processed and sent for manual annotation. A snapshot of a superframe before and after annotation is shown. The ego vehicle is marked in the center with vehicle frame $\mathbf{V}$ and sensor frames $\mathbf{L}_1$ to $\mathbf{L}_4$. The point clouds from multiple LiDARs are colored according to their offset from the motion-compensated timestamp (chosen to be the middle of the superframe). Red and blue indicate the beginning and end of the superframe, respectively. The annotated vehicle is shown with a green box.
  • Figure 4: Proposed approach for refining multi-LiDAR annotations. The input is a time-sequential track of human-labelled annotations (shown with a green box) alongside its speed estimate obtained using MHE. First, clustering along the heading captures the red points, which represent a view of the vehicle missed by the human-annotated box. Next, the speed estimate is used to shift the points according to the difference between $\tau_{i,j}$ and $t^*$: the red points move forward, whereas the blue points move back. Thirdly, the annotated box is moved to align with the shifted points. Lastly, the green box is duplicated for each cluster, and both the points and the pseudo bounding boxes are shifted back according to the reverse cluster displacement. The pseudo bounding boxes are colored according to their best-fitting clusters.
  • Figure 5: Estimation of speed for four different non-ego vehicles across various logs. The blue plot represents the MHE estimates, whereas the orange plot denotes the naive speed estimate, obtained by dividing the distance between consecutive annotations by the time between them.
  • ...and 1 more figure
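
The contrast drawn in Figure 5 between a naive speed estimate and a horizon-based one can be sketched as follows. This is an illustrative stand-in rather than the paper's MHE formulation: it fits a constant-acceleration model to the last few annotations by ordinary least squares, with no arrival cost, constraints, or robust loss. All function names (`naive_speeds`, `solve_linear`, `windowed_speed`) and the synthetic track are hypothetical.

```python
def naive_speeds(times, distances):
    """Finite-difference speed between consecutive annotations."""
    return [
        (distances[k + 1] - distances[k]) / (times[k + 1] - times[k])
        for k in range(len(times) - 1)
    ]


def solve_linear(A, b):
    """Solve a small dense system by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def windowed_speed(times, distances, horizon):
    """Speed at the latest annotation from a constant-acceleration fit.

    Fits d(t) = d_end + s_end * dt + 0.5 * a * dt**2, dt = t - t_end,
    over the last `horizon` annotations (needs horizon >= 3).
    """
    t_w, d_w = times[-horizon:], distances[-horizon:]
    t_end = t_w[-1]
    rows = [[1.0, t - t_end, 0.5 * (t - t_end) ** 2] for t in t_w]
    # Normal equations (R^T R) p = R^T d for p = [d_end, s_end, a].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * d for r, d in zip(rows, d_w)) for i in range(3)]
    _, s_end, _ = solve_linear(A, b)
    return s_end


# Synthetic track: 10 m/s initial speed, 2 m/s^2 acceleration, 10 Hz labels.
times = [0.1 * k for k in range(10)]
distances = [10.0 * t + t ** 2 for t in times]

naive = naive_speeds(times, distances)
smooth = windowed_speed(times, distances, horizon=5)
```

On noise-free data the windowed fit recovers the true speed at the latest annotation, while the finite difference returns the average speed over the last interval. With annotation jitter, the finite difference divides the position error by a short time step, which is one reason a horizon-based estimate tends to be smoother.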