Improving Image Data Leakage Detection in Automotive Software

Md Abu Ahammed Babu, Sushant Kumar Pandey, Darko Durisic, Ashok Chaitanya Koppisetty, Miroslaw Staron

TL;DR

This study conducts a computational experiment on the Cirrus dataset from industrial partner Volvo Cars to develop a method for detecting data leakage, then evaluates the method on a public dataset, KITTI, a popular and widely accepted benchmark in the automotive domain.

Abstract

Data leakage is a common problem that is often overlooked when data is split into train and test sets before training an ML/DL model. Leakage artificially inflates model performance during evaluation, which often leads to erroneous predictions once the model is deployed in real time. However, detecting such leakage is challenging, particularly in the object detection context of perception systems, where the model must be trained on image data. In this study, we conduct a computational experiment on the Cirrus dataset from our industrial partner Volvo Cars to develop a method for detecting data leakage. We then evaluate the method on a public dataset, KITTI, which is a popular and widely accepted benchmark in the automotive domain. The results show that our proposed method detects previously unknown data leakage in the KITTI dataset.
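To make the idea of leakage through similar images concrete, the following is a minimal sketch of one generic way to flag near-duplicate images shared between a train and a test split. It is not the method proposed in the paper, which is not described in this summary. The average-hash comparison, the function names (`average_hash`, `hamming`, `find_leaks`), the `max_distance` threshold, and the toy 4x4 grayscale frames are all assumptions chosen for illustration.

```python
from itertools import product


def average_hash(pixels):
    """Return a bit tuple: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming(a, b):
    """Count the positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def find_leaks(train, test, max_distance=2):
    """Flag (train_name, test_name, distance) pairs whose hashes nearly match.

    train, test: dicts mapping an image name to a 2D list of grayscale
    values; all images are assumed to share the same size.
    max_distance is a hypothetical threshold, not a value from the paper.
    """
    train_hashes = {name: average_hash(img) for name, img in train.items()}
    test_hashes = {name: average_hash(img) for name, img in test.items()}
    leaks = []
    for (tr, h1), (te, h2) in product(train_hashes.items(), test_hashes.items()):
        distance = hamming(h1, h2)
        if distance <= max_distance:
            leaks.append((tr, te, distance))
    return leaks


# Toy frames: frame_a_next mimics the next frame of the same driving
# sequence (identical scene, slightly brighter); frame_b is a different scene.
frame_a = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]
frame_a_next = [[value + 5 for value in row] for row in frame_a]
frame_b = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
]

train = {"train_001": frame_a, "train_002": frame_b}
test = {"test_001": frame_a_next}

leaks = find_leaks(train, test)
print(leaks)
```

In this toy setup only the pair `train_001` / `test_001` is flagged, since consecutive frames of one sequence landing on both sides of the split is the kind of overlap a random split can introduce. A real pipeline would first downscale and grayscale the images and would likely need a more robust similarity measure.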

Paper Structure

This paper contains 13 sections, 1 equation, 9 figures, 6 tables, 1 algorithm.

Figures (9)

  • Figure 1: An example of target leakage through similar images present in both train and test datasets
  • Figure 2: Illustration of the data leakage steps.
  • Figure 3: Results summary graph
  • Figure 4: The relative performance increase rate
  • Figure 5: Evaluation results summary on KITTI
  • ...and 4 more figures