A YOLO-Based Semi-Automated Labeling Approach to Improve Fault Detection Efficiency in Railroad Videos

Dylan Lester, James Gao, Samuel Sutphin, Pingping Zhu, Husnu Narman, Ammar Alzarrad

TL;DR

The paper addresses the high cost and error risk of manual labeling for railroad fault detection in videos. It proposes a semi-automated labeling workflow using YOLOv8, starting from a small labeled set and iteratively correcting predictions to expand the labeled data with less human effort. Empirical results show that assisted labeling improves the $F_1$-score from about 0.81 to 0.87 on a 400-image augmented dataset and cuts labeling time roughly in half (from about 10 hours to 4–5 hours) while keeping performance close to the fully manual baseline. This approach offers a scalable, cost-effective labeling paradigm for fault detection tasks and can extend to other detection domains with similar labeling bottlenecks.

Abstract

Manual labeling for large-scale image and video datasets is often time-intensive, error-prone, and costly, posing a significant barrier to efficient machine learning workflows in fault detection from railroad videos. This study introduces a semi-automated labeling method that uses a pre-trained You Only Look Once (YOLO) model to streamline the labeling process and enhance fault detection accuracy in railroad videos. Starting from a small set of manually labeled data, our approach iteratively trains the YOLO model, using each cycle's output to improve model accuracy and progressively reduce the need for human intervention. To facilitate easy correction of model predictions, we developed a system to export YOLO's detection data as an editable text file, enabling rapid adjustments when detections require refinement. This approach decreases labeling time from an average of 2 to 4 minutes per image to between 30 seconds and 2 minutes, reducing labor costs and labeling errors. Unlike costly AI-based labeling solutions on paid platforms, our method provides a cost-effective alternative for researchers and practitioners handling large datasets in fault detection and other detection-based machine learning applications.
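
The export-and-correct step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the layout of the editable text file, so this example assumes the common YOLO label convention (one line per object: `class x_center y_center width height`, all coordinates normalized to the image size). The names `Detection`, `to_yolo_line`, `export_labels`, and `load_labels` are hypothetical helpers for illustration, not the authors' code.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical detection record (an assumption, not from the paper):
# (class_id, x_min, y_min, x_max, y_max) in pixel coordinates.
Detection = Tuple[int, float, float, float, float]


def to_yolo_line(det: Detection, img_w: int, img_h: int) -> str:
    """Convert one pixel-space box to a normalized YOLO-style label line."""
    cls, x0, y0, x1, y1 = det
    xc = (x0 + x1) / 2 / img_w   # box center x, normalized to [0, 1]
    yc = (y0 + y1) / 2 / img_h   # box center y, normalized to [0, 1]
    w = (x1 - x0) / img_w        # box width, normalized
    h = (y1 - y0) / img_h        # box height, normalized
    return f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


def export_labels(dets: List[Detection], img_w: int, img_h: int, path: str) -> None:
    """Write the model's detections to a plain text file a person can edit."""
    lines = [to_yolo_line(d, img_w, img_h) for d in dets]
    Path(path).write_text("\n".join(lines) + ("\n" if lines else ""))


def load_labels(path: str) -> List[Tuple[int, float, float, float, float]]:
    """Read the (possibly hand-corrected) file back for the next training cycle."""
    rows = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines left behind by manual editing
        parts = line.split()
        rows.append((int(parts[0]), *(float(p) for p in parts[1:5])))
    return rows
```

In such a loop, a reviewer would open the exported file, delete false positives or adjust coordinates, and the corrected labels would then be added to the training set for the next iteration.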

Paper Structure

This paper contains 11 sections, 2 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: A diagram of the algorithm.
  • Figure 2: F1-score of the 100 image set.
  • Figure 3: F1-score of the 200 image set.
  • Figure 4: F1-score of the 300 image set.
  • Figure 5: F1-score of the 400 image set.
  • ...and 2 more figures