Efficient Visual Fault Detection for Freight Train via Neural Architecture Search with Data Volume Robustness

Yang Zhang; Mingying Li; Huilin Pan; Moyun Liu; Yang Zhou

TL;DR

The paper tackles visual fault detection for freight trains by designing NAS FTI-FDet, an efficient NAS framework that searches for a task-specific, multi-scale detection head. It introduces a scale-aware search space and a representation-sharing scheme to handle large receptive-field variations while reducing memory and search time, and it demonstrates data-volume robustness by achieving competitive accuracy with reduced datasets. Empirically, the method attains 46.8 mAP on Bottom View and 47.9 mAP on Side View, outperforming several hand-crafted and NAS-based baselines, with linear reductions in search cost as data volume decreases. The work suggests practical benefits for industrial settings with limited data and resources, and discusses extensions to illumination robustness and backbone-aware NAS.

Abstract

Deep learning-based fault detection methods have achieved significant success. In visual fault detection of freight trains, components of different classes differ greatly in their characteristics (scale variance), whereas components of the same class differ little, so detectors must be scale-aware. Moreover, the design of task-specific networks relies heavily on human expertise. As a consequence, neural architecture search (NAS), which automates the model design process, has gained considerable attention because of its promising performance. However, NAS is computationally intensive due to the large search space and huge data volume. In this work, we propose an efficient NAS-based framework for visual fault detection of freight trains that searches for a task-specific detection head capable of multi-scale representation. First, we design a scale-aware search space for discovering an effective receptive field in the head. Second, building on this specifically designed search space, we explore robustness to data volume in order to reduce search costs, and we propose a novel sharing strategy that reduces memory and further improves search efficiency. Extensive experimental results demonstrate the effectiveness of our method and its data volume robustness: it achieves 46.8 and 47.9 mAP on the Bottom View and Side View datasets, respectively. Our framework outperforms the state-of-the-art approaches, and its search costs decrease linearly with reduced data volumes.
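The data-volume idea in the abstract can be illustrated with a minimal sketch. It assumes the reduced dataset is drawn by uniform random sampling of image identifiers, and that search cost is dominated by the number of training iterations per epoch. The names `sample_subset` and `iterations_per_epoch`, the batch size, and the dataset size of 1000 are hypothetical and not taken from the paper.

```python
import math
import random


def sample_subset(image_ids, fraction, seed=0):
    """Draw a uniformly random subset holding `fraction` of the images.

    Assumption: plain uniform sampling without replacement; the paper may
    sample differently (for example, per class).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ids = list(image_ids)
    k = max(1, round(len(ids) * fraction))
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return sorted(rng.sample(ids, k))


def iterations_per_epoch(num_images, batch_size):
    """Number of mini-batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


full_ids = range(1000)   # hypothetical dataset of 1000 images
batch_size = 2           # hypothetical batch size

subset = sample_subset(full_ids, fraction=0.25, seed=0)
full_iters = iterations_per_epoch(len(full_ids), batch_size)
subset_iters = iterations_per_epoch(len(subset), batch_size)

print(len(subset), full_iters, subset_iters)
```

Under these assumptions, a quarter of the data needs a quarter of the iterations per epoch, which is consistent with search cost that falls linearly with data volume when the number of search epochs is held fixed.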

Paper Structure (28 sections, 8 equations, 8 figures, 9 tables, 1 algorithm)

This paper contains 28 sections, 8 equations, 8 figures, 9 tables, 1 algorithm.

Introduction
Related Work
Fault Detection for Freight Train Images
General Object Detection
Neural Architecture Search for Object Detection
Proposed Method
Overall Framework
Design of Search Space
Representation Sharing
Optimization
Experiments
Experiments Setup
Datasets
Reduced Dataset with Random Sampling
Evaluation Metrics
...and 13 more sections

Figures (8)

Figure 1: Visual results of models searched on the Bottom View datasets with various volumes. All models exhibit competitive responses in the class response maps and achieve precise localization.
Figure 2: An overview of our proposed search framework for visual fault detection of freight trains. Our method focuses on searching for the optimal head of detectors. The searchable head is constructed from two groups of cells. Each edge linking nodes within a cell is composed of two 1$\times$1 convolutions with a search space between them. The search space contains multiple operations, allowing the edge to search for proper receptive field combinations. Both the cell structures and the operations on the edges are searchable.
Figure 3: Representation sharing. Each $p_{i}$ indicates a path in which a solid line represents an operation. Each sphere lying on a path denotes an intermediate representation. Large filters are first factorized into multiple 3$\times$3 filters, and then the intermediate representations at the same receptive field (RF) level are shared.
Figure 4: Image acquisition in the wild. (a) Side view. (b) Bottom view.
Figure 5: Side View and Bottom View datasets. The top-row images, with green ground-truth boxes, show train components in normal states, and the bottom-row images, with red ground-truth boxes, show fault states. Note that these train components vary greatly in scale.
...and 3 more figures
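
The factorization described in the Figure 3 caption can be made concrete with a small arithmetic sketch. It assumes stride-1, undilated 3$\times$3 convolutions, for which a stack of n layers has a receptive field of 2n + 1. It also assumes that sharing means paths reuse the intermediate outputs of the deepest chain. The candidate kernel sizes and the function names are hypothetical and serve only as illustration.

```python
def stacked_rf(num_3x3):
    """Receptive field of `num_3x3` stacked 3x3, stride-1, undilated convs."""
    return 1 + 2 * num_3x3


def convs_needed(kernel):
    """Number of stacked 3x3 convs whose receptive field equals `kernel`."""
    if kernel < 3 or kernel % 2 == 0:
        raise ValueError("kernel must be an odd integer >= 3")
    return (kernel - 1) // 2


def conv_count(kernels, shared):
    """Count the 3x3 convs evaluated for a set of candidate kernel sizes.

    Without sharing, every candidate runs its own chain of 3x3 convs.
    With sharing (a simplified reading of the caption), only the deepest
    chain is computed and shallower candidates reuse its intermediates.
    """
    depths = [convs_needed(k) for k in kernels]
    return max(depths) if shared else sum(depths)


candidates = [3, 5, 7]  # hypothetical candidate receptive fields

unshared = conv_count(candidates, shared=False)  # 1 + 2 + 3 chains
shared = conv_count(candidates, shared=True)     # one chain of depth 3

print(unshared, shared)
```

In this toy setting the three candidates cost six 3$\times$3 convolutions when evaluated independently and three when intermediates are shared, which is the kind of memory and compute saving the sharing scheme targets.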
