Object-Size-Driven Design of Convolutional Neural Networks: Virtual Axle Detection based on Raw Data

Henik Riedel, Robert Steven Lorenzen, Clemens Hübler

TL;DR

The paper tackles real-time axle detection for Bridge Weigh-In-Motion (BWIM) without dedicated axle detectors. It introduces VADER, a U-Net-based fully convolutional network (FCN) that operates on raw acceleration data, and the Maximum Receptive Field (MRF) rule, which constrains hyperparameters based on the bridge's fundamental frequency. Compared with spectrogram inputs, raw data makes the model about 65× faster and needs only about 1% of the memory, while detection accuracy stays high: up to 99.9% of axles are detected with a mean spatial error of about 3.69 cm under favorable sensor conditions. The work compares raw and spectrogram-based inputs across stratified and DGPS labeling scenarios, showing that raw-input models generalize better and are more robust to sensor degradation. Beyond axle detection, the MRF rule offers a theoretically grounded way to narrow the hyperparameter search that could apply to other unstructured data problems, potentially reducing the need for extensive hyperparameter searches.

Abstract

As infrastructure ages, the need for efficient monitoring methods becomes increasingly critical. Bridge Weigh-In-Motion (BWIM) systems are crucial for cost-effective determination of loads and, consequently, the residual service life of road and railway infrastructure. However, conventional BWIM systems require additional sensors for axle detection, which must be installed in potentially inaccessible locations or places that interfere with bridge operation. This study presents a novel approach for real-time detection of train axles using sensors arbitrarily placed on bridges, providing an alternative to dedicated axle detectors. The developed Virtual Axle Detector with Enhanced Receptive Field (VADER) has been validated on a single-track railway bridge using only acceleration measurements, detecting 99.9% of axles with a spatial error of 3.69 cm. Using raw data as input outperformed the state-of-the-art spectrogram-based method in both speed and memory usage by 99%, thereby making real-time application feasible for the first time. Additionally, we introduce the Maximum Receptive Field (MRF) rule, a novel approach to optimise hyperparameters of Convolutional Neural Networks (CNNs) based on the size of objects. In this context, the object size relates to the fundamental frequency of a bridge. The MRF rule effectively narrows the hyperparameter search space, overcoming the need for extensive hyperparameter tuning. Since the MRF rule can theoretically be applied to all unstructured data, it could have implications for a wide range of deep learning problems, from earthquake prediction to object recognition.
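
As a rough illustration of the quantity the MRF rule reasons about, the sketch below computes the theoretical receptive field of a U-Net-style encoder from its kernel size, max pooling size, and number of pooling steps, using the standard receptive-field recurrence. It then expresses that field in oscillation periods of the bridge. This is not the authors' implementation. The function name, the two convolutions per block, the reading of "object size" as one period of the fundamental frequency, and the 6 Hz frequency are all assumptions made for illustration. The 600 Hz sampling rate is inferred from the Figure 2 caption (20 samples correspond to about 0.033 seconds).

```python
def encoder_receptive_field(kernel_size, pool_size, pooling_steps, convs_per_block=2):
    """Receptive field (in input samples) at the deepest encoder level.

    Assumes stride-1 convolutions and max pooling whose stride equals its size.
    convs_per_block is a hypothetical choice, not taken from the paper.
    """
    rf = 1    # receptive field of a single input sample
    jump = 1  # distance in input samples between neighbouring feature positions
    for level in range(pooling_steps + 1):
        for _ in range(convs_per_block):
            rf += (kernel_size - 1) * jump
        if level < pooling_steps:  # no pooling after the deepest block
            rf += (pool_size - 1) * jump
            jump *= pool_size
    return rf


SAMPLE_RATE_HZ = 600.0  # inferred from the Figure 2 caption
FUNDAMENTAL_HZ = 6.0    # hypothetical placeholder, not the measured bridge value
period_samples = SAMPLE_RATE_HZ / FUNDAMENTAL_HZ

# Hypothetical (kernel size, pooling size, pooling steps) combinations.
for k, m, p in [(3, 2, 3), (9, 2, 4), (9, 4, 3)]:
    rf = encoder_receptive_field(k, m, p)
    print(f"k={k} m={m} steps={p}: {rf} samples, {rf / period_samples:.2f} periods")
```

How the MRF rule turns this comparison into a constraint on the hyperparameters is defined in the paper itself and is not reproduced here.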

Paper Structure (14 sections, 8 equations, 11 figures, 3 tables)


Figures (11)

  • Figure 1: Bridge and sensor setup: (a) side view; (b) top view with sensor labels, accelerometer $x$-coordinates, and strain gauge distances; (c) cross section. [Lorenzen2022VirtualAD]
  • Figure 2: Measured acceleration signals of sensors L1, L3, and L5 (Fig. 1) with the corresponding outputs of a VADER model. Each label marks the time point at which a train axle is above the respective sensor. The first axle and its label are shown in red, the second in blue. Labels, bridge, sensors, and train are shown in simplified form. The models generally output probabilities between zero and one. If the model is uncertain, it predicts a high probability for a passing axle over several time steps. To compare the model predictions with the labels, peak picking selects the time point at which the model is most certain. A peak is picked only if the model's confidence is at least 25% and there are at least 20 samples ($0.0\overline{3}$ seconds) between consecutive peaks.
  • Figure 3: Histogram of train lengths defined by number of axles for the first fold of both scenarios using five-fold cross-validation.
  • Figure 4: Architecture of the U-Net-based VADER model. The network follows a symmetric encoder-decoder structure. The left side is the contracting path (encoder), with multiple convolutional layers (CB in light purple or RB in yellow) followed by max pooling (red) for down-sampling. The right side is the expansive path (decoder), with up-sampling operations (transposed convolution in blue), convolutional layers, and concatenation (green) with the corresponding encoder layers via skip connections (purple arrow). The final layer applies a convolution with sigmoid activation (purple) to generate the output. The skip connections ensure that high-resolution features from the encoder are preserved and integrated into the decoder. The boxes qualitatively represent the size of each layer's output (feature maps), with $T$ for samples and $m$ for max pooling size at the bottom right, feature maps at the bottom, and frequencies at the left. Three of the examined hyperparameter combinations are shown as examples.
  • Figure 5: Maximum receptive fields as a function of kernel size, max pooling size, and pooling steps for all used combinations of the hyperparameter study.
  • ...and 6 more figures
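
The peak-picking step described in the Figure 2 caption can be sketched in a few lines. The version below is a minimal pure-Python illustration, not the authors' code. The function name and the synthetic probability trace are hypothetical, and the greedy "highest peak first" handling of the minimum distance is an assumption. Only the 25% confidence threshold and the 20-sample minimum spacing come from the caption.

```python
import math


def pick_axle_peaks(prob, min_confidence=0.25, min_distance=20):
    """Return the sample indices of accepted peaks in a probability trace."""
    # Candidates: samples at or above the threshold where the trace rises into
    # the sample and does not rise after it (a flat top yields its first sample).
    candidates = [
        i for i in range(1, len(prob) - 1)
        if prob[i] >= min_confidence and prob[i - 1] < prob[i] >= prob[i + 1]
    ]
    kept = []
    # Assumed strategy: accept the most confident peaks first and drop any
    # candidate closer than min_distance samples to an accepted peak.
    for i in sorted(candidates, key=lambda idx: prob[idx], reverse=True):
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)


def bump(t, center, height, width=3.0):
    """Gaussian bump used to fake a model output for one passing axle."""
    return height * math.exp(-0.5 * ((t - center) / width) ** 2)


# Synthetic model output: two confident axles and one weak response.
prob = [bump(t, 50, 0.9) + bump(t, 120, 0.6) + bump(t, 170, 0.1) for t in range(200)]
peaks = pick_axle_peaks(prob)

SAMPLE_RATE_HZ = 600.0  # inferred from the caption: 20 samples in about 0.033 s
peak_times = [i / SAMPLE_RATE_HZ for i in peaks]
print(peaks, peak_times)
```

With NumPy and SciPy available, `scipy.signal.find_peaks(prob, height=0.25, distance=20)` should give comparable results, although whether the paper used it is not stated here.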