Augmenting train maintenance technicians with automated incident diagnostic suggestions

Georges Tod; Jean Bruggeman; Evert Bevernage; Pieter Moelans; Walter Eeckhout; Jean-Luc Glineur

TL;DR

The paper addresses the need for rapid, reliable diagnostic support for train incidents by formulating incident diagnosis as a multiclass classification task: $y=f(x)$, where $x=[x_1,\dots,x_p]$ is a sequence of on-board events and $y$ is the implicated physical subsystem. It proposes a cloud-based platform that ingests incident data, extracts recurrent event sets via feature engineering (including Longest-Common Sub Sequence (LCSS) mining), and uses a cascaded ensemble of Naive Bayes classifiers across time windows to generate near real-time diagnostic suggestions, with human-in-the-loop feedback for continual improvement. Key contributions include (1) the automated diagnostics platform architecture, (2) a two-stage feature engineering pipeline that yields discrete event sets, and (3) a novel discrete set classifier that uses cascading windows to balance accuracy and coverage. The approach shows competitive predictive performance across multiple fleets, offers actionable explanations through the extracted event sets, and enables faster prioritization of repairs, with promising directions toward predictive maintenance alerts and edge deployment.

Abstract

Train operational incidents have so far been diagnosed individually and manually by train maintenance technicians. To help maintenance crews respond faster and prioritize tasks, a learning machine is developed and deployed in production to suggest diagnostics to train technicians on their phones, tablets or laptops as soon as a train incident is declared. A feedback loop takes into account the actual diagnoses made by designated train maintenance experts to refine the learning machine. By formulating the problem as a discrete set classification task, feature engineering methods are proposed to extract physically plausible sets of events from traces generated on board railway vehicles. These sets feed an original ensemble classifier that classifies incidents by their potential technical cause. Finally, the resulting model is trained and validated using real operational data and deployed on a cloud platform. Future work will explore how the extracted sets of events can be used to avoid incidents by assisting human experts in the creation of predictive maintenance alerts.
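As an illustration of how recurrent event sets might be extracted from on-board traces, the following is a minimal sketch of Longest-Common Sub Sequence mining, the technique named in the TL;DR. It is not the authors' implementation: the event codes (`E07`, `E33`, ...) are hypothetical, and folding a pairwise LCS over several traces is a simple heuristic assumed here, not an exact multi-sequence solution.

```python
from functools import reduce


def lcs(a, b):
    """Return one longest common subsequence of two event sequences."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to rebuild one subsequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def common_event_set(traces):
    """Fold pairwise LCS over all traces (a heuristic, assumed here)
    and return the surviving events as an unordered discrete set."""
    return frozenset(reduce(lcs, traces))


# Hypothetical traces of event codes preceding three similar incidents.
traces = [
    ["E12", "E07", "E33", "E90"],
    ["E07", "E41", "E33", "E90"],
    ["E07", "E33", "E55", "E90"],
]
event_set = common_event_set(traces)
```

Here `event_set` keeps only the events that recur in the same order across all three traces, which is the kind of discrete set a downstream classifier could consume.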

Paper Structure (14 sections, 4 equations, 6 figures, 1 table)

Introduction
Related works
Methodology
An automated diagnostics platform
Data sources and characteristics
Cloud platform
Feedback loop
A discrete set classification algorithm
Feature engineering
Filtering features
Extracting sets of features
Classification
Results and discussion
Conclusions and future work

Figures (6)

Figure 1: Platform to assist train maintenance technicians by automatically suggesting incident diagnostics for railway vehicles. The central on-board computers of railway vehicles report their states to a cloud storage. The raw data is extracted, transformed and loaded (ETL) into structured data that feeds a Data Lakehouse. Iterable machine learning models leverage iterable features to analyze large volumes of data and deliver online dashboards to assist train maintenance technicians. A feedback loop takes into account the diagnostics made by designated train maintenance experts to refine both the training data and the models.
Figure 2: Feature engineering. In (1), events are filtered based on a relevance metric $r$ and a One-at-a-time (OaT) procedure. In (2), event sets are mined based on Longest-Common Sub Sequences (LCSS). In (3), these sets are fed to the proposed ensemble classifier.
Figure 3: Hyperparameter exploration results. (a) Performance of individual features in the One-at-a-time (OaT) procedure. The fraction of explained samples is the ratio of the number of classified samples to the total number of samples, averaged over the 10 folds of the stratified cross-validation. (b) Performance of single classifiers versus the proposed ensemble classifier. The number of explained samples is the number of classified samples, averaged over the 10 folds of the stratified cross-validation. Interestingly, the larger the window, the more the $F_1$-score drops: the further into the past the model looks, the less it can leverage the additional data. Events occurring more than four hours before an incident are not taken into account, because the classifiers' $F_1$-score is then considered too low.
Figure 4: Ensemble classifier architecture: the proposal is based on cascaded time windows. The first classifier to answer fixes the output, which means the collective decision process assumes the first classifier to answer is the most performant one.
Figure 5: Learning machine performance: descriptive (red) and predictive (blue) performance across three different fleets (AM08, HLE18 and M7). The $F_1$-score is typically high; nevertheless, some incidents are poorly classified even during training. The M7 is a very recent fleet, which explains why less data is available for it.
...and 1 more figure
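The cascaded decision rule described in the Figure 4 caption (the first classifier to answer fixes the output) can be sketched as follows. This is a hedged illustration only: the paper uses Naive Bayes classifiers, whereas `make_lookup_classifier` below is a hypothetical stand-in that answers when a known event set is contained in the window. The window sizes (30 and 240 minutes), event codes and subsystem labels are likewise assumptions; only the four-hour cutoff comes from the Figure 3 caption.

```python
def make_lookup_classifier(known_sets):
    """Hypothetical stand-in for a per-window classifier.

    known_sets maps a frozenset of event codes to a subsystem label.
    The classifier answers only if a known set is contained in the
    observed events; otherwise it abstains by returning None.
    """
    def classify(events):
        for event_set, label in known_sets.items():
            if event_set <= events:
                return label
        return None
    return classify


def cascade_predict(trace, stages):
    """Try each (window_minutes, classifier) stage in order.

    trace is a list of (minutes_before_incident, event_code) pairs.
    The first stage that returns a label fixes the output.
    """
    for window, classifier in stages:
        events = frozenset(code for age, code in trace if age <= window)
        label = classifier(events)
        if label is not None:
            return label
    return None  # no stage answered: the sample stays unexplained


def explained_fraction(predictions):
    """Share of samples for which some stage produced a label."""
    return sum(p is not None for p in predictions) / len(predictions)


# Narrowest window first; 240 minutes mirrors the four-hour cutoff.
stages = [
    (30, make_lookup_classifier({frozenset({"E07", "E33"}): "doors"})),
    (240, make_lookup_classifier({frozenset({"E90"}): "traction"})),
]
incident_traces = [
    [(5, "E07"), (10, "E33"), (100, "E90")],  # answered by stage 1
    [(100, "E90"), (20, "E12")],              # answered by stage 2
    [(300, "E90")],                           # older than four hours
]
predictions = [cascade_predict(t, stages) for t in incident_traces]
coverage = explained_fraction(predictions)
```

Ordering the stages from the narrowest window to the widest reflects the observation in the Figure 3 caption that the $F_1$-score drops as the window grows, so the stage assumed most reliable gets the first chance to answer.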
