Hierarchical Delay Attribution Classification using Unstructured Text in Train Management Systems

Anton Borg, Per Lingvall, Martin Svensson

TL;DR

This work tackles the automation of delay attribution coding in Swedish railways, a manually performed task that spans ~200 hierarchical codes and has a 9-day finalization window. It represents unstructured delay reports with TF-IDF and compares Random Forest (RF) and Support Vector Machine (SVM) classifiers, augmented with conformal prediction, in both flat and hierarchical multi-level setups. Results show that hierarchical classification outperforms the flat approach at Levels 2 and 3, and that RF and SVM approach, but do not match, the performance of the manual operators (TKL). This points to practical value as a decision-support tool and uncertainty estimator. The study demonstrates feasibility and offers a path toward faster, more reliable coding, while outlining future enhancements, such as transformer-based text representations and improved explainability, to further close the gap with human experts.
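
The TL;DR mentions conformal prediction as an uncertainty estimator. The following is a minimal sketch of split conformal prediction over a classifier's class probabilities, not the paper's implementation. It assumes the common nonconformity score 1 - p̂(true label | x); the paper's actual score function and significance level are not given here, and the function names `conformal_threshold` and `prediction_set` as well as the codes "A", "B", "C" are hypothetical.

```python
import math
from typing import Dict, Sequence, Set


def conformal_threshold(cal_probs: Sequence[Dict[str, float]],
                        cal_labels: Sequence[str],
                        alpha: float) -> float:
    """Split-conformal threshold computed on a held-out calibration set.

    Assumed nonconformity score: 1 - p_hat(true label | x).
    """
    scores = sorted(1.0 - probs.get(label, 0.0)
                    for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample rank ceil((n + 1) * (1 - alpha)); the small epsilon keeps
    # floating-point error from pushing an exact integer up by one.
    k = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    if k > n:
        # Too few calibration points for this alpha: every label is included.
        return math.inf
    return scores[k - 1]


def prediction_set(probs: Dict[str, float], q_hat: float) -> Set[str]:
    """All labels whose nonconformity score does not exceed the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= q_hat}


if __name__ == "__main__":
    # Hypothetical calibration outputs of a probabilistic classifier;
    # "A", "B", "C" are placeholder codes, not real delay attribution codes.
    true_probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
    cal = [{"A": p, "B": 1.0 - p} for p in true_probs]
    q_hat = conformal_threshold(cal, ["A"] * len(cal), alpha=0.2)
    print(sorted(prediction_set({"A": 0.6, "B": 0.3, "C": 0.1}, q_hat)))
```

With alpha = 0.2 the threshold works out to 0.8, so the demo prints `['A', 'B']`. A larger set signals a less certain prediction, which is what makes the set size usable as an uncertainty estimate. The coverage guarantee of at least 1 - alpha holds only if calibration and test reports are exchangeable.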

Abstract

EU directives stipulate a systematic follow-up of train delays. In Sweden, the Swedish Transport Administration registers delays and assigns each an appropriate delay attribution code. However, these codes are assigned manually, which is a complex task. In this paper, machine-learning-based decision support for assigning delay attribution codes from event descriptions is investigated. The text is transformed using TF-IDF, and two models, Random Forest and Support Vector Machine, are evaluated against a random uniform classifier and against the classification performance of the Swedish Transport Administration. Further, the problem is modeled using both a hierarchical and a flat approach. The results indicate that the hierarchical approach performs better than the flat approach. Both approaches perform better than the random uniform classifier but worse than the manual classification.
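
The abstract describes TF-IDF features fed to a classifier, with the code hierarchy handled either flat (one classifier over all leaf codes) or hierarchically (one multi-class classifier per parent code, as in Figure 1). The following is a dependency-free sketch of that structure under several assumptions: whitespace tokenization, scikit-learn-style smoothed IDF, and a nearest-centroid classifier standing in for Random Forest / SVM. The class and function names and the codes "A", "A1", and so on are hypothetical placeholders, not the real delay attribution codes.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Dict[str, float]
Path = Tuple[str, ...]  # codes from level 1 downward, e.g. ("A", "A1")


def _normalise(vec: Vector) -> Vector:
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def tfidf_fit(docs: Sequence[str]) -> Dict[str, float]:
    """Smoothed IDF per term: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc.lower().split()))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf_transform(doc: str, idf: Dict[str, float]) -> Vector:
    """Term count times IDF, L2-normalised; terms unseen in training are dropped."""
    counts = Counter(t for t in doc.lower().split() if t in idf)
    return _normalise({t: c * idf[t] for t, c in counts.items()})


class NearestCentroid:
    """Stand-in for RF/SVM: picks the label whose normalised mean vector is most similar."""

    def fit(self, X: List[Vector], y: List[str]) -> "NearestCentroid":
        sums: Dict[str, Vector] = {}
        for vec, label in zip(X, y):
            acc = sums.setdefault(label, defaultdict(float))
            for t, v in vec.items():
                acc[t] += v
        self.centroids = {label: _normalise(s) for label, s in sums.items()}
        return self

    def predict(self, x: Vector) -> str:
        def score(label: str) -> float:
            c = self.centroids[label]
            return sum(v * c.get(t, 0.0) for t, v in x.items())
        return max(self.centroids, key=score)


class HierarchicalClassifier:
    """Top-down: one classifier per parent code, trained on that code's children."""

    def fit(self, X: List[Vector], paths: List[Path]) -> "HierarchicalClassifier":
        groups: Dict[Path, Tuple[List[Vector], List[str]]] = defaultdict(lambda: ([], []))
        for x, path in zip(X, paths):
            for level, code in enumerate(path):
                xs, ys = groups[path[:level]]  # parent prefix; () is the root
                xs.append(x)
                ys.append(code)
        self.nodes = {parent: NearestCentroid().fit(xs, ys)
                      for parent, (xs, ys) in groups.items()}
        return self

    def predict(self, x: Vector) -> Path:
        path: Path = ()
        while path in self.nodes:  # descend until a leaf code is reached
            path += (self.nodes[path].predict(x),)
        return path


if __name__ == "__main__":
    # Hypothetical event descriptions with placeholder code paths.
    train = [
        ("signal failure at station", ("A", "A1")), ("signal fault interlocking", ("A", "A1")),
        ("track switch frozen", ("A", "A2")), ("switch heating failure", ("A", "A2")),
        ("driver late from break", ("B", "B1")), ("driver missing roster", ("B", "B1")),
        ("door fault on train", ("B", "B2")), ("train door stuck", ("B", "B2")),
    ]
    docs, paths = [d for d, _ in train], [p for _, p in train]
    idf = tfidf_fit(docs)
    X = [tfidf_transform(d, idf) for d in docs]
    hier = HierarchicalClassifier().fit(X, paths)
    flat = NearestCentroid().fit(X, ["/".join(p) for p in paths])
    query = tfidf_transform("signal failure", idf)
    print(hier.predict(query), flat.predict(query))
```

The demo prints `('A', 'A1') A/A1`. In a real setup one would likely swap `NearestCentroid` for scikit-learn's `RandomForestClassifier` or `LinearSVC`. The top-down routing stays the same: each node only has to separate the children of one parent code, but an error at level 1 propagates to the levels below.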

Paper Structure

This paper contains 15 sections, 1 equation, 7 figures, and 11 tables.

Figures (7)

  • Figure 1: The hierarchical classification approach, visualized. Each node C is a multi-class classifier trained on a specific delay attribution code, with that code's sub-classes as classification targets.
  • Figure 2: Significance plot comparing the results of a flat (F) approach against a hierarchical (H) approach.
  • Figure 3: Mean F1-score of the SVM and Random Forest when classifying first-level delay attribution codes, with confidence intervals. TKL and the Uniform classifier are shown as lines, with a confidence interval for TKL.
  • Figure 4: Mean F1-score of the SVM and Random Forest when classifying second-level delay attribution codes, with confidence intervals. Classification is done per parent code, e.g., models trained and evaluated on D-codes. TKL and the Uniform classifier are shown as lines, with a confidence interval for TKL.
  • Figure 5: Critical Difference diagram for level 2 codes. A connecting line indicates no statistically significant difference: between Random Forest and SVM, between TKL and SVM, and between the Uniform Classifier and Random Forest. However, SVM is statistically significantly better than the Uniform Classifier. Similarly, TKL performs better than both the Uniform Classifier and Random Forest.
  • ...and 2 more figures
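
Figure 5 is a Critical Difference diagram. Such diagrams are commonly built from each method's mean rank across datasets together with the Nemenyi critical difference CD = q_α · sqrt(k(k + 1) / (6N)), where k is the number of methods and N the number of datasets. This excerpt does not state which post-hoc test the paper used, so the following sketch is an assumption. It treats each parent code as one "dataset", ranks methods by F1 on each, and flags two methods as different when their mean ranks differ by more than CD. The function names are hypothetical, and the scores in the demo are made-up numbers, not results from the paper.

```python
import math
from typing import Dict, Sequence

# Two-tailed Nemenyi critical values q_alpha at alpha = 0.05 for k = 2..5 methods.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}


def average_ranks(scores: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Mean rank per method across datasets; rank 1 = highest score, ties share the mean rank."""
    methods = list(scores[0])
    totals = {m: 0.0 for m in methods}
    for row in scores:
        ordered = sorted(methods, key=lambda m: -row[m])
        i = 0
        while i < len(ordered):
            # Extend j over a run of tied scores starting at position i.
            j = i
            while j + 1 < len(ordered) and row[ordered[j + 1]] == row[ordered[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of the 1-based positions i..j
            for m in ordered[i:j + 1]:
                totals[m] += shared
            i = j + 1
    return {m: t / len(scores) for m, t in totals.items()}


def critical_difference(k: int, n: int) -> float:
    """Nemenyi CD = q_alpha * sqrt(k (k + 1) / (6 N)) at alpha = 0.05."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6 * n))


def significantly_different(rank_a: float, rank_b: float, cd: float) -> bool:
    """Two methods differ when their mean ranks are more than CD apart."""
    return abs(rank_a - rank_b) > cd


if __name__ == "__main__":
    # Made-up F1 scores, one row per dataset (e.g. per parent code).
    rows = [
        {"method_a": 0.62, "method_b": 0.55, "method_c": 0.10},
        {"method_a": 0.48, "method_b": 0.51, "method_c": 0.08},
        {"method_a": 0.70, "method_b": 0.66, "method_c": 0.12},
    ]
    ranks = average_ranks(rows)
    cd = critical_difference(len(ranks), len(rows))
    for m, r in sorted(ranks.items(), key=lambda kv: kv[1]):
        print(f"{m}: mean rank {r:.2f}")
    print(f"CD = {cd:.3f}")
```

In the usual procedure, a Friedman test over the same ranks is run first, and the pairwise CD comparison is only interpreted if it rejects the hypothesis that all methods perform equally. Methods whose mean ranks lie within CD of each other are the ones joined by a connecting line in the diagram.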