Extracting and Analyzing Rail Crossing Behavior Signatures from Videos using Tensor Methods

Dawon Ahn; Het Patel; Aemal Khattak; Jia Chen; Evangelos E. Papalexakis

TL;DR

This work proposes a multi-view tensor decomposition framework that captures behavioral similarities across three temporal phases: Approach (warning activation to gate lowering), Waiting (gates down to train passage), and Clearance (train passage to gate raising).

Abstract

Railway crossings present complex safety challenges where driver behavior varies by location, time, and conditions. Traditional approaches analyze crossings individually, limiting the ability to identify shared behavioral patterns across locations. We propose a multi-view tensor decomposition framework that captures behavioral similarities across three temporal phases: Approach (warning activation to gate lowering), Waiting (gates down to train passage), and Clearance (train passage to gate raising). We analyze railway crossing videos from multiple locations using TimeSformer embeddings to represent each phase. By constructing phase-specific similarity matrices and applying non-negative symmetric CP decomposition, we discover latent behavioral components with distinct temporal signatures. Our tensor analysis reveals that crossing location appears to be a stronger determinant of behavior patterns than time of day, and that approach-phase behavior provides particularly discriminative signatures. Visualization of the learned component space confirms location-based clustering, with certain crossings forming distinct behavioral clusters. This automated framework enables scalable pattern discovery across multiple crossings, providing a foundation for grouping locations by behavioral similarity to inform targeted safety interventions.
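The tensor construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `cosine_similarity_matrix` and `build_similarity_tensor` are hypothetical, and random vectors stand in for the TimeSformer embeddings. The sizes (31 events, 768 dimensions, 3 phases) follow the Figure 1 caption.

```python
import numpy as np

def cosine_similarity_matrix(E):
    """Pairwise cosine similarity between the rows of E (n_events x dim)."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    unit = E / np.clip(norms, 1e-12, None)  # guard against zero-length rows
    return unit @ unit.T

def build_similarity_tensor(phase_embeddings):
    """Stack one N x N similarity matrix per phase into an N x N x P tensor."""
    mats = [cosine_similarity_matrix(E) for E in phase_embeddings]
    return np.stack(mats, axis=2)

# Placeholder data: random stand-ins for per-phase video embeddings
# (Approach, Waiting, Clearance). Real embeddings would replace these.
rng = np.random.default_rng(0)
phase_embeddings = [rng.standard_normal((31, 768)) for _ in range(3)]

X = build_similarity_tensor(phase_embeddings)
print(X.shape)  # (31, 31, 3)
```

Each frontal slice of `X` is symmetric with a unit diagonal, which is what makes a symmetric factorization natural. Cosine similarities can be negative in general; how negative entries are treated before a non-negative decomposition is not stated in this summary, so the sketch leaves them untouched.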

Paper Structure (19 sections, 4 equations, 10 figures)

Introduction
Related Work
Methodology
Overview
Phase Annotation
Video Embedding Extraction
Multi-View Tensor Construction
Rank Selection
CORCONDIA (Core Consistency Diagnostic)
Reconstruction Error
Holdout Validation
Non-Negative Symmetric CP Decomposition
Results
Dataset and Distribution
Multi-View Similarity Tensor
...and 4 more sections

Figures (10)

Figure 1: Overview of the proposed multi-view tensor decomposition pipeline. (1) Each crossing event video is segmented into three behavioral phases (Approach, Waiting, Clearance) and processed through TimeSformer to extract 768-dimensional embeddings per phase. (2) For each phase $p$, we construct a $31 \times 31$ similarity matrix by computing pairwise cosine similarity between video embeddings, then stack the three matrices to form a tensor $X \in \mathbb{R}^{31\times31\times3}$. (3) Symmetric non-negative CP decomposition factorizes the tensor into phase loadings $a_r \in \mathbb{R}^P$ and event loadings $u_r \in \mathbb{R}^N$, revealing latent behavioral components across videos and phases.
Figure 2: CORCONDIA diagnostic for rank selection. CORCONDIA measures core consistency, where values $\ge$ 80% (orange line) indicate acceptable CP structure and $\approx$ 100% (green line) indicates a perfect fit. Rank 2 shows severe structural issues (-109.9%), while ranks 1 and 3 demonstrate a valid CP structure. The diagnostic is only valid for $R \leq \min(I,J,K) = 3$.
Figure 3: Reconstruction error (sum of squared errors) across ranks 1-10. The curve shows a clear elbow between ranks 3 and 5 (shaded region), indicating diminishing returns beyond rank 3. Values are shown in thousands (K).
Figure 4: Holdout validation error (RMSE) with 10% masked entries across ranks 1-10. Lower values indicate better generalization. Error decreases monotonically with rank, with substantial improvements up to rank 5. Proper masking prevents information leakage during training (5 random restarts per trial, averaged over 3 trials).
Figure 5: Reading a CP decomposition component. Component magnitude ($\lambda_r$) indicates overall importance. Phase loadings ($a_r$) reveal which phases (Approach, Waiting, Clearance) define the behavioral pattern. Video loadings ($u_r$) identify which crossing events exhibit this pattern. The example component is Approach-dominant (0.82) and characterizes videos 5, 12, and 23.
...and 5 more figures
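
The component structure in the Figure 1 and Figure 5 captions (magnitude $\lambda_r$, phase loadings $a_r$, event loadings $u_r$) corresponds to the model $X_{ijp} \approx \sum_r \lambda_r\, a_r[p]\, u_r[i]\, u_r[j]$. The sketch below fits that model with damped multiplicative updates. This solver is a swapped-in, generic choice and an assumption on our part; the optimizer actually used in the paper is not described in this summary. The function names, the damping factor `beta`, and the synthetic test tensor are all hypothetical.

```python
import numpy as np

def reconstruct(U, A, lam=None):
    """X[i, j, p] = sum_r lam[r] * A[p, r] * U[i, r] * U[j, r]."""
    if lam is not None:
        A = A * lam
    return np.einsum("ir,jr,pr->ijp", U, U, A)

def symmetric_nn_cp(X, rank, n_iter=500, beta=0.5, eps=1e-12, seed=0):
    """Non-negative symmetric CP via damped multiplicative updates.

    Assumes X is non-negative with symmetric frontal slices X[:, :, p].
    Returns (lam, A, U, history), where history holds the relative
    reconstruction error at initialization and after every iteration.
    """
    n, _, n_phases = X.shape
    rng = np.random.default_rng(seed)
    U = rng.random((n, rank)) + 0.1         # event loadings, N x R
    A = rng.random((n_phases, rank)) + 0.1  # phase loadings, P x R
    x_norm = np.linalg.norm(X)
    history = [np.linalg.norm(X - reconstruct(U, A)) / x_norm]

    for _ in range(n_iter):
        # U update: ratio of the negative and positive gradient parts,
        # damped by beta because U appears twice in the model.
        num = np.einsum("ijp,jr,pr->ir", X, U, A)
        den = U @ ((U.T @ U) * (A.T @ A))
        U *= (1.0 - beta) + beta * num / (den + eps)

        # A update: ordinary non-negative least-squares style step,
        # since the model is linear in A once U is fixed.
        G = U.T @ U
        num = np.einsum("ijp,ir,jr->pr", X, U, U)
        den = A @ (G * G)
        A *= num / (den + eps)

        history.append(np.linalg.norm(X - reconstruct(U, A)) / x_norm)

    # Move scale into lam so every u_r and a_r has unit Euclidean norm.
    u_norms = np.maximum(np.linalg.norm(U, axis=0), eps)
    U = U / u_norms
    A = A * u_norms**2
    lam = np.maximum(np.linalg.norm(A, axis=0), eps)
    A = A / lam
    return lam, A, U, history

# Synthetic check on a tensor built from known non-negative factors
# (illustrative data only, not the paper's similarity tensor).
rng = np.random.default_rng(1)
X = reconstruct(rng.random((31, 3)), rng.random((3, 3)))
lam, A_hat, U_hat, history = symmetric_nn_cp(X, rank=3)
print(f"relative error: {history[0]:.3f} -> {history[-1]:.3f}")
```

After normalization, a component is read as in Figure 5: `lam[r]` gives its overall weight, `A_hat[:, r]` shows which phases dominate it, and the largest entries of `U_hat[:, r]` point to the events that express it.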
