Causal Intervention Sequence Analysis for Fault Tracking in Radio Access Networks
Chenhua Shi, Joji Philip, Subhadip Bandyopadhyay, Jayanta Choudhury
TL;DR
The paper addresses fault tracking for SLA breaches in RAN using millisecond-scale telemetry, where traditional coarse-grained approaches fail to reveal both the root-cause indicators and their causal order. It introduces a pipeline of three components: Root-Cause Discovery (RCD), causal subgraph analysis, and deviation detection. Together they identify the intervention sequences that lead to SLA violations, using Kolmogorov-Smirnov (KS) tests and Z-scores to establish temporal ordering. Monte Carlo simulations show that the approach yields convergent estimates of causal-source probabilities and identifies reliable KPIs, while the method remains CPU-friendly and scalable for edge deployments. Overall, the framework enables proactive fault prevention by delivering high-resolution, causally ordered insights that network operators can act on directly.
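To make the deviation-detection idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each KPI has a window of samples from normal operation and a window recorded before a breach. A two-sample KS statistic decides whether the KPI's distribution changed at all, and the first sample whose Z-score against the normal baseline exceeds a threshold gives the onset used for ordering. The KPI names, both thresholds, and the choice to threshold the raw KS statistic rather than a p-value are illustrative assumptions.

```python
from bisect import bisect_right
from statistics import mean, stdev


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_vals, x):
        # Fraction of values less than or equal to x.
        return bisect_right(sorted_vals, x) / len(sorted_vals)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


def first_deviation_index(series, baseline, z_threshold=3.0):
    """Index of the first sample whose |Z-score| vs. the baseline exceeds the threshold."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return None  # A constant baseline gives no usable Z-score.
    for i, value in enumerate(series):
        if abs(value - mu) / sigma > z_threshold:
            return i
    return None


def order_kpis(normal, abnormal, ks_threshold=0.5, z_threshold=3.0):
    """Keep KPIs whose distribution shifted, then sort them by deviation onset."""
    onsets = {}
    for kpi, baseline in normal.items():
        # Assumption: threshold the KS statistic directly instead of a p-value.
        if ks_statistic(baseline, abnormal[kpi]) < ks_threshold:
            continue
        onset = first_deviation_index(abnormal[kpi], baseline, z_threshold)
        if onset is not None:
            onsets[kpi] = onset
    return sorted(onsets, key=onsets.get)


# Hypothetical KPI windows; names and values are made up for illustration.
normal = {
    "prb_util": [10, 11, 9, 10, 12, 10, 11, 9],
    "latency_ms": [5, 6, 5, 4, 5, 6, 5, 5],
    "cqi": [12, 12, 11, 12, 13, 12, 12, 11],
}
abnormal = {
    "prb_util": [10, 19, 22, 25, 24, 26, 25, 27],
    "latency_ms": [5, 5, 5, 14, 15, 18, 20, 22],
    "cqi": [12, 11, 12, 12, 13, 12, 11, 12],
}

ordering = order_kpis(normal, abnormal)
print(ordering)  # ['prb_util', 'latency_ms']
```

In this toy data, `cqi` is dropped because its distribution is unchanged, and `prb_util` is ordered before `latency_ms` because it deviates earlier in the window. A production version would more likely use a p-value from a statistics library and account for multiple testing across KPIs.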
Abstract
To keep modern Radio Access Networks (RAN) running smoothly, operators need to spot the real-world triggers behind Service-Level Agreement (SLA) breaches well before customers feel them. We introduce an AI/ML pipeline that does two things most tools miss: (1) finds the likely root-cause indicators and (2) reveals the exact order in which those events unfold. We start by labeling network data: records linked to past SLA breaches are marked 'abnormal', and everything else 'normal'. Our model then learns the causal chain that turns normal behavior into a fault. In Monte Carlo tests the approach pinpoints the correct trigger sequence with high precision and scales to millions of data points without loss of speed. These results show that high-resolution, causally ordered insights can move fault management from reactive troubleshooting to proactive prevention.
