Surprised by Attention: Predictable Query Dynamics for Time Series Anomaly Detection

Kadir-Kaan Özer, René Ebeling, Markus Enzweiler

Abstract

Multivariate time series anomalies often manifest as shifts in cross-channel dependencies rather than simple amplitude excursions. In autonomous driving, for instance, a steering command might be internally consistent but decouple from the resulting lateral acceleration. Residual-based detectors can miss such anomalies when flexible sequence models still reconstruct signals plausibly despite altered coordination. We introduce AxonAD, an unsupervised detector that treats multi-head attention query evolution as a short-horizon predictable process. A gradient-updated reconstruction pathway is coupled with a history-only predictor that forecasts future query vectors from past context. The predictor is trained via a masked predictor-target objective against an exponential moving average (EMA) target encoder. At inference, reconstruction error is combined with a tail-aggregated query mismatch score, which measures cosine deviation between predicted and target queries on recent timesteps. This dual approach provides sensitivity to structural dependency shifts while retaining amplitude-level detection. On proprietary in-vehicle telemetry with interval annotations and on the TSB-AD multivariate suite (17 datasets, 180 series) with threshold-free and range-aware metrics, AxonAD improves ranking quality and temporal localization over strong baselines. Ablations confirm that query prediction and combined scoring are the primary drivers of the observed gains. Code is available at https://github.com/iis-esslingen/AxonAD.
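The inference-time score described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `[timestep][head][dim]` layout, the tail length, the plain mean over heads and timesteps, and the unnormalized additive combination with a `weight` parameter are all assumptions made for illustration. The abstract only states that cosine deviation between predicted and target queries is aggregated over recent timesteps and combined with reconstruction error.

```python
import math

def cosine_deviation(u, v, eps=1e-12):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / max(norm_u * norm_v, eps)

def tail_query_mismatch(q_pred, q_tgt, tail=4):
    """Mean cosine deviation over the last `tail` timesteps and all heads.

    q_pred, q_tgt: nested lists indexed as [timestep][head][dim]
    (a hypothetical layout chosen for this sketch).
    """
    steps = list(zip(q_pred, q_tgt))[-tail:]
    devs = [
        cosine_deviation(p, t)
        for pred_heads, tgt_heads in steps
        for p, t in zip(pred_heads, tgt_heads)
    ]
    return sum(devs) / len(devs)

def combined_score(d_rec, d_q, weight=1.0):
    """Additive combination of reconstruction error and query mismatch.

    Any normalization of the two terms is left out here; `weight` is an
    assumed knob, not a parameter taken from the paper.
    """
    return d_rec + weight * d_q
```

With identical predicted and target queries the mismatch is zero, and it grows toward 1 as the target query on the most recent timesteps becomes orthogonal to the prediction, even when the reconstruction error itself stays small.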

Paper Structure (19 sections, 11 equations, 7 figures, 7 tables)


Figures (7)

  • Figure 1: Query predictability in 2D query space (schematic, single head). Gray: past query trajectory. Blue dashed: predicted query. Red: EMA target query. (a) Nominal: predictor and target agree. (b) Coordination anomaly: the target query diverges from the predicted trajectory, producing large $d_q$ even when per-channel amplitudes are within normal bounds.
  • Figure 2: AxonAD overview. The online reconstruction encoder computes self-attention using queries $\mathbf{Q}_{\mathrm{rec}}$. In parallel, a history-only predictor forecasts $\widehat{\mathbf{Q}}_{\mathrm{pred}}$ and is trained to match EMA target queries $\mathbf{Q}_{\mathrm{tgt}}$ (stop-gradient). Query mismatch on the last valid timesteps yields $d_q$, and reconstruction yields $d_{\mathrm{rec}}$. Attention divergence (KL tail) is not included in the default scoring pipeline.
  • Figure 3: Score complementarity (schematic). Nominal windows spread along both axes but cluster near the origin in both scores simultaneously. Near-boundary amplitude and coordination anomalies are moderate on both axes, falling inside both single-axis thresholds (dotted lines) but separated by the additive $S$ (dashed diagonal).
  • Figure 4: Chronological split and anomaly onset for the proprietary telemetry stream.
  • Figure 5: Paired AUC-PR comparison on TSB-AD multivariate ($n=180$). Left: win/loss counts. Right: median paired difference with lollipop connectors from zero. All paired Wilcoxon tests yield $p < 10^{-4}$ with entirely negative 95% bootstrap CIs (full statistics in the Appendix).
  • ...and 2 more figures
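
The EMA target queries in Figure 2 come from a target encoder that follows the online encoder through an exponential moving average instead of receiving gradients. A minimal sketch of such an update is given below, assuming parameters are flattened into plain lists of floats. The function name `ema_update` and the `decay` value are hypothetical and are not taken from the paper.

```python
def ema_update(target_params, online_params, decay=0.99):
    """Return updated target parameters as an exponential moving average.

    Each target value moves a fraction (1 - decay) toward the matching
    online value. No gradient flows into the target (stop-gradient), so
    the update is plain arithmetic on the current parameter values.
    """
    return [
        decay * t + (1.0 - decay) * o
        for t, o in zip(target_params, online_params)
    ]

# Example: one update step with an assumed decay of 0.9.
target = [0.0, 1.0]
online = [1.0, 1.0]
target = ema_update(target, online, decay=0.9)
```

A decay close to 1 makes the target encoder change slowly, which gives the predictor a stable set of queries to match during training.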