Interpretable Water Level Forecaster with Spatiotemporal Causal Attention Mechanisms

Sungchul Hong; Yunjin Choi; Jong-June Jeon

TL;DR

This paper tackles interpretable water level forecasting by embedding spatiotemporal causality into a transformer-based model, InstaTran. It introduces the Spatially Causal Attention Network (SCAN) and the Temporal Attention Network (TAN) within a multilayer spatiotemporal causal graph, controlled by masks $M_{\mathcal{S}}$ and $M_{\mathcal{T}}$, and uses a direct multi-quantile decoder trained with a composite quantile loss (CQL). The key contributions are: (i) a principled masking scheme that yields interpretable attention weights aligned with domain knowledge, (ii) a spatiotemporal encoder that strengthens spatial causality, and (iii) robustness to distribution shifts, demonstrated on the Han River dataset and US lake data. The approach delivers competitive probabilistic forecasts with interpretable variable importances and improves generalization under covariate shift, offering practical utility for water resource management and disaster risk mitigation.
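As a rough illustration of the loss named above, the sketch below averages the standard quantile (pinball) loss over several quantile levels. It assumes one scalar target per forecast and equal weighting of all quantile levels; the paper's exact definition and normalization of CQL are not given in this summary, and the function names `pinball_loss` and `composite_quantile_loss` are hypothetical.

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball loss of a single forecast y_pred for quantile level q in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return max(q * diff, (q - 1.0) * diff)


def composite_quantile_loss(targets, forecasts, quantiles):
    """Mean pinball loss over all targets and all quantile levels.

    targets:   list of observed values, length n
    forecasts: list of n lists, each holding one forecast per quantile level
    quantiles: list of quantile levels, e.g. [0.1, 0.5, 0.9]
    Equal weighting across quantiles is an assumption of this sketch.
    """
    total = 0.0
    count = 0
    for y, preds in zip(targets, forecasts):
        for q, y_hat in zip(quantiles, preds):
            total += pinball_loss(y, y_hat, q)
            count += 1
    if count == 0:
        raise ValueError("no forecasts to evaluate")
    return total / count
```

A forecast that matches the target at every quantile level gives a loss of zero; forecasts that miss are penalized asymmetrically according to the quantile level.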

Abstract

Accurate forecasting of river water levels is vital for effectively managing traffic flow and mitigating the risks associated with natural disasters. This task is challenging because of the intricate factors that influence the flow of a river. Recent advances in machine learning have introduced numerous effective forecasting methods. However, these methods lack interpretability due to their complex structure, which limits their reliability. To address this issue, this study proposes a deep learning model that quantifies interpretability, with an emphasis on water level forecasting. The model focuses on generating quantitative interpretability measurements that align with the common knowledge embedded in the input data. This is achieved with a transformer architecture that is purposefully designed with masking and incorporates a multilayer network capturing spatiotemporal causation. We perform a comparative analysis on the Han River dataset obtained from Seoul, South Korea, from 2016 to 2021. The results illustrate that our approach offers enhanced interpretability consistent with common knowledge, outperforms competing methods, and enhances robustness against distribution shift.
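To make the idea of attention "designed with masking" concrete, here is a minimal sketch of a row-wise softmax restricted to the edges a binary mask allows. It is a generic masked-attention illustration, not the paper's SCAN or TAN; the function name `masked_attention_weights` and the list-of-lists layout are assumptions made for this example.

```python
import math


def masked_attention_weights(scores, mask):
    """Row-wise softmax over `scores`, keeping only entries where `mask` is 1.

    scores: list of rows of raw attention scores
    mask:   list of rows of 0/1 flags with the same shape; 1 marks an
            allowed (e.g. causally plausible) edge
    Masked-out entries receive a weight of exactly zero.
    """
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        allowed = [s for s, m in zip(row_scores, row_mask) if m]
        if not allowed:
            raise ValueError("each row needs at least one allowed edge")
        top = max(allowed)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - top) if m else 0.0
                for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

Because disallowed edges always get zero weight, the remaining nonzero weights in each row can be read as relative importances among the permitted inputs only.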

Paper Structure (30 sections, 29 equations, 12 figures, 9 tables)

Introduction
Related Work
Preliminary
Dataset
Modelling Spatiotemporal Causality via Multilayer Network
Attention Mechanism
Proposed Model
Notations and Model Architecture Overview
Spatiotemporal encoder
Spatially Causal Attention Network
Temporal Attention Network
Strengthening Spatial Causal Relations
Temporal Decoder
Loss functions
Experiments
...and 15 more sections

Figures (12)

Figure 1: The satellite image of Han River, Seoul, South Korea with highlights on the area of interest. The blue points indicate the observatories at which observations are collected, and the red point denotes our target site, the Jamsu bridge $B_0$. The river flows from east to west, emptying into the sea.
Figure 2: Left: The multilayer network $\mathcal{G}$, illustrating the spatiotemporal causal relations represented for three consecutive hours: $t-1$, $t$, and $t+1$. Each layer shares the same spatial causal relations, and every node is connected to a corresponding node in the subsequent layer, belonging to the same cluster. Right: Visualization of the spatial masking matrix $M_\mathcal{S}$, with colored blocks representing encodings of connected edges.
Figure 3: Architecture of the proposed model (bottom) and details of SCAN and TAN in the spatiotemporal encoder (top).
Figure 4: Heatmaps of attention weights $A^{\mathcal{S}}_{{\bf u}'}$, discussed in \ref{eq: scan2}. Plots (a) and (b) on the top row correspond to a dry day, and plots (c) and (d) on the bottom row correspond to a rainy day. The variables corresponding to the indices of both the x-axis and y-axis are as follows: $0:P_1,~1:P_2, ~2:P_3,~3:\hbox{WL}~(B_4),~4:\hbox{WL}~(D),~5:\hbox{IF}~(D),~6:\hbox{STR}~(D),~7:\hbox{JUS}~(D),~8:\hbox{OF}~(D),~9:\hbox{WL}~(B_1),~10:\hbox{FL}~(B_1),~11:\hbox{WL}~(B_0),~12:\hbox{WL}~(B_2),~13:\hbox{FL}~(B_2),~14:\hbox{WL}~(B_3),~15:\hbox{FL}~(B_3)$.
Figure 5: Variable importances and observations of $P_1$ and OF $(D)$. The solid line denotes the observations of variables. The dashed and dotted lines denote the evaluated variable importances obtained from InstaTran and TFT, respectively.
...and 7 more figures
