Interpretable Water Level Forecaster with Spatiotemporal Causal Attention Mechanisms

Sungchul Hong, Yunjin Choi, Jong-June Jeon

TL;DR

This paper tackles interpretable water level forecasting by embedding spatiotemporal causality into a transformer-based model, InstaTran. It introduces SCAN and TAN within a multilayer spatiotemporal causal graph, controlled by masks $M_{\mathcal{S}}$ and $M_{\mathcal{T}}$, and uses a direct multi-quantile decoder trained with composite quantile loss $CQL$. The key contributions include: (i) a principled masking scheme that yields interpretable attention weights aligned with domain knowledge, (ii) a spatiotemporal encoder that strengthens spatial causality, and (iii) robustness to distribution shifts demonstrated on the Han River dataset and US lake data. The approach delivers competitive probabilistic forecasts with interpretable variable importances, and improves generalization under covariate shift, offering practical utility for water resource management and disaster risk mitigation.
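To make the training objective mentioned above concrete, here is a minimal NumPy sketch of a composite quantile loss: the pinball (quantile) loss summed over a set of quantile levels. The function names, the array shapes, and the choice to average over samples and horizon steps are assumptions made for illustration; the paper's exact definition of $CQL$ may weight or normalize the terms differently.

```python
import numpy as np

def pinball_loss(y, y_hat, q):
    """Elementwise quantile (pinball) loss for a single level q in (0, 1)."""
    diff = y - y_hat
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return np.maximum(q * diff, (q - 1.0) * diff)

def composite_quantile_loss(y, y_hat, quantiles):
    """Sum of mean pinball losses over all quantile levels.

    y:      shape (n_samples, horizon), observed values
    y_hat:  shape (n_samples, horizon, n_quantiles), predicted quantiles
    (These shapes are an assumption of this sketch.)
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    total = 0.0
    for k, q in enumerate(quantiles):
        total += pinball_loss(y, y_hat[..., k], q).mean()
    return total

# Toy usage: one sample, one horizon step, two quantile levels.
y_toy = np.array([[1.0]])
y_hat_toy = np.array([[[0.0, 2.0]]])  # predicted 0.1- and 0.9-quantiles
loss_toy = composite_quantile_loss(y_toy, y_hat_toy, [0.1, 0.9])
```

Because all quantile levels are predicted directly and scored jointly, a decoder trained this way produces a prediction interval for each horizon step in a single forward pass.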

Abstract

Accurate forecasting of river water levels is vital for effectively managing traffic flow and mitigating the risks associated with natural disasters. This task is challenging because of the intricate factors that influence the flow of a river. Recent advances in machine learning have introduced numerous effective forecasting methods. However, these methods lack interpretability due to their complex structure, which limits their reliability. To address this issue, this study proposes a deep learning model that quantifies interpretability, with an emphasis on water level forecasting. The model generates quantitative interpretability measurements that align with the common knowledge embedded in the input data. This is achieved with a transformer architecture that is purposefully designed with masking and that incorporates a multi-layer network capturing spatiotemporal causation. We perform a comparative analysis on the Han River dataset obtained from Seoul, South Korea, from 2016 to 2021. The results show that our approach offers enhanced interpretability consistent with common knowledge, outperforms competing methods, and enhances robustness against distribution shift.

Paper Structure

This paper contains 30 sections, 29 equations, 12 figures, 9 tables.

Figures (12)

  • Figure 1: The satellite image of Han River, Seoul, South Korea with highlights on the area of interest. The blue points indicate the observatories at which observations are collected, and the red point denotes our target site, the Jamsu bridge $B_0$. The river flows from east to west, emptying into the sea.
  • Figure 2: Left: The multilayer network $\mathcal{G}$, illustrating the spatiotemporal causal relations represented for three consecutive hours: $t-1$, $t$, and $t+1$. Each layer shares the same spatial causal relations, and every node is connected to a corresponding node in the subsequent layer, belonging to the same cluster. Right: Visualization of the spatial masking matrix $M_\mathcal{S}$, with colored blocks representing encodings of connected edges.
  • Figure 3: Architecture of the proposed model (bottom) and details of SCAN and TAN in the spatiotemporal encoder (top).
  • Figure 4: Heatmaps of attention weights $A^{\mathcal{S}}_{{\bf u}'}$, discussed in \ref{eq: scan2}. Plots (a) and (b) on the top row correspond to a dry day, and plots (c) and (d) on the bottom row correspond to a rainy day. The variables corresponding to the indices of both the x-axis and y-axis are as follows: $0:P_1,~1:P_2, ~2:P_3,~3:\hbox{WL}~(B_4),~4:\hbox{WL}~(D),~5:\hbox{IF}~(D),~6:\hbox{STR}~(D),~7:\hbox{JUS}~(D),~8:\hbox{OF}~(D),~9:\hbox{WL}~(B_1),~10:\hbox{FL}~(B_1),~11:\hbox{WL}~(B_0),~12:\hbox{WL}~(B_2),~13:\hbox{FL}~(B_2),~14:\hbox{WL}~(B_3),~15:\hbox{FL}~(B_3)$.
  • Figure 5: Variable importances and observations of $P_1$ and OF $(D)$. The solid line denotes the observations of variables. The dashed and dotted lines denote the evaluated variable importances obtained from InstaTran and TFT, respectively.
  • ...and 7 more figures
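
The role of a spatial masking matrix such as $M_\mathcal{S}$ in Figure 2 can be illustrated with a small NumPy sketch of masked scaled dot-product attention. Everything specific here is an assumption for illustration: the function name, the convention that a mask entry of 1 marks an allowed edge, the three-variable toy graph, and the use of the raw inputs as queries, keys, and values. It is not the paper's SCAN implementation.

```python
import numpy as np

def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention restricted to edges allowed by `mask`.

    mask[i, j] == 1 means variable i may attend to variable j (assumed
    convention). Every row must allow at least one entry, otherwise the
    softmax for that row is undefined.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Disallowed edges get -inf so they receive exactly zero weight.
    scores = np.where(mask.astype(bool), scores, -np.inf)
    # Subtract the row maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy example: 3 variables ordered upstream to downstream; each variable
# may attend only to itself and to variables upstream of it.
rng = np.random.default_rng(0)
n_vars, d_model = 3, 4
X = rng.normal(size=(n_vars, d_model))
mask = np.tril(np.ones((n_vars, n_vars)))
out, A = masked_attention(X, X, X, mask)
```

Since masked entries are exactly zero, the remaining attention weights in `A` can be read as importances over only those relations the graph permits, which is the property that makes heatmaps like those in Figure 4 interpretable.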