Identifying Trustworthiness Challenges in Deep Learning Models for Continental-Scale Water Quality Prediction

Xiaobo Xia; Xiaofeng Liu; Jiale Liu; Kuai Fang; Lu Lu; Samet Oymak; William S. Currie; Tongliang Liu

Identifying Trustworthiness Challenges in Deep Learning Models for Continental-Scale Water Quality Prediction

Xiaobo Xia, Xiaofeng Liu, Jiale Liu, Kuai Fang, Lu Lu, Samet Oymak, William S. Currie, Tongliang Liu

TL;DR

This work presents a multi-dimensional, quantitative evaluation of trustworthiness benchmarking three state-of-the-art deep learning architectures: recurrent (LSTM), operator-learning (DeepONet), and transformer-based (Informer), trained on 37 years of data from 482 U.S. basins to predict 20 water quality variables.

Abstract

Water quality is foundational to environmental sustainability, ecosystem resilience, and public health. Deep learning offers transformative potential for large-scale water quality prediction and scientific insights generation. However, their widespread adoption in high-stakes operational decision-making, such as pollution mitigation and equitable resource allocation, is prevented by unresolved trustworthiness challenges, including performance disparity, robustness, uncertainty, interpretability, generalizability, and reproducibility. In this work, we present a multi-dimensional, quantitative evaluation of trustworthiness benchmarking three state-of-the-art deep learning architectures: recurrent (LSTM), operator-learning (DeepONet), and transformer-based (Informer), trained on 37 years of data from 482 U.S. basins to predict 20 water quality variables. Our investigation reveals systematic performance disparities tied to process complexity, data availability, and basin heterogeneity. Management-critical variables remain the least predictable and most uncertain. Robustness tests reveal pronounced sensitivity to outliers and corrupted targets; notably, the architecture with the strongest baseline performance (LSTM) proves most vulnerable under data corruption. Attribution analyses align for simple variables but diverge for nutrients, underscoring the need for multi-method interpretability. Spatial generalization to ungauged basins remains poor across all models. This work serves as a timely call to action for advancing trustworthy data-driven methods for water resources management and provides a pathway to offering critical insights for researchers, decision-makers, and practitioners seeking to leverage artificial intelligence (AI) responsibly in environmental management.

Identifying Trustworthiness Challenges in Deep Learning Models for Continental-Scale Water Quality Prediction

TL;DR

Abstract

Identifying Trustworthiness Challenges in Deep Learning Models for Continental-Scale Water Quality Prediction

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (25)