Time Series Predictions in Unmonitored Sites: A Survey of Machine Learning Techniques in Water Resources
Jared D. Willard, Charuleka Varadharajan, Xiaowei Jia, Vipin Kumar
TL;DR
The paper surveys machine learning approaches for predicting hydrologic time series at unmonitored sites, where data are sparse or absent. It categorizes frameworks into broad-scale entity-aware models, transfer learning, and knowledge-guided ML, detailing methods such as direct concatenation, encoded site characteristics, graph neural networks, meta-transfer learning, and unsupervised domain adaptation. It highlights knowledge-guided techniques, including physics-informed losses, differentiable process-based models, and hybrid residual modeling, as promising avenues to leverage existing process understanding. The review emphasizes open questions on data requirements, feature selection, dynamic site characteristics, uncertainty quantification, and explainable AI, and calls for cross-disciplinary collaboration to translate methodological advances into practical water resources applications.
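The TL;DR names direct concatenation as one way to make a broad-scale, entity-aware model. Below is a minimal sketch of that idea, under illustrative assumptions. The same vector of static site characteristics is appended to the dynamic inputs at every time step. A single model trained on many sites can then condition on those characteristics. The function name `concat_static` and the example forcings and attributes are hypothetical, chosen only for illustration. They are not drawn from the survey.

```python
from typing import List, Sequence


def concat_static(dynamic: Sequence[Sequence[float]],
                  static: Sequence[float]) -> List[List[float]]:
    """Append one site's static characteristics to each time step of its
    dynamic inputs (direct concatenation; illustrative sketch only)."""
    return [list(step) + list(static) for step in dynamic]


# Hypothetical dynamic forcings for one site: 3 days of
# (precipitation, air temperature)
forcings = [[5.0, 12.1], [0.0, 14.3], [2.5, 13.0]]

# Hypothetical static site characteristics:
# (drainage area, mean slope, forest fraction)
attributes = [250.0, 0.08, 0.6]

model_inputs = concat_static(forcings, attributes)
print(model_inputs[0])  # [5.0, 12.1, 250.0, 0.08, 0.6]
```

Because the static values are repeated unchanged at every step, the model sees no extra information over time. It only learns how a site's characteristics modulate its response to the dynamic inputs. In practice the features would also be normalized, and that step is omitted here.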
Abstract
Prediction of dynamic environmental variables at unmonitored sites remains a long-standing challenge for water resources science. The majority of the world's freshwater resources lack adequate monitoring of the critical environmental variables needed for management. Yet the need for widespread predictions of hydrological variables such as river flow and water quality has become increasingly urgent because of climate and land use change over the past decades and their associated impacts on water resources. Modern machine learning methods increasingly outperform their process-based and empirical model counterparts for hydrologic time series prediction, owing to their ability to extract information from large, diverse data sets. We review relevant state-of-the-art applications of machine learning for streamflow, water quality, and other water resources prediction. We also discuss opportunities to improve the use of machine learning with emerging methods for incorporating watershed characteristics into deep learning models, for transfer learning, and for incorporating process knowledge into machine learning models. The analysis here suggests that most prior efforts have focused on deep learning frameworks built on many sites for predictions at daily time scales in the United States, but that comparisons between different classes of machine learning methods are few and inadequate. We identify several open questions for time series predictions at unmonitored sites. These include how to incorporate dynamic inputs and site characteristics, mechanistic understanding and spatial context, and explainable AI techniques into modern machine learning frameworks.
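One way to incorporate process knowledge into a machine learning model is a physics-informed loss. The sketch below illustrates the idea under simple assumptions. A supervised error term is computed only on time steps that have observations. A physics-based penalty is computed on every prediction, so it does not require labels. That property makes such terms attractive where observations are sparse. The constraint used here is that streamflow cannot be negative. This is a deliberately simple, hypothetical stand-in for the conservation-based constraints used in published work. The function name `physics_informed_loss` and the weight `lam` are also illustrative assumptions.

```python
from typing import Optional, Sequence


def physics_informed_loss(pred: Sequence[float],
                          obs: Sequence[Optional[float]],
                          lam: float = 1.0) -> float:
    """Illustrative physics-informed loss (hypothetical sketch).

    pred: model predictions of streamflow at each time step.
    obs:  observations, with None where the site was not monitored.
    lam:  weight on the physics penalty relative to the data term.
    """
    # Supervised term: mean squared error over observed time steps only.
    pairs = [(p, o) for p, o in zip(pred, obs) if o is not None]
    mse = sum((p - o) ** 2 for p, o in pairs) / len(pairs) if pairs else 0.0

    # Physics term: streamflow cannot be negative, so penalize the squared
    # magnitude of any negative prediction. No labels are needed here, so
    # unobserved time steps still contribute.
    penalty = sum(max(0.0, -p) ** 2 for p in pred) / len(pred) if pred else 0.0

    return mse + lam * penalty


# Hypothetical example: the second day is unobserved and predicted negative.
loss = physics_informed_loss([1.0, -0.5, 2.0], [1.5, None, 2.0])
```

The data term alone would ignore the unobserved second day entirely. The penalty still pushes the model away from a physically impossible negative prediction on that day. The weight `lam` trades off fit to the observations against consistency with the constraint.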
