Tackling water table depth modeling via machine learning: From proxy observations to verifiability

Joseph Janssen; Ardalan Tootchi; Ali A. Ameli

TL;DR

This paper tackles large-scale, static water table depth (WTD) estimation by combining physically constrained machine learning (ML) with proxy observations, producing three 500 m resolution WTD maps for the USA and Canada. It compares three ML setups against two physically-based (PB) simulations: V1 (real WTD observations only), V2 (real observations plus shoreline proxies, WOP>75%), and V3 (which adds proxies derived from Height Above Nearest Drainage, HAND). Performance is evaluated across ten North American ecoregions using unseen real and proxy data. The ML models generally outperform the PB simulations in correlation with observed WTD (Corr-OBS of 0.6–0.75). In particular, V2 excels at predicting interior wet areas, while V3 captures mountainous variability by leveraging topographic controls such as the Topographic Index. The study highlights pervasive data biases and uncertainties in WTD observations and the risk of model equifinality, and it outlines future directions: integrating physical laws, strengthening verification standards, and developing richer proxy data to improve the verifiability and realism of large-scale WTD predictions.

Abstract

Spatial patterns of water table depth (WTD) play a crucial role in shaping ecological resilience, hydrological connectivity, and human-centric systems. Generally, a large-scale (e.g., continental or global) continuous map of static WTD can be simulated using either physically-based (PB) or machine learning-based (ML) models. We construct three fine-resolution (500 m) ML simulations of WTD, using the XGBoost algorithm and more than 20 million real and proxy observations of WTD, across the United States and Canada. The three ML models were constrained using known physical relations between WTD's drivers and WTD and were trained by sequentially adding real and proxy observations of WTD. Through an extensive (pixel-by-pixel) evaluation across the study region and within ten major ecoregions of North America, we demonstrate that our models (corr=0.6-0.75) can more accurately predict unseen real and proxy observations of WTD compared to two available PB simulations of WTD (corr=0.21-0.40). However, we still argue that currently-available large-scale simulations of static WTD could be uncertain within data-scarce regions such as steep mountainous regions. We reason that biased observational data mainly collected from low-elevation floodplains and the over-flexibility of available models can negatively affect the verifiability of large-scale simulations of WTD. Ultimately, we thoroughly discuss future directions that may help hydrogeologists decide how to improve machine learning-based WTD estimations. In particular, we advocate for the use of proxy satellite data, the incorporation of physical laws, the implementation of better model verification standards, the development of novel globally-available emergent indices, and the collection of more reliable observations.
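The pixel-by-pixel evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the reported "corr" is a Pearson correlation computed over pixels where both a simulated and an observed WTD value exist; the abstract does not state the exact metric, and the function name `pixelwise_correlation` and the toy values below are hypothetical.

```python
import math


def _is_valid(value):
    """A pixel value counts as present if it is neither None nor NaN."""
    return value is not None and not math.isnan(value)


def pixelwise_correlation(simulated, observed):
    """Pearson correlation over co-located pixels (assumed metric).

    `simulated` and `observed` are flat sequences of WTD values aligned
    pixel by pixel; pixels missing in either sequence are skipped.
    """
    pairs = [(s, o) for s, o in zip(simulated, observed)
             if _is_valid(s) and _is_valid(o)]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two co-located pixels")

    mean_s = sum(s for s, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n

    cov = sum((s - mean_s) * (o - mean_o) for s, o in pairs)
    var_s = sum((s - mean_s) ** 2 for s, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)
    if var_s == 0 or var_o == 0:
        raise ValueError("correlation undefined for constant input")

    return cov / math.sqrt(var_s * var_o)


# Toy example (made-up values, in meters): the third pixel has no
# simulated value and is therefore excluded from the comparison.
simulated_wtd = [1.0, 2.0, None, 4.0]
observed_wtd = [2.0, 4.0, 5.0, 8.0]
r = pixelwise_correlation(simulated_wtd, observed_wtd)
```

Skipping pixels without a paired observation keeps the score tied to locations where the simulation can actually be checked, which is also why sparse or biased observations limit what such a score can verify.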

Paper Structure (45 sections, 8 figures, 2 tables)

Introduction
Data
Input Variables Used to Develop Machine Learning Models
Climate data
Topography
Geology
Soil data
Land cover
Real Observations of Water Table Depth
Proxy Observations of Water Table Depth
Occurrence of surface water inundation connected to groundwater systems
Height Above Nearest Drainage
Global-scale Physically-based Simulations of WTD
Ecoregions
Methods
...and 30 more sections

Figures (8)

Figure 1: 500 m resolution pixels showing the locations of the available real observations of WTD (red) and the delineated shorelines of surface water bodies (blue) across the Prairie Pothole and Mississippi River Basin regions. Our real observations of WTD and the delineated shorelines extend throughout the entire United States and Canada; for simplicity of visualization, we show only these two regions, which have fairly dense observations and delineated shorelines.
Figure 2: The locations of the ten ecoregions across which we compare and evaluate the performance of the three machine learning-based and two physically-based WTD simulations. The dashed red arch-shaped zone is referred to as the focus zone and is explained in Section \ref{sec:trainingvalidation}.
Figure 3: Degraaf's (right), Fan's (middle), and our V3 (left) simulations after the alignment process in the lowland area north of Lake Saint Clair. All three simulations place Lake Saint Clair and its shorelines (with close to zero WTD) at similar locations. Shorelines are simulated at finer resolution by our V3 model than by Fan's and Degraaf's simulations, due to differences in grid resolution.
Figure 4: Five different simulations of water table depths across the US and Canada using machine learning models (a-c) and physically-based models (d-e).
Figure 5: Pixel-by-pixel spatial correlations among the physically-based and machine learning WTD simulations.
...and 3 more figures
