FOCUS on Contamination: A Geospatial Deep Learning Framework with a Noise-Aware Loss for Surface Water PFAS Prediction
Jowaria Khan, Alexa Friedman, Sydney Evans, Rachel Klein, Runzi Wang, Katherine E. Manz, Kaley Beins, David Q. Andrews, Elizabeth Bondi-Kelly
TL;DR
This work tackles large-scale PFAS contamination mapping by framing PFAS presence as a geospatial segmentation task and introducing FOCUS, a noise-aware, masked autoencoder–based framework that processes multi-channel rasters. By coupling a novel loss that down-weights uncertain labels with domain-informed noise masks and hydrological/land-use context, FOCUS achieves superior accuracy and robustness compared with baselines (including Kriging and SWAT-like simulations) while maintaining computational efficiency. Real-world validation, cross-state generalization, and stakeholder collaboration demonstrate the method's practical potential for scalable PFAS monitoring and targeted remediation. The study also provides a calibrated probabilistic framework suitable for uncertainty-aware sampling and future temporal modeling.
Abstract
Per- and polyfluoroalkyl substances (PFAS), chemicals found in products such as non-stick cookware, are persistent environmental pollutants associated with severe health risks. Accurately mapping PFAS contamination is crucial for guiding targeted remediation efforts and protecting public and environmental health, yet detection across large regions remains challenging because testing is costly and the spread of PFAS is difficult to simulate. In this work, we introduce FOCUS, a geospatial deep learning framework with a label noise-aware loss function, to predict PFAS contamination in surface water over large regions. By integrating hydrological flow data, land cover information, and proximity to known PFAS sources, our approach leverages both spatial and environmental context to improve prediction accuracy. We evaluate our approach through extensive ablation studies, robustness analysis, real-world validation, and comparisons against baselines such as sparse segmentation, as well as existing scientific methods, including Kriging and pollutant transport simulations. Results and expert feedback highlight our framework's potential for scalable PFAS monitoring.
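To make the idea of a label noise-aware loss concrete, the following is a minimal sketch of one common way to down-weight uncertain labels: a per-pixel binary cross-entropy scaled by a confidence mask. It is not the FOCUS loss itself, whose exact form is not given in this summary; the function name `noise_aware_bce`, the `confidence` mask, and the normalization by total weight are all assumptions made for illustration.

```python
import numpy as np

def noise_aware_bce(probs, labels, confidence, eps=1e-7):
    """Confidence-weighted binary cross-entropy over a raster (illustrative only).

    probs      : predicted contamination probabilities, shape (H, W)
    labels     : possibly noisy binary labels, shape (H, W)
    confidence : hypothetical per-pixel weights in [0, 1]; 1 means a trusted
                 label, values near 0 mean an uncertain label, 0 means ignore
    """
    # Clip to avoid log(0) for saturated predictions.
    probs = np.clip(probs, eps, 1.0 - eps)
    # Standard per-pixel binary cross-entropy.
    per_pixel = -(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs))
    total_weight = confidence.sum()
    if total_weight == 0:
        # No trusted pixels in this patch: contribute nothing to the loss.
        return 0.0
    # Weighted average, so uncertain pixels contribute proportionally less.
    return float((confidence * per_pixel).sum() / total_weight)

# Toy 2x2 patch: the lower-left pixel is predicted poorly but its label is uncertain.
probs = np.array([[0.9, 0.2], [0.6, 0.1]])
labels = np.array([[1.0, 0.0], [0.0, 0.0]])
confidence = np.array([[1.0, 1.0], [0.2, 1.0]])
loss = noise_aware_bce(probs, labels, confidence)
```

In this sketch, a domain-informed noise mask would enter through `confidence`, for example by assigning lower weights to pixels whose labels were inferred rather than measured; how such weights are actually derived in FOCUS is not specified here.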
