FOCUS on Contamination: A Geospatial Deep Learning Framework with a Noise-Aware Loss for Surface Water PFAS Prediction

Jowaria Khan, Alexa Friedman, Sydney Evans, Rachel Klein, Runzi Wang, Katherine E. Manz, Kaley Beins, David Q. Andrews, Elizabeth Bondi-Kelly

TL;DR

This work tackles large-scale PFAS contamination mapping by framing PFAS presence as a geospatial segmentation task and introducing FOCUS, a noise-aware, masked autoencoder–based framework that processes multi-channel rasters. By coupling a novel loss that down-weights uncertain labels with domain-informed noise masks and hydrological/land-use context, FOCUS achieves superior accuracy and robustness compared with baselines (including Kriging and SWAT-like simulations) while maintaining computational efficiency. Real-world validation, cross-state generalization, and stakeholder collaboration demonstrate the method's practical potential for scalable PFAS monitoring and targeted remediation. The study also provides a calibrated probabilistic framework suitable for uncertainty-aware sampling and future temporal modeling.

Abstract

Per- and polyfluoroalkyl substances (PFAS), chemicals found in products like non-stick cookware, are unfortunately persistent environmental pollutants with severe health risks. Accurately mapping PFAS contamination is crucial for guiding targeted remediation efforts and protecting public and environmental health, yet detection across large regions remains challenging due to the cost of testing and the difficulty of simulating their spread. In this work, we introduce FOCUS, a geospatial deep learning framework with a label noise-aware loss function, to predict PFAS contamination in surface water over large regions. By integrating hydrological flow data, land cover information, and proximity to known PFAS sources, our approach leverages both spatial and environmental context to improve prediction accuracy. We evaluate the performance of our approach through extensive ablation studies, robustness analysis, real-world validation, and comparative analyses against baselines like sparse segmentation, as well as existing scientific methods, including Kriging and pollutant transport simulations. Results and expert feedback highlight our framework's potential for scalable PFAS monitoring.
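
The abstract does not spell out the noise-aware loss, so the following is only an illustrative sketch of the general idea of down-weighting uncertain labels, not the FOCUS loss itself. It assumes a per-pixel binary cross-entropy and a hypothetical `confidence` raster in [0, 1] that plays the role of a noise mask; the function name, the weighting scheme, and the toy values are all assumptions made for illustration.

```python
import numpy as np

def noise_weighted_bce(probs, labels, confidence, eps=1e-7):
    """Confidence-weighted binary cross-entropy over a raster (illustrative only).

    probs:      predicted contamination probabilities, shape (H, W)
    labels:     binary labels (1 = PFAS detected), shape (H, W)
    confidence: hypothetical per-pixel trust in the label, in [0, 1];
                0 ignores the pixel, 1 trusts its label fully
    """
    # Clip to avoid log(0) on saturated predictions.
    probs = np.clip(probs, eps, 1.0 - eps)
    bce = -(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs))
    total_weight = confidence.sum()
    if total_weight == 0:
        return 0.0  # no trusted pixels, nothing to learn from
    # Weighted mean: uncertain pixels contribute less to the loss.
    return float((confidence * bce).sum() / total_weight)

# Toy 2x2 raster (made-up values).
probs = np.array([[0.9, 0.2], [0.6, 0.1]])
labels = np.array([[1.0, 0.0], [0.0, 0.0]])
confidence = np.array([[1.0, 1.0], [0.2, 0.0]])  # bottom row is less trusted

loss = noise_weighted_bce(probs, labels, confidence)
uniform_loss = noise_weighted_bce(probs, labels, np.ones_like(probs))
```

In this toy case the pixel with the largest error also carries a low confidence, so `loss` comes out smaller than `uniform_loss`; a pixel with zero confidence has no influence on the result at all.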

Paper Structure

This paper contains 29 sections, 3 equations, 9 figures, 16 tables.

Figures (9)

  • Figure 1: Illustration of limited PFAS contamination data (USEPA2015)
  • Figure 2: Left: Classical ML techniques aggregate surrounding pixel data to predict contamination at specific points (depicted as x, y coordinates); Right: DL methods process raster images directly to generate dense PFAS contamination maps in a single pass.
  • Figure 3: Overview of dataset curation pipeline
  • Figure 4: Example raster channels: (left) land cover, (center) distances from chemical manufacturing industries, and (right) flow direction.
  • Figure 5: Model overview: high-level representation of FOCUS
  • ...and 4 more figures