CRACKS: Crowdsourcing Resources for Analysis and Categorization of Key Subsurface faults

Mohit Prabhushankar, Kiran Kokilepersaud, Jorge Quesada, Yavuz Yarici, Chen Zhou, Mohammad Alotaibi, Ghassan AlRegib, Ahmad Mustafa, Yusufjon Kumakov

TL;DR

CRACKS tackles the lack of expert-labeled subsurface fault annotations by crowdsourcing fault delineations from novices, practitioners, and a geophysicist on a 3D F3 seismic volume. It demonstrates that noisy crowd labels, when fused and paired with fine-tuning on expert data, can improve fault detection and instance segmentation (e.g., mean Average Precision at IoU threshold $0.5$) and enable self-supervised fault delineation on crops of fault regions. The dataset provides three annotation confidence types and QA metrics for studying inter- and intra-annotator agreement, and offers benchmarks for detection, segmentation, and self-supervised learning (SSL) under realistic noise. By releasing open data and code, CRACKS provides a practical benchmark for learning from noisy domain-specific labels in seismic imaging and motivates extensions to 3D labeling and multi-modal fault analysis.

Abstract

Crowdsourcing annotations has created a paradigm shift in the availability of labeled data for machine learning. Availability of large datasets has accelerated progress in common knowledge applications involving visual and language data. However, specialized applications that require expert labels lag in data availability. One such application is fault segmentation in subsurface imaging. Detecting, tracking, and analyzing faults has broad societal implications in predicting fluid flows, earthquakes, and storing excess atmospheric CO$_2$. However, delineating faults with current practices is a labor-intensive activity that requires precise analysis of subsurface imaging data by geophysicists. In this paper, we propose the $\texttt{CRACKS}$ dataset to detect and segment faults in subsurface images by utilizing crowdsourced resources. We leverage Amazon Mechanical Turk to obtain fault delineations on sections of the Netherlands North Sea subsurface images from (i) $26$ novices who have no prior exposure to subsurface data and were shown a video describing and labeling faults, (ii) $8$ practitioners who have previously interacted with and worked on subsurface data, and (iii) one geophysicist, to label $7636$ faults in the region. Note that all novices, practitioners, and the expert segment faults on the same subsurface volume, with disagreements between and among the novices and practitioners. Additionally, each fault annotation is equipped with the confidence level of the annotator. The paper provides benchmarks on detecting and segmenting the expert labels, given the novice and practitioner labels. Additional details, along with the dataset links and code, are available at https://alregib.ece.gatech.edu/cracks-crowdsourcing-resources-for-analysis-and-categorization-of-key-subsurface-faults/.

Paper Structure (48 sections, 3 equations, 16 figures, 7 tables)

Figures (16)

  • Figure 1: Expert interpretation of a seismic fault.
  • Figure 2: Expert/Practitioner/Novice annotation of seismic sections.
  • Figure 3: Consistency metrics (section-wise) across $120$ repeated sections.
  • Figure 4: Sanity check demonstrating that the novice and practitioner annotations are not trivial. The pairwise mIoU of each practitioner's and novice's annotations (x-axis) against the expert annotation is shown on the y-axis.
  • Figure 5: SSL finetuning on each annotator subset.
  • ...and 11 more figures
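
The annotator-versus-expert comparison described in the Figure 4 caption can be illustrated with a minimal sketch. It assumes that each annotation is available as a binary fault mask per seismic section and that mIoU means the foreground IoU averaged over sections; the paper's exact definition may differ. The function names `binary_iou` and `mean_iou_vs_expert` and the toy masks are hypothetical and are not taken from the CRACKS code release.

```python
import numpy as np


def binary_iou(mask_a, mask_b):
    """Intersection over union of two binary masks with the same shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


def mean_iou_vs_expert(annotator_masks, expert_masks):
    """Average the per-section IoU of one annotator against the expert."""
    scores = [binary_iou(a, e) for a, e in zip(annotator_masks, expert_masks)]
    return float(np.mean(scores))


# Toy data: two small "sections", 1 = fault pixel, 0 = background.
expert = [np.array([[0, 1, 1], [0, 1, 0]]), np.array([[1, 0, 0], [1, 0, 0]])]
novice = [np.array([[0, 1, 0], [0, 1, 0]]), np.array([[0, 0, 0], [1, 1, 0]])]

score = mean_iou_vs_expert(novice, expert)
print(f"novice vs expert mIoU: {score:.3f}")
```

Running the same function once per novice and practitioner would give one score per annotator, which is the kind of quantity the caption describes plotting on the y-axis.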