Scraping the Shadows: Deep Learning Breakthroughs in Dark Web Intelligence

Ingmar Bakermans, Daniel De Pascale, Gonçalo Marcelino, Giuseppe Cascavilla, Zeno Geradts

TL;DR

This work tackles automated information extraction from darknet markets (DNMs) by evaluating three state-of-the-art NER models (ELMo-BiLSTM-CNN, UniversalNER, GLiNER) on a newly annotated DNM dataset. It presents a multi-stage pipeline: data gathering from six DNMs (including the French-language market Cocorico), careful data preparation with RegEx-based labeling, and three modeling approaches covering zero-shot and fine-tuned configurations. The key finding is that a fine-tuned UniversalNER-7B model performs best (Precision ≈ 91%, Recall ≈ 96%, F1 ≈ 94–95%), while zero-shot GLiNER variants generalize better but reach lower absolute scores. Robustness tests on Palmetto show high precision but limited recall in unseen contexts. Overall, the study demonstrates that automated DNM information extraction is feasible for law enforcement agencies (LEAs) and offers a scalable route to open-source intelligence (OSINT) from the dark web, although it demands substantial computational resources and more diverse data to improve recall and robustness.
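The TL;DR mentions RegEx-based labeling during data preparation but does not spell out the procedure. As a minimal sketch only, assuming labels are projected onto listing text by matching known entity strings (for example, values read from structured fields of the product page) and emitted as token-level BIO tags, the idea looks like this. The function name `regex_bio_tags`, the labels `MODEL`, `ORIGIN` and `VENDOR`, and the example listing are all hypothetical, not the paper's annotation schema.

```python
import re


def regex_bio_tags(text, entities):
    """Project known entity strings onto text as token-level BIO tags.

    entities maps a (hypothetical) label such as "VENDOR" to the literal
    string to search for. Returns a list of (token, tag) pairs.
    """
    # Tokens are word runs or single punctuation marks, kept as char offsets.
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]
    tags = ["O"] * len(tokens)
    for label, value in entities.items():
        # Match the entity literally and case-insensitively, but never
        # inside a longer word.
        pattern = re.compile(r"(?<!\w)" + re.escape(value) + r"(?!\w)", re.IGNORECASE)
        for match in pattern.finditer(text):
            covered = [i for i, (start, end) in enumerate(tokens)
                       if start >= match.start() and end <= match.end()]
            # Skip empty or overlapping matches so every span starts with B-.
            if not covered or any(tags[i] != "O" for i in covered):
                continue
            tags[covered[0]] = "B-" + label
            for i in covered[1:]:
                tags[i] = "I-" + label
    return [(text[start:end], tag) for (start, end), tag in zip(tokens, tags)]


# Hypothetical listing text and entity values, for illustration only.
listing = "Rolex Submariner replica shipped from Germany by vendor acme_shop"
for token, tag in regex_bio_tags(listing, {"MODEL": "Rolex Submariner",
                                           "ORIGIN": "Germany",
                                           "VENDOR": "acme_shop"}):
    print(token, tag)
```

Exact string matching like this is cheap and reproducible, but it only finds entities whose surface form is already known, which is one reason a learned NER model is still needed on unseen listings.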

Abstract

Darknet markets (DNMs) facilitate the trade of illegal goods on a global scale. Gathering data on DNMs is critical for law enforcement agencies to combat crime effectively, yet extracting that data manually is time-consuming and error-prone. To automate this process, we develop a framework for extracting data from DNMs and evaluate three state-of-the-art Named Entity Recognition (NER) models, ELMo-BiLSTM (Shah et al., 2022), UniversalNER (Zhou et al., 2024), and GLiNER (Zaratiana et al., 2023), on the task of extracting complex entities from DNM product listing pages. We propose a new annotated dataset, which we use to train, fine-tune, and evaluate the models. Our findings show that state-of-the-art NER models perform well at information extraction from DNMs, with the best model achieving 91% precision, 96% recall, and an F1 score of 94%. Fine-tuning further improves performance, and UniversalNER performs best overall.
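The headline scores combine precision (the share of predicted entities that are correct) and recall (the share of gold entities that are found) into F1, their harmonic mean. The sketch below assumes exact-match, entity-level scoring over `(start, end, label)` spans; the paper's precise matching criterion is not stated here, and the function names and spans are hypothetical. Plugging the rounded headline figures into the harmonic mean gives about 0.934, so the reported 94% presumably reflects unrounded precision and recall.

```python
def harmonic_mean(p, r):
    """F1 as the harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if p + r else 0.0


def entity_prf1(gold, predicted):
    """Exact-match entity-level precision, recall and F1.

    gold and predicted are sets of (start, end, label) spans; a prediction is a
    true positive only if both its boundaries and its label match a gold span.
    """
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, harmonic_mean(precision, recall)


# Hypothetical character spans over one listing.
gold = {(0, 16, "MODEL"), (38, 45, "ORIGIN"), (56, 65, "VENDOR")}
predicted = {(0, 16, "MODEL"), (38, 45, "ORIGIN")}  # the vendor is missed
print(entity_prf1(gold, predicted))  # precision 1.0, recall 2/3, F1 about 0.8
print(round(harmonic_mean(0.91, 0.96), 3))  # 0.934
```

Because the harmonic mean is pulled toward the smaller of the two values, a model with high precision but low recall, as reported for the Palmetto robustness tests, cannot reach a high F1.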

Paper Structure

This paper contains 26 sections, 18 figures, 7 tables.

Figures (18)

  • Figure 1: Crawler infrastructure introduced by Shah et al. (2022).
  • Figure 2: ELMo-BiLSTM-CNN model pipeline.
  • Figure 3: The conversation-style template used to tune language models. Only the highlighted parts are used to compute the loss (Zhou et al., 2024).
  • Figure 4: Top 10 Vendor-DNM combinations.
  • Figure 5: Top 10 Models sold across all analyzed DNMs.
  • ...and 13 more figures
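Figure 3 describes conversation-style tuning in which only the highlighted parts of the conversation contribute to the loss. As a minimal sketch, assuming a template in the spirit of Zhou et al. (2024), where the model first receives the passage and then answers one question per entity type, the turns can be built as below with only the assistant's answers flagged for the loss. The prompt wording, the function names `build_conversation` and `loss_targets`, and the entity types are hypothetical; in real training the mask would be applied per token after tokenization.

```python
import json


def build_conversation(passage, gold_mentions):
    """Turn one passage into (speaker, text, in_loss) conversation turns.

    gold_mentions maps an entity type to the list of its mentions. Only the
    assistant's answers are flagged for the loss; the passage, the
    acknowledgement and the questions serve as context only.
    """
    turns = [
        ("USER", f"Text: {passage}", False),
        ("ASSISTANT", "I've read this text.", False),
    ]
    for entity_type, mentions in gold_mentions.items():
        turns.append(("USER", f"What describes {entity_type} in the text?", False))
        turns.append(("ASSISTANT", json.dumps(mentions), True))
    return turns


def loss_targets(turns):
    """Return the texts whose tokens would contribute to the training loss."""
    return [text for _, text, in_loss in turns if in_loss]


turns = build_conversation(
    "Rolex Submariner replica shipped from Germany by vendor acme_shop",
    {"model": ["Rolex Submariner"], "vendor": ["acme_shop"]},
)
for speaker, text, in_loss in turns:
    print(f"{speaker}: {text}" + ("   <- loss" if in_loss else ""))
```

Masking the context turns keeps the model from being trained to reproduce the listing text itself; it is only rewarded for producing the entity lists.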