LLM-Empowered Class Imbalanced Graph Prompt Learning for Online Drug Trafficking Detection

Tianyi Ma, Yiyue Qian, Zehong Wang, Zheyuan Zhang, Chuxu Zhang, Yanfang Ye

TL;DR

The paper tackles the detection of illicit online drug trafficking under extreme class imbalance and limited labels by proposing LLM-HetGDT, a framework that combines heterogeneous graph neural networks (HGNNs) with Large Language Models (LLMs). It pre-trains an HGNN with a cross-view contrastive loss $L_{cl}$, augments the heterogeneous graph with LLM-generated synthetic minority-class nodes and edges, and tunes multi-type prompts (node, structure, and drug-trafficking) to improve minority-class detection. Prompt tuning minimizes $L_{pt}=L_{ce}+\lambda L_o$, where $L_{ce}$ is the classification loss and $L_o=\|\mathbf{C}\mathbf{C}^\top-\mathbf{I}\|_F^2$ is an orthogonality penalty weighted by $\lambda$, with embeddings augmented as $Z'=Z+\delta S$. A new Twitter-based dataset, Twitter-HetDrug, is built to study online drug trafficking under class imbalance, and extensive experiments show that LLM-HetGDT outperforms state-of-the-art baselines in both accuracy and efficiency, including in deployment scenarios. The work offers a practical, scalable approach that uses both textual content and relational structure to detect drug-related activity in real time, with broad implications for security and public health.
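To make the prompt-tuning objective concrete, the following is a minimal sketch of $L_{pt}=L_{ce}+\lambda L_o$ in plain Python. It rests on assumptions that the summary does not confirm: $\mathbf{C}$ is taken to be a matrix whose rows are prompt vectors, $L_{ce}$ is taken to be the standard cross-entropy over per-class logits, and the function names and the default `lam=0.1` are hypothetical choices for illustration only.

```python
import math

def orthogonality_penalty(C):
    """L_o = ||C C^T - I||_F^2 for a list-of-rows matrix C (assumed layout)."""
    k = len(C)
    total = 0.0
    for i in range(k):
        for j in range(k):
            # Entry (i, j) of the Gram matrix C C^T.
            gram_ij = sum(a * b for a, b in zip(C[i], C[j]))
            target = 1.0 if i == j else 0.0
            total += (gram_ij - target) ** 2
    return total

def cross_entropy(logits, label):
    """Cross-entropy of one sample from raw logits, using log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]

def prompt_tuning_loss(batch_logits, labels, C, lam=0.1):
    """L_pt = mean cross-entropy + lam * L_o (lam is a hypothetical default)."""
    ce = sum(cross_entropy(l, y) for l, y in zip(batch_logits, labels)) / len(labels)
    return ce + lam * orthogonality_penalty(C)
```

The penalty is zero exactly when the rows of `C` are orthonormal, so under this reading it pushes the prompt vectors toward being mutually orthogonal with unit norm.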

Abstract

As the market for illicit drugs remains extremely profitable, major online platforms have become direct-to-consumer intermediaries for illicit drug trafficking participants. These online activities raise significant social concerns that require immediate action. Existing approaches to combating this challenge are generally impractical because of class imbalance and the scarcity of labeled samples in real-world applications. To this end, we propose a novel Large Language Model-empowered Heterogeneous Graph Prompt Learning framework for illicit Drug Trafficking detection, called LLM-HetGDT, that leverages LLMs to help heterogeneous graph neural networks (HGNNs) effectively identify drug trafficking activities in class-imbalanced scenarios. Specifically, we first pre-train an HGNN on a contrastive pretext task to capture the inherent node and structure information in the unlabeled drug trafficking heterogeneous graph (HG). Afterward, we employ an LLM to augment the HG by generating high-quality synthetic user nodes in the minority classes. Then, we fine-tune soft prompts on the augmented HG to capture the important information in the minority classes for the downstream drug trafficking detection task. To comprehensively study online illicit drug trafficking activities, we collect a new HG dataset from Twitter, called Twitter-HetDrug. Extensive experiments on this dataset demonstrate the effectiveness, efficiency, and applicability of LLM-HetGDT.
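The abstract does not spell out the contrastive pretext task, so the sketch below uses a common stand-in rather than the authors' formulation: an InfoNCE-style loss in which each node's embedding in one view is pulled toward the same node's embedding in a second view and pushed away from the other nodes. The function names, the cosine similarity, and the temperature `tau=0.5` are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cross_view_contrastive_loss(view_a, view_b, tau=0.5):
    """InfoNCE-style loss: row i of view_a is the positive for row i of view_b.

    view_a and view_b are lists of embedding vectors for the same nodes,
    computed under two different views of the graph (assumed setup).
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        # Temperature-scaled similarities of node i to every node in view_b.
        sims = [cosine(view_a[i], view_b[j]) / tau for j in range(n)]
        m = max(sims)
        log_z = m + math.log(sum(math.exp(s - m) for s in sims))
        # Negative log-probability assigned to the matching node.
        total += log_z - sims[i]
    return total / n
```

With this form, two views that agree node by node yield a lower loss than views whose rows are shuffled, which is the behavior a contrastive pre-training signal is meant to reward.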

Paper Structure

This paper contains 38 sections, 15 equations, 3 figures, 14 tables, 1 algorithm.

Figures (3)

  • Figure 1: Illustration of drug trafficking activities among users on Twitter.
  • Figure 2: The overall framework of LLM-HetGDT. (a) It pre-trains the HGNN with the contrastive pretext task. (b) It leverages the LLM to generate synthetic users and connections between the synthetic users and the neighbors of the original users, forming an augmented HG ${\mathcal{G}}'$. (c) It injects the node prompt into the node attribute features and feeds the augmented HG ${\mathcal{G}}'$ into the pre-trained HGNN to obtain the target node embeddings. It then augments the target node embeddings with the structure prompt and computes the similarity between the node embeddings and the class prompt to obtain the classification loss for optimization.
  • Figure 3: Performance of model variants on Twitter-HetDrug with training splits of 10%, 20%, and 40%.
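The classification step described in the Figure 2(c) caption can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes `S` has the same shape as the embedding matrix `Z`, that `delta` is a scalar weight, and that the similarity to each class prompt is cosine similarity. All function and parameter names are hypothetical.

```python
import math

def add_structure_prompt(Z, S, delta=0.1):
    """Z' = Z + delta * S, applied row by row (S assumed to match Z in shape)."""
    return [[z + delta * s for z, s in zip(z_row, s_row)]
            for z_row, s_row in zip(Z, S)]

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict(Z_prime, class_prompts):
    """Assign each node to the class whose prompt vector is most similar."""
    preds = []
    for z in Z_prime:
        sims = [cosine(z, c) for c in class_prompts]
        preds.append(max(range(len(sims)), key=sims.__getitem__))
    return preds

# Toy usage: the structure term moves node 0 from class 0 toward class 1.
Z = [[1.0, 0.0], [0.0, 1.0]]
S = [[0.0, 4.0], [0.0, 0.0]]
prompts = [[1.0, 0.0], [0.0, 1.0]]
before = predict(Z, prompts)
after = predict(add_structure_prompt(Z, S, delta=0.5), prompts)
```

In training, the same similarities would serve as logits for the classification loss instead of being reduced to a hard prediction.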