HyperPotter: Spell the Charm of High-Order Interactions in Audio Deepfake Detection

Qing Wen, Haohao Li, Zhongjie Ba, Peng Cheng, Miao He, Li Lu, Kui Ren

TL;DR

HyperPotter tackles audio deepfake detection by shifting from traditional pairwise feature interactions to high-order interactions (HOIs) modeled with a prototype-guided hypergraph. Motivated by an information-theoretic analysis based on O-information, it uses a memory-enhanced Hypergraph Attention Network (HAGNN) and a prototype bank to initialize and refine multi-way relationships, while a relational artifact amplification module emphasizes informative synergistic cues. Evaluated on 13 datasets, HyperPotter improves on its baseline by an average relative gain of 22.15% across 11 datasets and on state-of-the-art methods by 13.96% on 4 challenging cross-domain datasets, and ablations confirm the necessity of HOI modeling and the prototype memory. The approach offers a practical pathway to more robust ADD systems capable of handling diverse attacks and speakers, with a modest parameter footprint and scalable prototype memory.

Abstract

Advances in AIGC technologies have enabled the synthesis of highly realistic audio deepfakes capable of deceiving human auditory perception. Although numerous audio deepfake detection (ADD) methods have been developed, most rely on local temporal/spectral features or pairwise relations, overlooking high-order interactions (HOIs). HOIs capture discriminative patterns that emerge from multiple feature components jointly, beyond their individual contributions. We propose HyperPotter, a hypergraph-based framework that explicitly models these synergistic HOIs through clustering-based hyperedges with class-aware prototype initialization. Extensive experiments show that HyperPotter surpasses its baseline by an average relative gain of 22.15% across 11 datasets and outperforms state-of-the-art methods by 13.96% on 4 challenging cross-domain datasets, indicating superior generalization to diverse attacks and speakers.
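To make the notion of a synergistic high-order interaction concrete, the sketch below computes O-information for small discrete toy systems. It uses the standard definition from the information-theory literature, Ω(X) = (n − 2)·H(X) + Σ_j [H(X_j) − H(X_{-j})], where negative values indicate synergy-dominated and positive values redundancy-dominated interactions. This is an illustration only: the function names and toy data are hypothetical, and it is not the estimator or pipeline used in the paper.

```python
import math
from collections import Counter
from itertools import product


def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution over hashable samples."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def o_information(rows):
    """O-information (bits) of n discrete variables.

    `rows` is a list of n-tuples, one tuple per equally weighted observation.
    Implements (n - 2) * H(X) + sum_j [H(X_j) - H(X_{-j})].
    """
    n = len(rows[0])
    omega = (n - 2) * entropy(rows)
    for j in range(n):
        marginal = entropy([r[j] for r in rows])           # H(X_j)
        rest = entropy([r[:j] + r[j + 1:] for r in rows])  # H(X_{-j})
        omega += marginal - rest
    return omega


# Toy system 1 (hypothetical): the third variable is the XOR of two fair bits.
# No pair of variables reveals the pattern; it only appears jointly (synergy).
xor_rows = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2)]

# Toy system 2 (hypothetical): three copies of one fair bit (pure redundancy).
copy_rows = [(a, a, a) for a in (0, 1)]

print(o_information(xor_rows))   # negative: synergy-dominated
print(o_information(copy_rows))  # positive: redundancy-dominated
```

The XOR case is the kind of pattern that pairwise relations miss: every pair of variables is statistically independent, yet the triple is fully constrained.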

Paper Structure (53 sections, 9 equations, 8 figures, 7 tables)

Figures (8)

  • Figure 1: Motivated by O-information analysis, the HyperPotter framework enables high-order relation modeling and achieves competitive generalization performance across diverse scenarios.
  • Figure 2: Overview of the HyperPotter framework. HyperPotter integrates hypergraph attention layers with prototype-guided hyperedge initialization to capture and amplify high-order relational artifacts, enabling effective aggregation of discriminative cues for generalizable audio spoofing detection.
  • Figure 3: Visualizations demonstrating the effectiveness of HOI modeling and its capability to describe synthetic artifacts.
  • Figure 5: Clustermaps for different samples in the PartialSpoof evaluation set.
  • Figure 6: Classification results based on the distribution of centroids for different samples in the PartialSpoof evaluation set.
  • ...and 3 more figures