PHISH in MESH: Korean Adversarial Phonetic Substitution and Phonetic-Semantic Feature Integration Defense

Byungjun Kim, Minju Kim, Hyeonchu Park, Bugeun Kim

TL;DR

This work addresses the vulnerability of hate-speech detectors to phonetic substitutions in Korean, a language often neglected in adversarial research. It introduces PHISH, a Korean-specific phonetic substitution attack that leverages Hangul syllable structure and a predefined phonetic look-up table, and MESH, a phoneme-augmented framework implemented as seq-MESH and dir-MESH to enhance robustness via cross-attention. Empirical results on K-HATERS and KoLD show that PHISH degrades baseline detectors, while MESH variants consistently improve resilience to phonetic perturbations and even boost performance on unperturbed data, suggesting real-world applicability. The findings highlight the value of language-aware perturbation strategies and architectural defenses that integrate phonetic information for practical, scalable hate-speech detection.
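The idea of a substitution driven by Hangul syllable structure and a look-up table can be sketched as follows. This is a minimal illustration, not the PHISH algorithm itself: the decomposition arithmetic is the standard Unicode layout for precomposed Hangul syllables, while the `MEDIAL_SWAPS` table, the function names, and the choice to swap only the vowels ㅐ and ㅔ (which are pronounced almost identically in modern Korean) are assumptions made for the example. The paper's actual look-up table is not reproduced here.

```python
# Unicode layout of precomposed Hangul syllables:
# code = 0xAC00 + (initial * 21 + medial) * 28 + final
HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3
NUM_MEDIALS = 21
NUM_FINALS = 28  # includes index 0, meaning "no final consonant"


def decompose(ch):
    """Return (initial, medial, final) indices, or None if ch is not a Hangul syllable."""
    code = ord(ch) - HANGUL_BASE
    if not 0 <= code <= HANGUL_LAST - HANGUL_BASE:
        return None
    initial, rest = divmod(code, NUM_MEDIALS * NUM_FINALS)
    medial, final = divmod(rest, NUM_FINALS)
    return initial, medial, final


def compose(initial, medial, final):
    """Rebuild a Hangul syllable from its three jamo indices."""
    return chr(HANGUL_BASE + (initial * NUM_MEDIALS + medial) * NUM_FINALS + final)


# Hypothetical look-up table (an assumption for this sketch):
# medial index 1 is ㅐ and medial index 5 is ㅔ.
MEDIAL_SWAPS = {1: 5, 5: 1}


def substitute(text):
    """Swap similar-sounding vowels in every Hangul syllable; leave other characters alone."""
    out = []
    for ch in text:
        parts = decompose(ch)
        if parts is None:
            out.append(ch)
            continue
        initial, medial, final = parts
        out.append(compose(initial, MEDIAL_SWAPS.get(medial, medial), final))
    return "".join(out)
```

With this toy table, `substitute("개")` returns `"게"`: the written form changes while the pronunciation stays close to the original, which is the kind of perturbation a text-only detector can miss.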

Abstract

As malicious users increasingly employ phonetic substitution to evade hate speech detection, researchers have investigated such strategies. However, two key challenges remain. First, existing studies have overlooked the Korean language, despite its vulnerability to phonetic perturbations due to its phonographic nature. Second, prior work has primarily focused on constructing datasets rather than developing architectural defenses. To address these challenges, we propose (1) PHonetic-Informed Substitution for Hangul (PHISH) that exploits the phonological characteristics of the Korean writing system, and (2) Mixed Encoding of Semantic-pHonetic features (MESH) that enhances the detector's robustness by incorporating phonetic information at the architectural level. Our experimental results demonstrate the effectiveness of our proposed methods on both perturbed and unperturbed datasets, suggesting that they not only improve detection performance but also reflect realistic adversarial behaviors employed by malicious users.

Paper Structure

This paper contains 21 sections, 1 figure, 7 tables, and 3 algorithms.

Figures (1)

  • Figure 1: Architectures of the base models and our methods. (a) shows the architecture of base detectors, which use the self-attention mechanism; (b) shows the architecture of seq-MESH detectors, which use stacked self-attention and cross-attention layers; (c) shows the architecture of dir-MESH detectors, which use cross-attention instead of self-attention.
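
The difference between self-attention and cross-attention in the caption above comes down to where the queries, keys, and values originate. The sketch below is a bare scaled dot-product attention in plain Python, included only to make that distinction concrete. It omits learned projections, multiple heads, and layer stacking, and the assignment of one feature stream to the queries and the other to the keys and values is an assumption for illustration, not a detail taken from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Self-attention: queries, keys, and values all come from the same sequence.
    Cross-attention: queries come from one sequence, keys and values from another.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Toy inputs (hypothetical): one feature stream supplies the queries,
# a second stream supplies the keys and values.
stream_a = [[1.0, 1.0]]
stream_b_keys = [[1.0, 0.0], [1.0, 0.0]]
stream_b_values = [[0.0, 2.0], [2.0, 0.0]]
fused = attention(stream_a, stream_b_keys, stream_b_values)
```

Because the two keys are identical here, the attention weights are uniform and `fused` is the average of the two value vectors. Calling `attention(x, x, x)` with a single sequence `x` gives the self-attention case.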