PHISH in MESH: Korean Adversarial Phonetic Substitution and Phonetic-Semantic Feature Integration Defense
Byungjun Kim, Minju Kim, Hyeonchu Park, Bugeun Kim
TL;DR
This work addresses the vulnerability of hate-speech detectors to phonetic substitution in Korean, a language often neglected in adversarial research. It introduces PHISH, a Korean-specific phonetic substitution attack that uses Hangul syllable structure and a predefined phonetic look-up table, and MESH, a phoneme-augmented framework with two variants, seq-MESH and dir-MESH, that integrate phonetic features through cross-attention to improve robustness. Experiments on K-HATERS and KoLD show that PHISH degrades baseline detectors, while the MESH variants consistently improve resilience to phonetic perturbations and also improve performance on unperturbed data, suggesting real-world applicability. The findings highlight the value of language-aware perturbation strategies and of architectural defenses that integrate phonetic information for practical, scalable hate-speech detection.
Abstract
As malicious users increasingly employ phonetic substitution to evade hate speech detection, researchers have begun to investigate such strategies. However, two key challenges remain. First, existing studies have overlooked the Korean language, despite its vulnerability to phonetic perturbations due to its phonographic nature. Second, prior work has primarily focused on constructing datasets rather than developing architectural defenses. To address these challenges, we propose (1) PHonetic-Informed Substitution for Hangul (PHISH), which exploits the phonological characteristics of the Korean writing system, and (2) Mixed Encoding of Semantic-pHonetic features (MESH), which enhances the detector's robustness by incorporating phonetic information at the architectural level. Our experimental results demonstrate the effectiveness of the proposed methods on both perturbed and unperturbed datasets, suggesting that MESH improves detection performance and that PHISH reflects realistic adversarial behaviors employed by malicious users.
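To make the idea of substitution over Hangul syllable structure concrete, the following is a minimal Python sketch, not the authors' implementation. It relies only on the standard Unicode arithmetic for precomposed Hangul syllables (initial, medial, and final jamo). The table `MEDIAL_SWAPS` and the function names are hypothetical and chosen for illustration; the actual PHISH look-up table is not given in this section.

```python
# Unicode encodes each precomposed Hangul syllable (U+AC00..U+D7A3) as
#   code = 0xAC00 + (initial * 21 + medial) * 28 + final
HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3

INITIALS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")            # 19 initial consonants
MEDIALS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")          # 21 vowels
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28, "" = no final

# Hypothetical look-up table (an assumption, NOT the paper's table):
# swap two vowels that sound nearly identical in contemporary Korean.
MEDIAL_SWAPS = {"ㅐ": "ㅔ", "ㅔ": "ㅐ"}


def is_syllable(ch: str) -> bool:
    """True if ch is a precomposed Hangul syllable."""
    return HANGUL_BASE <= ord(ch) <= HANGUL_LAST


def decompose(ch: str) -> tuple[str, str, str]:
    """Split one Hangul syllable into (initial, medial, final) jamo."""
    offset = ord(ch) - HANGUL_BASE
    initial, rest = divmod(offset, 21 * 28)
    medial, final = divmod(rest, 28)
    return INITIALS[initial], MEDIALS[medial], FINALS[final]


def compose(initial: str, medial: str, final: str = "") -> str:
    """Inverse of decompose: build a syllable from its jamo."""
    index = (INITIALS.index(initial) * 21 + MEDIALS.index(medial)) * 28
    return chr(HANGUL_BASE + index + FINALS.index(final))


def substitute(text: str, table: dict[str, str] = MEDIAL_SWAPS) -> str:
    """Replace the vowel of every Hangul syllable that appears in the table.

    Non-Hangul characters pass through unchanged.
    """
    out = []
    for ch in text:
        if is_syllable(ch):
            initial, medial, final = decompose(ch)
            ch = compose(initial, table.get(medial, medial), final)
        out.append(ch)
    return "".join(out)
```

Under these assumptions, `substitute("개")` returns `"게"`: the syllable keeps its initial consonant and (empty) final, and only the vowel is swapped for a similar-sounding one. A realistic attack would presumably also cover consonants and choose substitutions selectively rather than rewriting every matching syllable.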
