GemDetox at TextDetox CLEF 2025: Enhancing a Massively Multilingual Model for Text Detoxification on Low-resource Languages
Trung Duc Anh Dang, Ferdinando Pio D'Elia
TL;DR
GemDetox, our TextDetox CLEF 2025 submission, fine-tunes a 12B Gemma-3 transformer with LoRA adapters and uses Chain-of-Thought prompting to rewrite toxic single-sentence inputs into neutral paraphrases across 15 languages. The training data combines ParaDetox parallel data, MT6 augmentations, and filtered synthetic data. At inference, each input is enriched with LaBSE-retrieved neighbors and explicit toxic spans, and outputs are selected via a joint quality score that balances non-toxicity, semantic preservation, and fluency. The system achieves top performance across languages, with notable gains for low-resource languages from data augmentation and prompting strategies. An LLM-as-a-Judge analysis gives further insight into evaluation discrepancies and the impact of resource status. Limitations include reliance on the base model, cross-linguistic prompt consistency, and potential biases from synthetic data. Future work points to preference-based optimization, dynamic lexicons, and multi-stage reasoning pipelines to improve robustness and fairness in detoxification systems.
Abstract
As social-media platforms emerge and evolve faster than the regulations meant to oversee them, automated detoxification may serve as a timely tool for moderators to enforce safe discourse at scale. Here we describe our submission to the PAN 2025 Multilingual Text Detoxification Challenge, which rewrites toxic single-sentence inputs into neutral paraphrases across 15 typologically diverse languages. Building on a 12B-parameter Gemma-3 multilingual transformer, we apply parameter-efficient supervised fine-tuning (SFT) with LoRA, together with prompting techniques such as few-shot examples and Chain-of-Thought (CoT). Our multilingual training corpus combines 3,600 human-authored parallel pairs, 21,600 machine-translated synthetic pairs, and model-generated pairs filtered by Jaccard thresholds. At inference, inputs are enriched with three LaBSE-retrieved neighbors and explicit toxic-span annotations. Evaluated via Style Transfer Accuracy, LaBSE-based semantic preservation, and xCOMET fluency, our system ranks first on both high-resource and low-resource languages. Ablations show a +0.081 increase in joint score from few-shot examples and +0.088 from basic CoT prompting. An ANOVA identifies language resource status as the strongest predictor of performance ($\eta^2$ = 0.667, p < 0.01).
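The abstract mentions filtering model-generated pairs by Jaccard thresholds without giving details. The following is a minimal sketch of what such a filter could look like, not the authors' implementation. It assumes token-level Jaccard similarity over lowercased whitespace tokens and keeps a pair only if its similarity falls within a band. The function names `jaccard` and `filter_pairs` and the bounds `low=0.2` and `high=0.9` are hypothetical values chosen for illustration.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity over lowercased whitespace tokens."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty sentences are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def filter_pairs(
    pairs: List[Tuple[str, str]],
    low: float = 0.2,   # hypothetical lower bound: drop rewrites that drift too far
    high: float = 0.9,  # hypothetical upper bound: drop near-copies of the input
) -> List[Tuple[str, str]]:
    """Keep (toxic, neutral) pairs whose Jaccard similarity lies in [low, high]."""
    return [(t, n) for t, n in pairs if low <= jaccard(t, n) <= high]


candidates = [
    ("you are a stupid idiot", "you are a stupid idiot"),             # unchanged copy
    ("this is damn bad", "this is bad"),                              # light edit
    ("shut up", "please consider other perspectives here"),           # no overlap
]
kept = filter_pairs(candidates)
```

Under these assumed bounds, the unchanged copy (similarity 1.0) and the unrelated rewrite (similarity 0.0) are discarded, while the light edit (similarity 0.75) is kept. Languages without whitespace word boundaries would need a different tokenizer.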
