CONGRAD: Conflicting Gradient Filtering for Multilingual Preference Alignment
Jiangnan Li, Thuy-Trang Vu, Christian Herold, Amirhossein Tebbifakhr, Shahram Khadivi, Gholamreza Haffari
TL;DR
The paper tackles negative interference in multilingual preference alignment by introducing CONGRAD, a gradient-filtering approach that derives a consensus update direction across languages from exponential-moving-average (EMA) gradients and PCGrad-style de-conflicting projections. It then filters self-generated multilingual preference data by alignment with this consensus direction, using memory-efficient, low-rank EMA gradient updates to scale to large models. Integrated into a self-rewarding Direct Preference Optimization loop with length penalties (LP-DPO), CONGRAD is evaluated on LLaMA3-8B and Gemma2-2B across 10 languages, showing consistent gains in instruction following for seen, unseen, and underrepresented languages, with negligible or positive alignment tax. The results indicate that high-quality, gradient-aligned samples drive robust multilingual alignment without external human-annotated data, and that mitigating cross-language conflicts enables effective transfer across language families with minimal overhead.
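To make the consensus-and-filter idea concrete, here is a minimal NumPy sketch of the three steps named above: an EMA of per-language gradients, PCGrad-style projection to remove pairwise conflicts, and sample filtering by alignment with the resulting direction. It is an illustration under stated assumptions, not the paper's implementation. The function names, the mean aggregation, the cosine-similarity score, and the `beta` and `keep_ratio` parameters are all hypothetical choices made for this example, and gradients are treated as flat vectors with no low-rank compression.

```python
import numpy as np

def ema_update(ema, grad, beta=0.9):
    """Exponential moving average of a gradient (beta is an assumed value)."""
    return beta * ema + (1.0 - beta) * grad

def pcgrad_consensus(lang_grads, seed=0):
    """PCGrad-style de-conflicting over a dict {language: flat gradient}.

    Each language gradient is projected away from any other language
    gradient it conflicts with (negative dot product). The projected
    gradients are then averaged (assumed aggregation) into one direction.
    """
    rng = np.random.default_rng(seed)
    langs = list(lang_grads)
    projected = []
    for i in langs:
        g = np.asarray(lang_grads[i], dtype=float).copy()
        others = [l for l in langs if l != i]
        # Visit the other languages in random order, as in PCGrad.
        for k in rng.permutation(len(others)):
            gj = np.asarray(lang_grads[others[k]], dtype=float)
            dot = g @ gj
            sq_norm = gj @ gj
            if dot < 0 and sq_norm > 0:
                # Remove the component of g that opposes gj.
                g = g - (dot / sq_norm) * gj
        projected.append(g)
    return np.mean(projected, axis=0)

def filter_by_alignment(sample_grads, consensus, keep_ratio=0.5):
    """Keep the samples whose gradients align best with the consensus.

    Alignment is scored by cosine similarity (an assumption); the top
    keep_ratio fraction of samples is retained.
    """
    sample_grads = np.asarray(sample_grads, dtype=float)
    norms = np.linalg.norm(sample_grads, axis=1) * np.linalg.norm(consensus)
    scores = (sample_grads @ consensus) / (norms + 1e-12)
    k = max(1, int(round(keep_ratio * len(scores))))
    keep = np.sort(np.argsort(-scores)[:k])
    return keep, scores
```

As a usage example, two conflicting language gradients such as `{"en": [1, 0], "de": [-1, 1]}` are each projected so that neither opposes the other before averaging. Candidate preference samples whose gradients point against the averaged direction then receive low scores and are dropped by `filter_by_alignment`.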
Abstract
Naive joint training of large language models (LLMs) for multilingual preference alignment can suffer from negative interference. This is a known issue in multilingual training, where conflicting objectives degrade overall performance. However, the impact of this phenomenon in the context of multilingual preference alignment remains largely underexplored. To address this issue, we propose CONGRAD, a scalable and effective filtering method that selects high-quality preference samples with minimal gradient conflicts across languages. Our method leverages gradient surgery to retain samples aligned with an aggregated multilingual update direction. Additionally, we incorporate a sublinear gradient compression strategy that reduces memory overhead during gradient accumulation. We integrate CONGRAD into a self-rewarding framework and evaluate it on LLaMA3-8B and Gemma2-2B across 10 languages. Results show that CONGRAD consistently outperforms strong baselines in both seen and unseen languages, with minimal alignment tax.
