Teaching LLMs to Abstain across Languages via Multilingual Feedback
Shangbin Feng, Weijia Shi, Yike Wang, Wenxuan Ding, Orevaoghene Ahia, Shuyue Stella Li, Vidhisha Balachandran, Sunayana Sitaram, Yulia Tsvetkov
TL;DR
This work extends the abstention paradigm to multilingual LLMs by teaching models to abstain through multilingual feedback generated in related languages. The approach addresses the degradation of abstention performance in low-resource languages seen with English-centric methods, and demonstrates gains of up to 9.2% in abstain accuracy on low-resource languages across multiple models and QA datasets. A key insight is that language relatedness and cultural context influence feedback quality and abstention behavior, making abstention a language-specific rather than universal problem. The authors show that using related languages for feedback yields more equitable utility, and that a smaller multilingual model can supervise a larger general-purpose LLM to improve reliability in long-tail languages. These findings highlight the social and linguistic dimensions of trustworthy multilingual NLP and point to future work in culturally aware, multilingual model design.
Abstract
Multilingual LLMs often have knowledge disparities across languages, with larger gaps in under-resourced languages. Teaching LLMs to abstain in the face of knowledge gaps is thus a promising strategy to mitigate hallucinations in multilingual settings. However, previous studies on LLM abstention primarily focus on English; we find that directly applying existing solutions beyond English results in up to 20.5% performance gaps between high- and low-resource languages, potentially due to LLMs' drop in calibration and reasoning beyond a few resource-rich languages. To this end, we propose strategies to enhance LLM abstention by learning from multilingual feedback, where LLMs self-reflect on proposed answers in one language by generating multiple feedback items in related languages: we show that this helps identify the knowledge gaps across diverse languages, cultures, and communities. Extensive experiments demonstrate that our multilingual feedback approach outperforms various strong baselines, achieving up to 9.2% improvement for low-resource languages across three black-box and open models on three datasets, featuring open-book, closed-book, and commonsense QA. Further analysis reveals that multilingual feedback is both an effective and a more equitable abstain strategy to serve diverse language speakers, and that cultural factors have a great impact on language selection and LLM abstention behavior, highlighting future directions for multilingual and multi-cultural reliable language modeling.
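The feedback loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `answer_or_abstain`, the prompt wording, the `RELATED_LANGUAGES` table, and the parameter `k` are all assumptions made for illustration, and `llm` stands in for any text-in, text-out model call.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical relatedness table (ISO 639-1 codes), for illustration only.
# The abstract does not specify how related languages are selected.
RELATED_LANGUAGES: Dict[str, List[str]] = {
    "bn": ["hi", "ne", "mr"],
    "es": ["pt", "it", "fr"],
}


def answer_or_abstain(
    question: str,
    language: str,
    llm: Callable[[str], str],
    related: Dict[str, List[str]] = RELATED_LANGUAGES,
    k: int = 3,
) -> Optional[str]:
    """Return the proposed answer, or None to signal abstention."""
    # Step 1: propose an answer in the question's language.
    proposed = llm(f"Answer the question.\nQuestion: {question}\nAnswer:")

    # Step 2: collect feedback on the proposed answer in up to k related
    # languages, falling back to the question's own language if none are listed.
    feedback_languages = related.get(language, [language])[:k]
    feedback = [
        llm(
            f"Give feedback in language '{lang}' on the proposed answer.\n"
            f"Question: {question}\nProposed answer: {proposed}\nFeedback:"
        )
        for lang in feedback_languages
    ]

    # Step 3: ask for a final verdict conditioned on all feedback items.
    feedback_text = "\n".join(f"- {item}" for item in feedback)
    verdict = llm(
        "Based on the feedback, is the proposed answer correct? "
        "Reply True or False.\n"
        f"Question: {question}\nProposed answer: {proposed}\n"
        f"Feedback:\n{feedback_text}\nVerdict:"
    )

    # Abstain unless the verdict explicitly says the answer is correct.
    return proposed if verdict.strip().lower().startswith("true") else None
```

In this sketch, abstention is the default outcome: the proposed answer is returned only when the final verdict begins with "True", so an unparseable verdict leads to abstaining rather than answering.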
