Multilingual Prompting for Improving LLM Generation Diversity
Qihan Wang, Shidong Pan, Tal Linzen, Emily Black
TL;DR
This work identifies a lack of cultural and demographic diversity in LLM generations and proposes multilingual prompting as a principled method for activating culture-specific knowledge across languages. The method creates multiple prompt variants in different languages, each with cultural cues, and aggregates the responses. It yields higher diversity than prior approaches while preserving factual accuracy. Aligning the prompt language with the cultural cues also reduces culture-specific hallucinations. The diversity gains scale with the number of languages and vary with model size and language resource level. The results suggest that multilingual prompting is a practical, scalable technique for eliciting broader perspectives from LLMs and producing more diverse, representative outputs.
Abstract
Large Language Models (LLMs) are known to lack cultural representation and overall diversity in their generations, from expressing opinions to answering factual questions. To mitigate this problem, we propose multilingual prompting: a prompting method that generates several variants of a base prompt, each with added cultural and linguistic cues from a different culture, collects the model's responses to every variant, and then combines the results. Building on evidence that LLMs hold language-specific knowledge, multilingual prompting seeks to increase diversity by activating a broader range of the cultural knowledge embedded in model training data. Through experiments across multiple models (GPT-4o, GPT-4o-mini, LLaMA 70B, and LLaMA 8B), we show that multilingual prompting consistently outperforms existing diversity-enhancing techniques such as high-temperature sampling, step-by-step recall, and persona prompting. Further analyses show that the benefits of multilingual prompting differ between high- and low-resource languages and across model sizes, and that aligning the prompting language with the cultural cues reduces hallucination about culturally specific information.
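The three steps named in the abstract (build culture-cued variants of a base prompt, query the model on each, combine the results) can be illustrated with a minimal sketch. Everything concrete here is a hypothetical placeholder rather than the authors' setup: the cue wording, the language set, the pre-translated prompts, the `fake_generate` stand-in for an LLM call, and the aggregation rule (an order-preserving, case-insensitive union of answers, assuming the model's answers are already mapped to one common language).

```python
from typing import Callable, Dict, List

# Hypothetical cultural cues, one per language; the paper's actual cue wording,
# language set, and translation pipeline are not specified in this section.
CULTURE_CUES: Dict[str, str] = {
    "en": "Answer as someone living in the United States. ",
    "fr": "Répondez comme une personne vivant en France. ",
    "ja": "日本に住んでいる人として答えてください。",
}


def build_variants(translated_prompts: Dict[str, str],
                   cues: Dict[str, str]) -> Dict[str, str]:
    """Prefix each language's version of the base prompt with that language's cue."""
    return {lang: cues[lang] + text
            for lang, text in translated_prompts.items() if lang in cues}


def multilingual_prompting(translated_prompts: Dict[str, str],
                           generate: Callable[[str], List[str]],
                           cues: Dict[str, str] = CULTURE_CUES) -> List[str]:
    """Query the model once per variant and merge the answers.

    Assumes `generate` returns answers already mapped to one common language;
    aggregation here is an order-preserving, case-insensitive union.
    """
    variants = build_variants(translated_prompts, cues)
    merged: List[str] = []
    seen = set()
    for prompt in variants.values():
        for answer in generate(prompt):
            key = answer.strip().lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(answer.strip())
    return merged


# Toy stand-in for an LLM call (hypothetical canned answers, in English).
CANNED = {"en": ["Pancakes", "Eggs"], "fr": ["Croissant", "eggs"],
          "ja": ["Rice", "Miso soup"]}


def fake_generate(prompt: str) -> List[str]:
    """Return the canned answers for whichever cue the prompt starts with."""
    for lang, cue in CULTURE_CUES.items():
        if prompt.startswith(cue):
            return CANNED[lang]
    return []


base = {
    "en": "Name a common breakfast food.",
    "fr": "Citez un aliment courant du petit-déjeuner.",
    "ja": "一般的な朝食の食べ物を挙げてください。",
}
print(multilingual_prompting(base, fake_generate))
# ['Pancakes', 'Eggs', 'Croissant', 'Rice', 'Miso soup']
```

Each culture-cued variant contributes answers the others miss, and the union drops the duplicate "eggs". In a real setting, `fake_generate` would be replaced by an actual model call plus a translation step, and the aggregation rule would need to match whatever combination procedure the paper specifies.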
