Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis

Yiyi Chen, Qiongxiu Li, Russa Biswas, Johannes Bjerva

TL;DR

The paper investigates language confusion in multilingual LLMs and introduces Language Confusion Entropy, $H_{\mathbf{C}}$, to quantify uncertainty across language distributions. It builds language graphs from linguistic typology and uses a KL-divergence framework to relate confusion to language similarities. Empirical analyses on Language Confusion Benchmark (LCB) and Multilingual Textual Embedding Inversion (MTEI) show crosslingual prompts and non-Latin-script languages are particularly prone to confusion, influencing inversion performance and BLEU metrics; findings suggest typology-informed priors can improve LLM alignment and security. The work highlights practical implications for cross-lingual chatbots, translation services, and multilingual safety defenses, while acknowledging limitations from typological data coverage and calling for typology-aware defenses and broader language inclusion.

Abstract

Language Confusion is a phenomenon where Large Language Models (LLMs) generate text that is neither in the desired language, nor in a contextually appropriate language. This phenomenon presents a critical challenge in text generation by LLMs, often appearing as erratic and unpredictable behavior. We hypothesize that there are linguistic regularities to this inherent vulnerability in LLMs and shed light on patterns of language confusion across LLMs. We introduce a novel metric, Language Confusion Entropy, designed to directly measure and quantify this confusion, based on language distributions informed by linguistic typology and lexical variation. Comprehensive comparisons with the Language Confusion Benchmark (Marchisio et al., 2024) confirm the effectiveness of our metric, revealing patterns of language confusion across LLMs. We further link language confusion to LLM security, and find patterns in the case of multilingual embedding inversion attacks. Our analysis demonstrates that linguistic typology offers theoretically grounded interpretation, and valuable insights into leveraging language similarities as a prior for LLM alignment and security.
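
To make the idea of measuring confusion over a language distribution concrete, here is a minimal sketch. It assumes that each generated line or word has already been tagged with a language code by some external language identifier, and it uses plain Shannon entropy and KL divergence. The paper's Language Confusion Entropy is additionally informed by linguistic typology and lexical variation, which this sketch does not model. The function names and the example labels are hypothetical.

```python
import math
from collections import Counter


def language_distribution(labels):
    """Turn a list of language codes into a probability distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def shannon_entropy(dist):
    """Shannon entropy in bits; 0 means every unit is in a single language."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in bits. Languages missing from q are floored at eps
    (an assumption made here only to avoid division by zero)."""
    total = 0.0
    for lang, p_val in p.items():
        if p_val > 0:
            total += p_val * math.log2(p_val / max(q.get(lang, 0.0), eps))
    return total


# Hypothetical example: language codes detected for each line of one response.
observed = language_distribution(["de", "de", "en", "de"])
# Hypothetical reference distribution, e.g. a prior derived from language similarity.
reference = {"de": 0.8, "en": 0.2}

confusion = shannon_entropy(observed)
divergence = kl_divergence(observed, reference)
```

A response written entirely in one language gives an entropy of 0, and the value grows as the output spreads over more languages. The KL term compares the observed distribution with a reference distribution, which is the role language similarity plays in the paper's analysis.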

Paper Structure

This paper contains 33 sections, 4 equations, 6 figures, 15 tables, 1 algorithm.

Figures (6)

  • Figure 1: Use of the proposed metric to quantify language confusion, and its correlation with language similarity through KL divergence.
  • Figure 2: Language confusion for LCB by language across LLMs in the crosslingual setting at line level. Languages are ordered in ascending order of language confusion entropy, averaged across LLMs.
  • Figure 3: Language confusion entropy for LCB across generation settings, by LLM, at line and word level.
  • Figure 4: Language confusion for LCB across data sources at line level in the crosslingual setting.
  • Figure 5: Language confusion for LCB across generation settings, by data source, at line and word level.
  • ...and 1 more figure