Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages

Shreyan Biswas, Alexander Erlei, Ujwal Gadiraju

TL;DR

The paper investigates how multilingual LLMs with language-dependent performance affect user reliance and the persuasiveness of charity advertisements produced through AI-assisted co-writing. It uses two pre-registered experiments in English and Spanish, employing an ABScribe-based workflow to measure how exposure in one language biases usage in the other and to assess downstream donation outcomes and donor beliefs. The results provide evidence of a violation of choice independence: prior exposure to a Spanish LLM reduces subsequent use of an English LLM. Donations remain largely unaffected by ad type, but the belief that an ad was AI-generated significantly reduces donations, especially among Spanish-speaking women. These findings have practical implications for deploying multilingual AI assistants: they highlight potential second-order effects on uptake, equity, and donor behavior, and they point to transparent design and user education as ways to mitigate biased generalizations across languages.

Abstract

Recent advances in generative AI have precipitated a proliferation of novel writing assistants. These systems typically rely on multilingual large language models (LLMs), providing globalized workers the ability to revise or create diverse forms of content in different languages. However, there is substantial evidence indicating that the performance of multilingual LLMs varies between languages. Users who employ writing assistance for multiple languages are therefore susceptible to disparate output quality. Importantly, recent research has shown that people tend to generalize algorithmic errors across independent tasks, violating the behavioral axiom of choice independence. In this paper, we analyze whether user utilization of novel writing assistants in a charity advertisement writing task is affected by the AI's performance in a second language. Furthermore, we quantify the extent to which these patterns translate into the persuasiveness of generated charity advertisements, as well as the role of people's beliefs about LLM utilization in their donation choices. Our results provide evidence that writers who engage with an LLM-based writing assistant violate choice independence, as prior exposure to a Spanish LLM reduces subsequent utilization of an English LLM. While these patterns do not affect the aggregate persuasiveness of the generated advertisements, people's beliefs about the source of an advertisement (human versus AI) do. In particular, Spanish-speaking female participants who believed that they read an AI-generated advertisement strongly adjusted their donation behavior downwards. Furthermore, people are generally not able to adequately differentiate between human-generated and LLM-generated ads. Our work has important implications for the design, development, integration, and adoption of multilingual LLMs as assistive agents -- particularly in writing tasks.

Paper Structure

This paper contains 39 sections, 3 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: The ABScribe writing interface used in the experiment. Participants had access to the instructions (1), task descriptions (2), and the WWF mission statement (3), at any time during their task. When any text was selected, options for (5) "Create Variation" and (6) "Create Continuation" appeared, allowing participants to generate new text chunks or extend the current text. Variations and continuations created through (5) and (6) were displayed in the variation panel (7). AI modifiers could be applied by selecting a variation and clicking on one of the recipe buttons (8).
  • Figure 2: Experiment Workflow for LLM-Assisted Writing in ENG (L1) - ESP (L2) and ESP (L1) - ENG (L2) Conditions. Task Sequence L1 involves completing all subtasks in the first language (L1): (L1.a) GIF-based instructions introducing the tool’s features; (L1.b) Interaction with the writing environment, making use of the tool’s features; (L1.c) A reading comprehension task focused on WWF’s mission and vision; (L1.d) Main writing task in L1; (L1.e) Post-task survey on the writing task in L1.d. Task Sequence L2 begins after Step L1.e, repeating the same subtasks (L2.a $\rightarrow$ L2.b $\rightarrow$ L2.c $\rightarrow$ L2.d $\rightarrow$ L2.e) in the second language (L2). In the No_LLM condition, participants only completed a single task sequence in English (L1).
  • Figure 3: The donation survey screen. Left: Participants first read the donation message and choose their desired donation amount. Right: After selecting the donation amount, participants proceed to answer the survey. The persuasive text remains visible throughout the process.
  • Figure 4: Effect of Initial Language Exposure on AI Drafter Usage by Task Group. Left: Total usage count of the AI drafter feature shows a significant "gap" between task groups based on initial language exposure. The group exposed to English first (ENG_1, followed by ESP_2) shows substantially higher usage compared to the group exposed to Spanish first (ESP_1, followed by ENG_2), as indicated by the significant differences marked with * ($p < 0.05$) and ** ($p < 0.01$). The results highlight that initial exposure to English led to more engagement with the AI feature, whereas starting with Spanish resulted in notably lower engagement in both ESP_1 and ENG_2. Right: The number of unique users, out of a maximum of 16, similarly reflects this trend, with more users engaging with the feature in the ESP_2 task after beginning with English.
  • Figure 5: Weighted Average Similarity Percentage Across Task Groups and Models. The similarity scores vary across task groups depending on the initial language exposure, but the pattern remains consistent across embedding models. ENG_1 and ESP_2, which involve starting with English, show higher similarity percentages across models compared to ESP_1 and ENG_2, where the initial exposure is in Spanish. This pattern suggests that starting with English may lead to more AI-written text in the final generated content, reflected by higher similarity scores. Each bar colour represents a different embedding model.
  • ...and 9 more figures