Pragmatics Meets Culture: Culturally-adapted Artwork Description Generation and Evaluation

Lingjun Zhao, Dayeon Ki, Marine Carpuat, Hal Daumé

Abstract

Language models are known to exhibit various forms of cultural bias in decision-making tasks, yet much less is known about their degree of cultural familiarity in open-ended text generation tasks. In this paper, we introduce the task of culturally-adapted art description generation, where models describe artworks for audiences from different cultural groups who vary in their familiarity with the cultural symbols and narratives embedded in the artwork. To evaluate cultural competence in this pragmatic generation task, we propose a framework based on culturally grounded question answering. We find that base models are only marginally adequate for this task, but, through a pragmatic speaker model, we can improve simulated listener comprehension by up to 8.2%. A human study further confirms that the model with higher pragmatic competence is rated as more helpful for comprehension by 8.0%.

Paper Structure

This paper contains 36 sections, 5 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Example of a Gemma3-generated description of artwork for audiences unfamiliar with the cultural context. The description fails to explain the symbolism of the lotus leaf and therefore lacks evidence to help them answer the culturally-attuned question.
  • Figure 2: Our approach uses a self-improving speaker model to generate pragmatic artwork descriptions. Given an artwork and the listener's cultural group, the pragmatic speaker first samples multiple descriptions, then ranks them by simulating how the listener would answer culturally-attuned questions when provided with each description. The model then selects the description with the highest self-evaluation score and presents it to the listener, which is either an external simulated listener or a human listener.
  • Figure 3: Human users' preference rates for descriptions generated by the base speaker and the pragmatic speaker. Takeaway: The pragmatic speaker improves user comprehension and introduces new information, but insufficiently accounts for users' existing knowledge.
  • Figure 4: The pragmatic speaker outperforms the base speaker in question-answering under an external simulated listener, both when (a) the listener aligns with human understanding and when (b) it does not. Colors denote semantically similar sentences.
  • Figure 5: Screenshot of the human evaluation pipeline. Each participant first reads the task instructions and important notes.
  • ...and 4 more figures
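
The sample-then-rerank procedure described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `select_pragmatic_description`, the `answer_fn` callback (standing in for the simulated listener), the exact-match scoring rule, and the toy data are all assumptions made for the example. In the paper's setting the candidates would be sampled from the speaker model and the answers produced by a model simulating the listener's cultural group.

```python
from typing import Callable, Sequence


def select_pragmatic_description(
    candidates: Sequence[str],
    questions: Sequence[tuple[str, str]],
    answer_fn: Callable[[str, str], str],
) -> tuple[str, float]:
    """Pick the candidate whose simulated listener answers the most questions correctly.

    candidates: descriptions sampled from a speaker (assumed to be given).
    questions:  (question, gold_answer) pairs; the exact-match rule is an assumption.
    answer_fn:  hypothetical simulated listener, maps (description, question) -> answer.
    """
    if not candidates:
        raise ValueError("need at least one candidate description")

    def score(description: str) -> float:
        # Fraction of questions the simulated listener answers correctly
        # when it only sees this description.
        if not questions:
            return 0.0
        correct = sum(
            answer_fn(description, question).strip().lower() == gold.strip().lower()
            for question, gold in questions
        )
        return correct / len(questions)

    scored = [(score(d), d) for d in candidates]
    # max() keeps the first candidate on ties.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def toy_listener(description: str, question: str) -> str:
    # Invented stand-in for a simulated listener: it can only answer
    # if the description states the symbolism explicitly.
    return "purity" if "purity" in description.lower() else "unknown"


# Invented toy data, not taken from the paper.
candidates = [
    "A painting of a pond with a large lotus leaf.",
    "A painting of a pond; here the lotus leaf stands for purity.",
]
questions = [("What does the lotus leaf stand for?", "purity")]

best, best_score = select_pragmatic_description(candidates, questions, toy_listener)
print(best, best_score)
```

With the toy listener, the second candidate is selected because it is the only one that gives the listener enough evidence to answer the question.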