Reading between the Lines: Can LLMs Identify Cross-Cultural Communication Gaps?
Sougata Saha, Saurabh Kumar Pandey, Harshit Gupta, Monojit Choudhury
TL;DR
This paper investigates cross-cultural gaps in understanding Goodreads reviews by identifying Culture-Specific Items (CSIs) and evaluating GPT-4o as a cultural mediator. It combines a user study with human annotations and GPT-4o-assisted post-processing to extract CSIs and assess model performance. The findings show a substantial prevalence of CSIs (83% of reviews) and modest but equitable GPT-4o performance (recall ~0.65, precision ~0.49), indicating both the existence of cross-cultural readability barriers and the need for stronger cultural mediation tools. The work provides a publicly available dataset and demonstrates a concrete methodology for evaluating and improving AI-assisted cross-cultural communication across domains.
Abstract
In a rapidly globalizing and digital world, content such as book and product reviews created by people from diverse cultures is read and consumed by others from different corners of the world. In this paper, we investigate the extent and patterns of gaps in the understandability of book reviews due to the presence of culture-specific items and elements that might be alien to users from another culture. Our user study on 57 book reviews from Goodreads reveals that 83% of the reviews had at least one culture-specific, difficult-to-understand element. We also evaluate the efficacy of GPT-4o in identifying such items, given the cultural background of the reader; the results are mixed, implying significant scope for improvement. Our datasets are available here: https://github.com/sougata-ub/reading_between_lines
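As an illustration of the quantities reported above, the following is a minimal sketch of how the share of reviews containing a culture-specific item, and the precision and recall of a model's predictions against human annotations, could be computed. It is not the authors' evaluation code. The function names, the exact-match criterion on item strings, the micro-averaging, and the toy data are all assumptions made for illustration; the released dataset may use a different schema and matching rule.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: review id -> set of annotated item strings.
Annotations = Dict[str, Set[str]]


def csi_prevalence(annotations: Annotations) -> float:
    """Fraction of reviews with at least one annotated culture-specific item."""
    if not annotations:
        return 0.0
    with_csi = sum(1 for items in annotations.values() if items)
    return with_csi / len(annotations)


def precision_recall(gold: Annotations, predicted: Annotations) -> Tuple[float, float]:
    """Micro-averaged precision and recall, assuming exact string match."""
    tp = fp = fn = 0
    for review_id in gold.keys() | predicted.keys():
        g = gold.get(review_id, set())
        p = predicted.get(review_id, set())
        tp += len(g & p)   # items found by both annotators and model
        fp += len(p - g)   # items only the model flagged
        fn += len(g - p)   # items the model missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data, invented purely for illustration (not taken from the paper).
gold = {"r1": {"chai", "diwali"}, "r2": set(), "r3": {"prom"}}
pred = {"r1": {"chai"}, "r2": {"book club"}, "r3": {"prom", "homecoming"}}

prevalence = csi_prevalence(gold)
precision, recall = precision_recall(gold, pred)
print(f"prevalence={prevalence:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Exact matching is the simplest choice; a partial-overlap or human-judged match would likely be more forgiving of span boundary differences and would change the resulting scores.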
