Between Myths and Metaphors: Rethinking LLMs for SRH in Conservative Contexts

Ameemah Humayun, Bushra Zubair, Maryam Mustafa

TL;DR

This work investigates how indirect sexual and reproductive health (SRH) communication in conservative, low-resource contexts challenges LLM-based health interventions. It combines a qualitative study in Lahore, Pakistan, with an empirical evaluation of five LLMs on patient-derived prompts to map linguistic practice to AI capabilities. The authors introduce a two-axis framework (referential domains and communicative approaches) and a Roman Urdu glossary, revealing significant issues with semantic drift, polysemy, myths, and gestural communication. They argue for architectures that treat miscommunication as the default, support synchronous terminology management, and enable multimodal, privacy-conscious interactions, advancing culturally situated SRH AI design with implications for equity and global health impact.

Abstract

Low-resource countries account for over 90% of maternal deaths, with Pakistan among the four countries that together contributed nearly half of maternal deaths in 2023. Since these deaths are mostly preventable, large language models (LLMs) can help address this crisis by automating health communication and risk assessment. However, sexual and reproductive health (SRH) communication in conservative contexts often relies on indirect language that obscures meaning, complicating LLM-based interventions. We conduct a two-stage study in Pakistan: (1) analyzing data from clinical observations, interviews, and focus groups with clinicians and patients, and (2) evaluating the interpretive capabilities of five popular LLMs on this data. Our analysis identifies two axes of communication (referential domain and expression approach) and shows that LLMs struggle with semantic drift, myths, and polysemy in clinical interactions. We contribute: (1) empirical themes in SRH communication, (2) a categorization framework for indirect communication, (3) an evaluation of LLM performance, and (4) design recommendations for culturally situated SRH communication.

Paper Structure

This paper contains 60 sections, 6 figures, 11 tables.

Figures (6)

  • Figure 1: Design overview of our two-stage study conducted in a charitable hospital in Lahore, Pakistan. Stage 1 (Steps 1–6, Qualitative Data Collection): patient interviews, clinician–patient observations, focus groups, and clinician interviews informed methodological adaptations and qualitative analysis of euphemistic and indirect SRH communication. Stage 2 (Steps 7–10, LLM Performance Evaluation): 71 prompts derived from Stage 1 were tested on five LLMs (LLaMA 3.2, Gemma 3, GPT-OSS, Claude Sonnet 4, GPT-4o). Model responses were analyzed by researchers, rated by a gynecologist for correctness, and validated through follow-up interviews.
  • Figure 2: Categorization framework for communication in sexual and reproductive health (SRH), shown as a decision tree for categorizing SRH communication strategies along the Communicative Approach axis. The framework moves from direct, formal terms toward a range of indirect approaches (colloquialisms, euphemisms, figurative expressions, and myths), depending on how a concept is expressed. Each category reflects a different communicative function: convenience (colloquial shorthand), politeness and taboo navigation (euphemism), descriptive compensation (figurative terms), or culturally embedded misconceptions (myths). The diagram shows how everyday language departs from biomedical vocabulary, creating systematic interpretive challenges for both clinicians and LLM-based systems.
  • Figure 3: Communication barriers identified in the pilot study operate at three nested levels: individual (limited recall and communicative constraints), social (taboo enforcement and social isolation), and institutional (education–language gaps). The figure illustrates how these levels overlap, with institutional structures shaping social conditions, and social dynamics constraining individual communicative agency. This layered perspective highlights why SRH communication challenges cannot be solved by technical fixes alone, and why LLM-based interventions must be designed to address barriers across multiple levels.
  • Figure 4: Average correctness scores for five large language models (LLMs) on 71 prompts in the field researcher evaluation. Proprietary models performed much better than open-source models: Claude scored highest (0.82), followed closely by GPT-4o (0.80). The best-performing open-source model, GPT-OSS, reached 0.56, while Gemma and LLaMA scored much lower at 0.32 and 0.16. These results show a clear gap between proprietary and open-source systems.
  • Figure 5: Examples of language script degradation and code-switching in LLM outputs. Gemma (top row) shows mixed Urdu, Hindi, Telugu, and Russian scripts, along with gibberish in the final column. LLaMA (middle row) produces completely garbled Unicode characters in Example 1, coherent Roman Urdu with some Urdu (Arabic script) in Example 2, and mixed Urdu–English–Hindi–Roman Urdu scripts in Example 3. Claude (bottom row) demonstrates occasional Hindi character insertion within otherwise consistent Roman Urdu text. All models were instructed to respond exclusively in Roman Urdu suitable for low-literacy Pakistani patients.
  • ...and 1 more figure