When Words Don't Mean What They Say: Figurative Understanding in Bengali Idioms

Adib Sakhawat; Shamim Ara Parveen; Md Ruhul Amin; Shamim Al Mahmud; Md Saiful Islam; Tahera Khatun

Abstract

Figurative language understanding remains a significant challenge for Large Language Models (LLMs), especially for low-resource languages. To address this, we introduce a new idiom dataset, a large-scale, culturally grounded corpus of 10,361 Bengali idioms. Each idiom is annotated under a comprehensive 19-field schema, established and refined through a deliberative expert consensus process, that captures its semantic, syntactic, cultural, and religious dimensions, providing a rich, structured resource for computational linguistics. To establish a robust benchmark for Bengali figurative language understanding, we evaluate 30 state-of-the-art multilingual and instruction-tuned LLMs on the task of inferring figurative meaning. Our results reveal a critical performance gap: no model surpasses 50% accuracy, in stark contrast to human performance of 83.4%. This underscores the limitations of existing models in cross-linguistic and cultural reasoning. By releasing the new idiom dataset and benchmark, we provide foundational infrastructure for advancing figurative language understanding and cultural grounding in LLMs for Bengali and other low-resource languages.

Paper Structure (19 sections, 4 figures, 5 tables)

Introduction
Related Work
Methodology
Dataset Architecture and Schema
Corpus Construction and Annotation Protocol
Experimental Evaluation: Comprehensive LLM Benchmarking
Human Baseline
Dataset Description
Corpus Statistics and Semantic Coverage
Sentiment and Domain Distribution
Geographical and Cultural Coverage
Frequency and Thematic Analysis
Register and Annotation Quality
Model Performance Analysis
Performance Stratification and Variance
...and 4 more sections

Figures (4)

Figure 1: Conceptual illustration contrasting human and AI interpretations of idioms (‘kicked the bucket’).
Figure 2: The complete pipeline of the new idiom dataset construction, annotation, and evaluation process.
Figure 3: Example interaction with an LLM
Figure 4: Dataset Geographical and Cultural Coverage
