IndoCulture: Exploring Geographically-Influenced Cultural Commonsense Reasoning Across Eleven Indonesian Provinces

Fajri Koto, Rahmad Mahendra, Nurul Aisyah, Timothy Baldwin

TL;DR

IndoCulture tackles Anglocentric bias in commonsense reasoning by building a culturally grounded benchmark across eleven Indonesian provinces. The dataset contains 2,429 Indonesian-language sentence-completion items on 12 topics, authored by local experts and validated through a two-stage quality-control process. Twenty-seven language models, spanning open-weight multilingual, Indonesian-centric, and closed-weight categories, are evaluated to study how geographical context affects reasoning, with large models benefiting noticeably from province-level context. Findings reveal substantial provincial variation and translation challenges, underscoring the need for regionally diverse, locally grounded data to advance culturally aware NLP in Indonesian contexts.

Abstract

Although commonsense reasoning is greatly shaped by cultural and geographical factors, previous studies have predominantly centered on cultures grounded in the English language, potentially resulting in an Anglocentric bias. In this paper, we introduce IndoCulture, aimed at understanding the influence of geographical factors on language model reasoning ability, with a specific emphasis on the diverse cultures found within eleven Indonesian provinces. In contrast to prior work that has relied on templates (Yin et al., 2022) and online scraping (Fung et al., 2024), we create IndoCulture by asking local people to manually develop a cultural context and plausible options, across a set of predefined topics. Evaluation of 27 language models reveals several insights: (1) the open-weight Llama-3 is competitive with GPT-4, while other open-weight models struggle, with accuracies below 50%; (2) models generally perform better for some provinces, such as Bali and West Java, and less well for others; and (3) the inclusion of location context enhances performance, especially for larger models like GPT-4, emphasizing the significance of geographical context in commonsense reasoning.

Paper Structure (28 sections, 1 equation, 5 figures, 7 tables)

Figures (5)

  • Figure 1: IndoCulture covers eleven provinces spanning from eastern to western Indonesia. The highlighted regions in the map represent the provinces examined in IndoCulture. We present examples from Aceh, North Sumatra, and Papua, with three plausible options and correct answers indicated in bold. English translations are provided for illustrative purposes.
  • Figure 2: Topic distribution in IndoCulture.
  • Figure 3: Templates for sentence-completion and multiple-choice question prompts.
  • Figure 4: Performance comparison between Merak (7B), Llama-3 Instruct (70B), and GPT-4 based on text generation output. "Answer (T)" indicates that the generated answer is true, while "Exp(F)" denotes that the answer explanation is false.
  • Figure 5: The accuracy of Indonesian and English translations across BLOOMZ (7B), mT0$_\text{xxl}$ (13B), Llama-2 chat (13B), Llama-3 Instruct (70B), Merak (7B), GPT-3.5, and GPT-4.
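
The abstract reports accuracy per province and compares prompts with and without location context. The following is a minimal sketch of how such a comparison could be computed, not the authors' evaluation code. The prompt wording, the item fields (`province`, `context`, `options`, `answer`), and the `predict` callable are all hypothetical; the paper's actual templates are the ones shown in Figure 3.

```python
from collections import defaultdict

def build_prompt(context, options, province=None):
    """Format a three-option multiple-choice prompt.

    The wording is a hypothetical stand-in, not the paper's template.
    When `province` is given, it is prepended as location context.
    """
    header = f"Province: {province}\n" if province else ""
    letters = "ABC"
    option_lines = [f"{letters[i]}. {opt}" for i, opt in enumerate(options)]
    return f"{header}Context: {context}\n" + "\n".join(option_lines) + "\nAnswer:"

def accuracy_by_province(items, predict, use_location):
    """Return {province: accuracy} for a model wrapped by `predict`.

    `predict` is an assumed callable that maps a prompt string to an
    answer letter ("A", "B", or "C"). Each item is assumed to be a dict
    with keys "province", "context", "options", and "answer".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        # Only expose the province to the model when location context is enabled.
        province = item["province"] if use_location else None
        prompt = build_prompt(item["context"], item["options"], province)
        prediction = predict(prompt)
        total[item["province"]] += 1
        correct[item["province"]] += int(prediction == item["answer"])
    return {p: correct[p] / total[p] for p in total}
```

Calling `accuracy_by_province` twice on the same items, once with `use_location=True` and once with `use_location=False`, gives two per-province dictionaries whose difference indicates how much the location context helps a given model.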