INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages

Hao Yu, Jesujoba O. Alabi, Andiswa Bukula, Jian Yun Zhuang, En-Shiun Annie Lee, Tadesse Kebede Guge, Israel Abebe Azime, Happy Buzaaba, Blessing Kudzaishe Sibanda, Godson K. Kalipe, Jonathan Mukiibi, Salomon Kabongo Kabenamualu, Mmasibidi Setaka, Lolwethu Ndolela, Nkiruka Odu, Rooweither Mabuya, Shamsuddeen Hassan Muhammad, Salomey Osei, Sokhar Samb, Juliet W. Murage, Dietrich Klakow, David Ifeoluwa Adelani

TL;DR

Injongo delivers the first large-scale multicultural intent detection and slot-filling (ID-SF) benchmark for 16 African languages (plus English), addressing Western-centric biases by grounding utterances in African contexts across five domains. The authors curate the data through culturally aligned utterance generation, slot-filling annotation, and quality-control merging, then benchmark both supervised multilingual models and prompted LLMs. The results show strong gains from targeted multilingual pretraining but persistent gaps for LLMs, especially in slot filling. AfroXLMR-76L achieves the top average performance (ID ≈ 93.7%, SF ≈ 85.6%), while LLMs struggle (e.g., GPT-4o SF ≈ 33.3%), though few-shot prompting closes the gap modestly. The work also demonstrates cross-lingual transfer benefits when combining multicultural African data with English, and provides an open, extensible resource to accelerate culturally aware NLU development for African languages.

Abstract

Slot-filling and intent detection are well-established tasks in Conversational AI. However, current large-scale benchmarks for these tasks often exclude evaluations of low-resource languages and rely on translations from English benchmarks, thereby predominantly reflecting Western-centric concepts. In this paper, we introduce Injongo, a multicultural, open-source benchmark dataset for 16 African languages with utterances generated by native speakers across diverse domains, including banking, travel, home, and dining. Through extensive experiments, we benchmark fine-tuned multilingual transformer models and prompted large language models (LLMs), and show the advantage of leveraging African-cultural utterances over Western-centric utterances for improving cross-lingual transfer from English. Experimental results reveal that current LLMs struggle with the slot-filling task, with GPT-4o achieving an average F1-score of 26. In contrast, intent detection performance is notably better, with an average accuracy of 70.6%, though it still falls behind the fine-tuning baselines. On English, by comparison, GPT-4o and the fine-tuning baselines perform similarly on intent detection, achieving an accuracy of approximately 81%. Our findings suggest that LLM performance still lags for many low-resource African languages, and more work is needed to improve their downstream performance.

Paper Structure

This paper contains 42 sections, 4 figures, and 15 tables.

Figures (4)

  • Figure 1: Task description for the Injongo dataset. An example from one of the five domains, showing semantically similar sentences along with their intent and slot-filling labels.
  • Figure 2: Distribution of slot-entity occurrences across all 16 African languages in the Unreviewed and Reviewed versions. The slot entities are sorted from left to right by frequency in descending order.
  • Figure 3: Performance of cross-lingual transfer across different shot settings and supervised fine-tuning (SFT) on the merged 17-language Injongo dataset.
  • Figure 4: Cross-lingual transfer results from CLINC and Injongo English data.