AI Diffusion in Low Resource Language Countries

Amit Misra, Syed Waqas Zamir, Wassim Hamidouche, Inbal Becker-Reshef, Juan Lavista Ferres

TL;DR

This paper investigates the role of language-resource availability in shaping AI diffusion across countries. It integrates a FineWeb2-based language-resource taxonomy with country-level adoption data and uses a weighted fractional logit GLM targeting the Average Treatment Effect on the Treated (ATT), complemented by IPW, AIPW, OLS, and a two-period AIPW-DiD, to isolate language effects. In raw terms, AI usage in LRLCs is substantially lower than in non-LRLCs; after covariate adjustment, the gap is about $2.1$ percentage points in 2025, a relative reduction of roughly 20%. The study highlights linguistic accessibility as a distinct barrier to inclusive AI diffusion and argues for building high-quality multilingual training data to close this gap, while noting limitations such as multilingualism, within-country heterogeneity, and the short observation window.
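To make the ATT target concrete, here is a minimal sketch of the plain IPW form of an ATT estimate, which is simpler than the weighted fractional logit GLM the paper uses as its main model. It assumes the standard ATT weighting scheme (weight 1 for treated units, e / (1 - e) for controls, where e is the propensity score); the excerpt does not state the paper's exact weights. The data, the function names `att_weight`, `weighted_mean`, and `ipw_att`, and all numbers are hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: (is_lrlc, propensity_score, ai_user_share).
# None of these values come from the paper.
ROWS = [
    (1, 0.8, 0.06),
    (1, 0.6, 0.08),
    (0, 0.5, 0.12),
    (0, 0.2, 0.15),
    (0, 0.1, 0.20),
]


def att_weight(treated, e):
    """Standard ATT weight: 1 for treated units, e / (1 - e) for controls."""
    return 1.0 if treated else e / (1.0 - e)


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def ipw_att(rows):
    """Return (att, treated_mean, reweighted_control_mean)."""
    y1 = [y for t, e, y in rows if t == 1]
    w1 = [att_weight(1, e) for t, e, y in rows if t == 1]
    y0 = [y for t, e, y in rows if t == 0]
    w0 = [att_weight(0, e) for t, e, y in rows if t == 0]
    mu1 = weighted_mean(y1, w1)  # observed outcome among treated
    mu0 = weighted_mean(y0, w0)  # controls reweighted to resemble treated
    return mu1 - mu0, mu1, mu0


att, mu1, mu0 = ipw_att(ROWS)
print(f"treated mean={mu1:.4f}, reweighted control mean={mu0:.4f}, ATT={att:.4f}")
```

Controls that look like LRLCs on the covariates (high e) get larger weights, so the reweighted control mean approximates what LRLC usage would have been without the language barrier. The same weights could be passed to a fractional logit fit instead of a weighted mean.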

Abstract

Artificial intelligence (AI) is diffusing globally at unprecedented speed, but adoption remains uneven. Frontier Large Language Models (LLMs) are known to perform poorly on low-resource languages due to data scarcity. We hypothesize that this performance deficit reduces the utility of AI, thereby slowing adoption in Low-Resource Language Countries (LRLCs). To test this, we use a weighted regression model to isolate the language effect from socioeconomic and demographic factors, finding that LRLCs have a share of AI users that is approximately 20% lower relative to their baseline. These results indicate that linguistic accessibility is a significant, independent barrier to equitable AI diffusion.
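The relationship between the absolute and relative gaps can be checked with a line of arithmetic. The sketch below assumes that "relative to their baseline" means the adjusted gap divided by the counterfactual (no-language-barrier) usage share for LRLCs; that reading is an interpretation, and the resulting baseline is only implied by the two figures quoted in the TL;DR, not a number reported in this excerpt. The function name `implied_baseline` is hypothetical.

```python
def implied_baseline(gap_pp, relative_reduction):
    """Baseline share (in percentage points) implied by gap = relative_reduction * baseline."""
    return gap_pp / relative_reduction


GAP_PP = 2.1               # adjusted gap quoted in the TL;DR, in percentage points
RELATIVE_REDUCTION = 0.20  # relative reduction quoted in the TL;DR

baseline_pp = implied_baseline(GAP_PP, RELATIVE_REDUCTION)
observed_pp = baseline_pp - GAP_PP

print(f"implied baseline: {baseline_pp:.1f} pp, implied observed share: {observed_pp:.1f} pp")
```

Under that assumption, a 2.1 percentage point gap and a 20% relative reduction are consistent with a counterfactual share of roughly 10.5%.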

Paper Structure

This paper contains 8 sections and 3 tables.