Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs

Koshiro Saito; Sakae Mizuki; Masanari Ohi; Taishi Nakamura; Taihei Shiotani; Koki Maeda; Youmi Ma; Kakeru Hattori; Kazuki Fujii; Takumi Okamoto; Shigeki Ishida; Hiroya Takamura; Rio Yokota; Naoaki Okazaki

TL;DR

The study asks why local LLMs are worth building by evaluating 35 Japanese, English, and multilingual models on 19 Japanese and English benchmarks, then applying correlation analysis and PCA to the scores to extract language- and task-ability factors. It finds strong cross-lingual transfer for academic subjects, arithmetic reasoning, and code generation, which indicates that English training can improve these abilities in Japanese. Japanese training, by contrast, improves Japanese knowledge QA and English–Japanese translation, which are language-specific gains. The ability factors follow distinct scaling trends: PC1 (general ability) scales with the English computational budget, PC2 (Japanese knowledge and translation) scales with the Japanese computational budget, and PC3 reflects multilingual capability but shows no clear scaling law. Computational budgets are estimated with $C \approx 6ND$, where $N$ is the number of model parameters and $D$ is the number of training tokens, which links model size and data to the observed abilities. The results suggest that the main advantage of local LLMs lies in acquiring local knowledge and translation ability, which gives practical guidance for budgeting the pre-training of non-English LLMs and motivates extending the analysis to other languages and tasks.
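As a rough illustration of the $C \approx 6ND$ estimate, the sketch below computes a total training budget and a per-language share. The function names, the example model size, the token count, and the 10% Japanese fraction are all hypothetical, and splitting the budget by each language's token share is an assumption made here for illustration, not necessarily the paper's exact procedure.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs using C ≈ 6 * N * D."""
    if n_params < 0 or n_tokens < 0:
        raise ValueError("n_params and n_tokens must be non-negative")
    return 6.0 * n_params * n_tokens


def language_flops(n_params: float, n_tokens: float, fraction: float) -> float:
    """Compute attributed to one language, assuming (hypothetically) that
    the budget is split in proportion to that language's share of tokens."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    return training_flops(n_params, n_tokens * fraction)


# Hypothetical model: 7e9 parameters trained on 1e11 tokens, 10% Japanese.
total_budget = training_flops(7e9, 1e11)
japanese_budget = language_flops(7e9, 1e11, 0.10)
english_budget = language_flops(7e9, 1e11, 0.90)

print(f"total:    {total_budget:.2e} FLOPs")
print(f"Japanese: {japanese_budget:.2e} FLOPs")
print(f"English:  {english_budget:.2e} FLOPs")
```

Under this split, two models with the same total budget can have very different Japanese budgets, which is why a language-specific factor such as PC2 can be compared against the Japanese budget rather than the total.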

Abstract

Why do we build local large language models (LLMs)? What should a local LLM learn from the target language? Which abilities can be transferred from other languages? Do language-specific scaling laws exist? To explore these research questions, we evaluated 35 Japanese, English, and multilingual LLMs on 19 evaluation benchmarks for Japanese and English, taking Japanese as a local language. Adopting an observational approach, we analyzed correlations of benchmark scores, and conducted principal component analysis (PCA) on the scores to derive ability factors of local LLMs. We found that training on English text can improve the scores of academic subjects in Japanese (JMMLU). In addition, it is unnecessary to specifically train on Japanese text to enhance abilities for solving Japanese code generation, arithmetic reasoning, commonsense, and reading comprehension tasks. In contrast, training on Japanese text could improve question-answering tasks about Japanese knowledge and English-Japanese translation, which indicates that abilities for solving these two tasks can be regarded as Japanese abilities for LLMs. Furthermore, we confirmed that the Japanese abilities scale with the computational budget for Japanese text.
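The correlation step of this observational approach can be sketched as follows. This is a minimal illustration, not the authors' code: the benchmark names and every score below are made-up toy values for five hypothetical models, chosen only to show how a transferable ability and a language-specific ability would look different in a correlation analysis. The PCA step is not shown.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Toy scores (made up): one entry per hypothetical model, same model order.
scores = {
    "en_academic": [0.30, 0.45, 0.55, 0.62, 0.70],
    "ja_academic": [0.28, 0.40, 0.50, 0.58, 0.66],
    "ja_knowledge_qa": [0.50, 0.20, 0.65, 0.30, 0.45],
}

# A high correlation across models is consistent with cross-lingual transfer.
r_transfer = pearson(scores["en_academic"], scores["ja_academic"])
# A weak correlation is consistent with a language-specific ability.
r_local = pearson(scores["en_academic"], scores["ja_knowledge_qa"])

print(f"en_academic vs ja_academic:     r = {r_transfer:.3f}")
print(f"en_academic vs ja_knowledge_qa: r = {r_local:.3f}")
```

With real data, each list would hold one benchmark's scores across all 35 models, and the full pairwise correlation matrix over the 19 benchmarks would be the input to inspect before running PCA.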
