Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training
Junxiao Liu, Zhijun Wang, Yixiao Li, Zhejian Lai, Liqian Huang, Xin Huang, Xue Han, Junlan Feng, Shujian Huang
TL;DR
This work addresses the weak multilingual question understanding and multilingual reasoning of long reasoning models by introducing TRIT, a self-improving reinforcement learning framework that jointly trains translation and multilingual reasoning without external data. TRIT operates in two stages: it first strengthens cross-lingual reasoning using an accuracy-based filter, then translates English questions into the target language and trains reasoning in that language, with closed-loop feedback between translation quality and reasoning performance. On MMATH, across multiple backbone models, TRIT outperforms multiple baselines by an average of 7 percentage points and reaches near-perfect language consistency, while also improving translation quality and cross-lingual question alignment, with positive spillover to general-domain text (FLORES-200). The results suggest that integrating translation training and question-level alignment can improve multilingual reasoning, offering a scalable path toward stronger cross-language problem solving.
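The two-stage loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Sample` type, the `reason` callable (a stand-in for sampling a final answer from the model), the exact-match `accuracy` function, the `threshold` and `n_rollouts` parameters, and the use of downstream reasoning accuracy as a translation reward are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical signature: (question, target_lang) -> final answer string.
ReasonFn = Callable[[str, str], str]


@dataclass
class Sample:
    question_en: str  # English question
    answer: str       # gold final answer


def accuracy(responses: List[str], gold: str) -> float:
    """Fraction of responses whose final answer exactly matches the gold answer."""
    if not responses:
        return 0.0
    return sum(r.strip() == gold.strip() for r in responses) / len(responses)


def filter_by_accuracy(
    samples: List[Sample],
    reason: ReasonFn,
    target_lang: str,
    threshold: float,
    n_rollouts: int = 4,
) -> List[Sample]:
    """Stage 1 (assumed form): keep questions the model already answers
    reliably when given the English question and asked to reason in the
    target language."""
    kept = []
    for s in samples:
        responses = [reason(s.question_en, target_lang) for _ in range(n_rollouts)]
        if accuracy(responses, s.answer) >= threshold:
            kept.append(s)
    return kept


def translation_reward(
    translated_question: str,
    gold: str,
    reason: ReasonFn,
    target_lang: str,
    n_rollouts: int = 4,
) -> float:
    """Stage 2 feedback (assumed form): score a translated question by how
    often the model solves it when reasoning in the target language."""
    responses = [reason(translated_question, target_lang) for _ in range(n_rollouts)]
    return accuracy(responses, gold)
```

In this sketch, a question that passes `filter_by_accuracy` is one the model can already solve cross-lingually, so a low `translation_reward` on its translated version is more plausibly attributable to the translation than to the reasoning. That reading of the closed loop is an interpretation of the summary above, not a detail confirmed by it.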
Abstract
Long reasoning models often struggle in multilingual settings: they tend to reason in English even when the question is not in English, and when constrained to reason in the question language, their accuracy drops substantially. This stems from limited ability in both multilingual question understanding and multilingual reasoning. To address both problems, we propose TRIT (Translation-Reasoning Integrated Training), a self-improving framework that integrates translation training into multilingual reasoning. Without external feedback or additional multilingual data, our method jointly enhances multilingual question understanding and response generation. On MMATH, our method outperforms multiple baselines by an average of 7 percentage points, improving both answer correctness and language consistency. Further analysis reveals that integrating translation training improves cross-lingual question alignment by over 10 percentage points and enhances translation quality for both mathematical questions and general-domain text, with gains of up to 8.4 COMET points on FLORES-200.
