Olapa-MCoT: Enhancing the Chinese Mathematical Reasoning Capability of LLMs
Shaojie Zhu, Zhaobin Wang, Chengxiang Zhuo, Hui Lu, Bo Hu, Zang Li
TL;DR
This work targets the gap in Chinese mathematical reasoning for open-source LLMs by introducing Olapa-MCoT, a llama2-13B-based model trained with supervised fine-tuning (SFT) followed by a specialized alignment stage. The alignment stage uses SimRRHF, a single-model variant that combines ranking, SFT, and similarity losses, and adds Incorrect Data Relearning (IDRL), which reuses incorrect inferences to strengthen learning of difficult concepts. Empirical results show Olapa-MCoT reaching up to 50% accuracy on Chinese mathematical reasoning, a 36% rise over llama2-13B and competitive with other open-source systems; English reasoning accuracy also improves by nearly 4%, and convergence is more stable. The approach emphasizes efficiency through QLoRA, open-source data, and a compact alignment loop, offering a practical path to deploying specialized Chinese mathematical reasoning LLMs with reduced compute and data requirements.
Abstract
Chain-of-Thought (CoT) is a method for solving reasoning problems with LLMs, and much recent research has aimed to improve the CoT capability of LLMs. In this work, we propose Olapa-MCoT, an LLM built on the llama2-13B PLM through fine-tuning and alignment learning. For alignment training, we propose the SimRRHF algorithm and Incorrect Data Relearning, focusing mainly on optimizing the Chinese mathematical reasoning ability of Olapa-MCoT. In our experiments, the accuracy of Chinese mathematical reasoning reaches up to 50%, a 36% rise compared to llama2-13B. In addition, the accuracy of English reasoning increases by nearly 4%.
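The TL;DR describes SimRRHF as combining ranking, SFT, and similarity losses. The following is only a minimal sketch of how three such terms might be summed, not the authors' formulation. Everything specific in it is an assumption made for illustration: the pairwise hinge form of the ranking term, the use of per-response scores such as length-normalized log-probabilities, the cosine form of the similarity term, the weights `w_rank`, `w_sft`, `w_sim`, and all function names.

```python
import math
from itertools import combinations


def ranking_loss(scores, rewards):
    """Pairwise hinge loss: penalize any pair where the response with the
    lower reward receives the higher model score (assumed form)."""
    loss = 0.0
    for i, j in combinations(range(len(scores)), 2):
        if rewards[i] < rewards[j]:
            loss += max(0.0, scores[i] - scores[j])
        elif rewards[i] > rewards[j]:
            loss += max(0.0, scores[j] - scores[i])
    return loss


def sft_loss(token_logprobs):
    """Negative log-likelihood of the preferred response's tokens."""
    return -sum(token_logprobs)


def similarity_loss(vec_a, vec_b):
    """1 - cosine similarity between two embedding vectors (assumed form)."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    return 1.0 - dot / (norm_a * norm_b)


def combined_loss(scores, rewards, best_token_logprobs, emb_pred, emb_ref,
                  w_rank=1.0, w_sft=1.0, w_sim=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_rank * ranking_loss(scores, rewards)
            + w_sft * sft_loss(best_token_logprobs)
            + w_sim * similarity_loss(emb_pred, emb_ref))


if __name__ == "__main__":
    # Toy inputs: two candidate responses, the second one has the higher reward.
    total = combined_loss(
        scores=[-0.5, -1.0],
        rewards=[0.0, 1.0],
        best_token_logprobs=[-0.1, -0.2],
        emb_pred=[1.0, 0.0],
        emb_ref=[1.0, 0.0],
    )
    print(f"combined loss: {total:.3f}")
```

In this toy call the ranking term is nonzero because the lower-reward response has the higher score, while the similarity term vanishes because the two embeddings are identical.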
