
End-to-End Bangla AI for Solving Math Olympiad Problem Benchmark: Leveraging Large Language Model Using Integrated Approach

H. M. Shadman Tabib, Jaber Ahmed Deedar

TL;DR

This paper addresses Bangla-language math Olympiad problem solving with LLMs. It presents an end-to-end pipeline that combines model selection, dataset augmentation, retrieval-augmented generation (RAG), and tool-integrated reasoning (TIR) with self-consistency to tackle multi-step math tasks in Bangla. It reports that problem categorization, targeted prompting, and iterative reasoning yield measurable gains: configurations such as Qwen-2.5-32B-Instruct-AWQ reach $77/100$ on a test set, and GPT-4o-based TIR setups score $125/209$ on the BDMO benchmark. The work highlights the practical potential of combining large multilingual LLMs with augmented data and retrieval strategies, and outlines directions for improving retrieval quality and domain-specific datasets.
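To make the self-consistency step concrete, here is a minimal sketch of majority voting over several sampled solutions. It is an illustration under stated assumptions, not the authors' implementation: the function name `self_consistent_answer`, the `sample_answer` callback (a stand-in for one TIR run of the model), the default of 8 samples, and the choice of integer answers are all hypothetical.

```python
from collections import Counter
from typing import Callable, Optional


def self_consistent_answer(
    sample_answer: Callable[[str], Optional[int]],
    problem: str,
    n_samples: int = 8,
) -> Optional[int]:
    """Sample several candidate answers and return the most frequent one.

    `sample_answer` is a hypothetical callback standing in for one model run
    (for example, one tool-integrated reasoning attempt). It returns an
    integer answer, or None when the run produced no usable result.
    """
    votes: Counter = Counter()
    for _ in range(n_samples):
        answer = sample_answer(problem)
        if answer is not None:  # ignore failed runs instead of counting them
            votes[answer] += 1
    if not votes:
        return None  # every run failed, so there is nothing to vote on
    return votes.most_common(1)[0][0]
```

In use, `sample_answer` would wrap a single stochastic call to the model, so that repeated calls produce independent reasoning paths and the vote filters out occasional arithmetic or code-execution errors.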

Abstract

This work introduces a systematic approach for enhancing large language models (LLMs) to address Bangla AI mathematical challenges. By assessing diverse LLM configurations, fine-tuning on specific datasets, and implementing Retrieval-Augmented Generation (RAG), we enhanced the model's reasoning precision in a multilingual setting. Key findings indicate that customized prompting, dataset augmentation, and iterative reasoning improve the model's performance on Olympiad-level mathematical challenges.

Paper Structure (18 sections, 1 figure, 7 tables)