FiMI: A Domain-Specific Language Model for Indian Finance Ecosystem
Aboli Kathar, Aman Kumar, Anusha Kamath, Araveeti Srujan, Ashish Sharma, Chandra Bhushan, Dilip Asbe, Divya Sorate, Duddu Prasanth Kumar, Evan Acharya, Harsh Sharma, Hrithik Kadam, Kanishk Singla, Keyur Doshi, Kiran Praveen, Kolisetty Krishna SK, Krishanu Adhikary, Lokesh MPT, Mayurdeep Sonowal, Nadeem Shaikh, Navya Prakash, Nimit Kothari, Nitin Kukreja, Prashant Devadiga, Rakesh Paul, Ratanjeet Pratap Chauhan, Raunak Kalani, Raviraj Joshi, Shamanth MH, Shantanu Pandey, Shubham Soni, Siddharth Dixit, Smriti Jopat, Sunil Patel, Suraj Singh, Suvradip Paul, Tulasi Pilla, Utkarsh Vaidya, Vineeth Nambiar, Vishal Kanvaty, Yatharth Dedhia
TL;DR
FiMI introduces two domain-specific LLMs for India's financial ecosystem, FiMI Base and FiMI Instruct, built atop Mistral Small 24B. The authors deploy a multi-stage training pipeline—Continuous Pre-Training on a large India-focused corpus, followed by Instruction Fine-Tuning and Domain-Supervised Fine-Tuning with synthetic UPI-Help data—to internalize finance workflows, regulatory constraints, and multilingual interactions. They report approximately 20% domain-specific gains and substantial improvements in domain tool-calling precision, while preserving general capabilities similar to larger models. The work demonstrates strong practical impact by enabling NPCI's UPI Help with reliable, compliant, and multilingual support, and outlines a replicable blueprint for domain adaptation in regulated financial settings using synthetic data, tool usage, and safety-focused post-training.
Abstract
We present FiMI (Finance Model for India), a domain-specialized financial language model developed for Indian digital payment systems. We develop two model variants: FiMI Base and FiMI Instruct. FiMI adapts the Mistral Small 24B architecture through a multi-stage training pipeline, beginning with continuous pre-training on 68 Billion tokens of curated financial, multilingual (English, Hindi, Hinglish), and synthetic data. This is followed by instruction fine-tuning and domain-specific supervised fine-tuning focused on multi-turn, tool-driven conversations that model real-world workflows, such as transaction disputes and mandate lifecycle management. Evaluations reveal that FiMI Base achieves a 20% improvement over the Mistral Small 24B Base model on finance reasoning benchmark, while FiMI Instruct outperforms the Mistral Small 24B Instruct model by 87% on domain-specific tool-calling. Moreover, FiMI achieves these significant domain gains while maintaining comparable performance to models of similar size on general benchmarks.
