KemenkeuGPT: Leveraging a Large Language Model on Indonesia's Government Financial Data and Regulations to Enhance Decision Making

Gilang Fajar Febrian, Grazziela Figueredo

TL;DR

This work addresses the challenge of extracting actionable insights from Indonesia's government financial data and regulations using a large language model. It develops KemenkeuGPT by combining LangChain with Retrieval-Augmented Generation, prompt engineering, and fine-tuning, and evaluates multiple LLMs through iterative cycles and stakeholder feedback. The results show systematic performance gains, with final accuracy reaching 61% and RAGAS metrics indicating improved faithfulness and usefulness, though the system still relies on partial data and human-in-the-loop updates. The findings suggest that LLM-assisted decision support can augment Ministry of Finance workflows and public services, while highlighting practical constraints and clear paths for future enhancement.
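
The TL;DR names LangChain and Retrieval-Augmented Generation but does not show how the pieces fit together. The sketch below shows only the general retrieve-then-prompt pattern behind RAG. It is not the authors' pipeline and does not use LangChain. A simple term-frequency cosine similarity stands in for the embedding model and vector store that such a system would normally use. The helper names (`retrieve`, `build_prompt`) and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query.

    Hypothetical stand-in for a vector-store lookup: a real RAG system would
    compare dense embeddings rather than raw term counts.
    """
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


# Usage with placeholder documents (not real Ministry of Finance data).
docs = [
    "Hypothetical regulation excerpt on value added tax rates.",
    "Hypothetical table of state budget revenue by year.",
    "Hypothetical note on exchange rate assumptions.",
]
print(build_prompt("What are the value added tax rates?", docs))
```

Sending the resulting prompt to an LLM completes the generation step. Restricting the model to the retrieved excerpts is what lets a RAG system ground its answers in the indexed documents, rather than in the model's training data alone.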

Abstract

Data is crucial for evidence-based policymaking and for improving public services, including those of the Ministry of Finance of the Republic of Indonesia. However, the complexity and dynamic nature of governmental financial data and regulations can hinder decision-making. This study investigates the potential of Large Language Models (LLMs) to address these challenges, focusing on Indonesia's financial data and regulations. LLMs have proven effective in the financial sector, but their use in Indonesia's public sector remains unexplored. This study follows an iterative process to develop KemenkeuGPT using LangChain with Retrieval-Augmented Generation (RAG), prompt engineering and fine-tuning. The dataset, covering 2003 to 2023, was collected from the Ministry of Finance, Statistics Indonesia and the International Monetary Fund (IMF). Surveys and interviews with Ministry officials informed the model's development, enhancement and fine-tuning. We evaluated the model using human feedback, LLM-based evaluation and benchmarking. The model's accuracy improved from 35% to 61%, and its correctness improved from 48% to 64%. Under the Retrieval-Augmented Generation Assessment (RAGAS) framework, KemenkeuGPT achieved 44% correctness, 73% faithfulness, 40% precision and 60% recall, outperforming several other base models. An interview with an expert from the Ministry of Finance indicated that KemenkeuGPT has the potential to become an essential decision-making tool. These results are expected to improve with continuous human feedback.
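
The RAGAS scores in the abstract are ratios computed per question and then averaged. The sketch below is a rough illustration that assumes the reported precision and recall are RAGAS's context precision and context recall, and it follows the metric definitions in the RAGAS documentation as we understand them. Faithfulness is the share of an answer's claims that the retrieved context supports. Context recall is the share of reference-answer claims that can be attributed to that context. Context precision is a rank-weighted average that rewards placing relevant chunks first. RAGAS itself uses an LLM to extract claims and judge relevance. Here those judgments are passed in directly as counts and booleans, and the function names are hypothetical. Answer correctness, which combines factual overlap with semantic similarity, is omitted.

```python
def faithfulness(supported_claims: int, total_claims: int) -> float:
    """Share of the answer's claims supported by the retrieved context.

    Returning 0.0 when there are no claims is an assumption of this sketch.
    """
    return supported_claims / total_claims if total_claims else 0.0


def context_recall(attributed_claims: int, reference_claims: int) -> float:
    """Share of reference-answer claims that can be attributed to the context."""
    return attributed_claims / reference_claims if reference_claims else 0.0


def context_precision(relevance: list[bool]) -> float:
    """Rank-aware precision over retrieved chunks, ordered best-first.

    Averages precision@k over the ranks k that hold a relevant chunk, so
    relevant chunks ranked early score higher than the same chunks ranked late.
    """
    hits = 0
    total = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # precision@k at this relevant rank
    return total / hits if hits else 0.0


# Illustrative judgments for a single question (not results from the study).
print(faithfulness(3, 4))                         # 0.75
print(context_precision([True, False, True]))     # (1/1 + 2/3) / 2
```

For example, if an answer makes four claims and the context supports three of them, faithfulness is 3/4 = 0.75. A system can therefore score high on faithfulness while still scoring lower on precision and recall, because those metrics measure retrieval quality rather than whether the answer stays within its context.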

Paper Structure (13 sections, 4 equations, 7 figures, 10 tables)

Figures (7)

  • Figure 1: Iterative development of the research.
  • Figure 2: Retrieval Augmented Generation (RAG) with LangChain.
  • Figure 3: KemenkeuGPT feedback feature for continuous improvement.
  • Figure 4: Accuracy of KemenkeuGPT after iterative improvement.
  • Figure 5: Response of KemenkeuGPT after iterative improvement.
  • ...and 2 more figures