Learning to Solve Domain-Specific Calculation Problems with Knowledge-Intensive Programs Generator
Chengyuan Liu, Shihang Wang, Lizhi Qing, Jun Lin, Ji Zhang, Fei Wu, Kun Kuang
TL;DR
The paper tackles domain-specific calculation problems that require intricate domain knowledge and proposes Knowledge-Intensive Programs as a solution, instantiated via the KIPG pipeline. KIPG generates domain-aware programs, extracts the key variables from each query, and computes the knowledge-dependent outcomes by running those programs; iterative preference optimization aligns the program generator with the domain rules. The pipeline achieves strong results on legal-domain data and transfers well to medical data. It outperforms baselines across model scales and languages, and the analyses highlight the importance of initialization, training, and sufficient exploration in program generation. The work advances knowledge-grounded reasoning for calculation tasks, with practical implications for automated legal and regulatory reasoning and potential extension to other knowledge-intensive domains.
Abstract
Domain Large Language Models (LLMs) are built on general LLMs to handle domain-specific tasks. However, some domain-specific tasks still require professional knowledge that these models lack. In this paper, we investigate knowledge-intensive calculation problems. We find that math problems are challenging for LLMs when they involve complex domain-specific rules and knowledge documents rather than simple formulations of terminologies. Therefore, we propose KIPG, a pipeline that solves domain-specific calculation problems more effectively with a Knowledge-Intensive Programs Generator. It generates knowledge-intensive programs according to the domain-specific documents. For each query, key variables are extracted, and the outcomes that depend on domain knowledge are then calculated with the programs. Through iterative preference alignment, the code generator learns to improve its logical consistency with the domain knowledge. Taking the legal domain as an example, we conduct experiments that demonstrate the effectiveness of our pipeline, together with an extensive analysis of its modules. We also find that the code generator is adaptable to other domains without training on the new knowledge.
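The two query-time steps described in the abstract (extract key variables, then compute the knowledge-dependent outcome with a program) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the late-payment rule, its constants, and the names `Variables`, `extract_variables`, `knowledge_program`, and `solve` are hypothetical and are not taken from the paper. In KIPG the program would be produced by the trained code generator from domain documents, whereas here it is written by hand.

```python
from dataclasses import dataclass

# Hypothetical domain rule, invented for this sketch (not from the paper):
# a late-payment penalty accrues at 0.05% of the overdue amount per day
# and is capped at 20% of the overdue amount.
DAILY_RATE = 0.0005
CAP_RATIO = 0.20


@dataclass
class Variables:
    """Key variables that a query must supply for the rule above."""
    overdue_amount: float
    days_overdue: int


def extract_variables(query: dict) -> Variables:
    """Stand-in for variable extraction.

    Assumption: the query is already structured. A real system would
    have to extract these values from free text.
    """
    return Variables(
        overdue_amount=float(query["overdue_amount"]),
        days_overdue=int(query["days_overdue"]),
    )


def knowledge_program(v: Variables) -> float:
    """Hand-written stand-in for a generated knowledge-intensive program."""
    uncapped = v.overdue_amount * DAILY_RATE * v.days_overdue
    cap = v.overdue_amount * CAP_RATIO
    # The rule's cap is applied by the program, not by the caller.
    return round(min(uncapped, cap), 2)


def solve(query: dict) -> float:
    """Extract the key variables, then run the program on them."""
    return knowledge_program(extract_variables(query))


if __name__ == "__main__":
    print(solve({"overdue_amount": 10000, "days_overdue": 30}))
    print(solve({"overdue_amount": 10000, "days_overdue": 1000}))
```

Keeping the rule logic inside the program means the arithmetic and the cap are executed deterministically, so the only query-specific work left is extracting the variables.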
