Compliance-to-Code: Enhancing Financial Compliance Checking via Code Generation

Siyuan Li, Jian Chen, Rui Yao, Xuming Hu, Peilin Zhou, Weihua Qiu, Simin Zhang, Chucheng Dong, Zhiyao Li, Qipeng Xie, Zixuan Yuan

TL;DR

This work targets the challenge of financial regulatory compliance in China by introducing Compliance-to-Code, a large Chinese dataset that structures regulatory text into executable coding units and relationships. It pairs the dataset with FinCheck, a modular pipeline that converts natural-language regulations into auditable Python checks via a Structure Predictor, a specialized Code Generator, an Information Retriever, and a Report Generator. The results show substantial gains from structured problem decomposition and reasoning, achieving up to 78.1% Pass@1 with gold structures and reasoning steps, along with high code auditability (92% of code with traceable commentary). The contributions provide a valuable resource and practical pathway toward safer, scalable RegTech solutions, while acknowledging challenges in upstream structure accuracy, reasoning across multiple compliance units (CUs), and domain adaptation.

Abstract

Regulatory compliance has become a cornerstone of corporate governance, ensuring adherence to systematic legal frameworks. Financial regulations often comprise highly intricate provisions, layered logical structures, and numerous exceptions, which make them labor-intensive to apply and difficult to comprehend. To mitigate this, Regulatory Technology (RegTech) and Large Language Models (LLMs) have gained significant attention for automating the conversion of regulatory text into executable compliance logic. However, their performance remains suboptimal, particularly when applied to Chinese-language financial regulations, due to three key limitations: (1) incomplete domain-specific knowledge representation, (2) insufficient hierarchical reasoning capabilities, and (3) failure to maintain temporal and logical coherence. One promising solution is to develop domain-specific, code-oriented datasets for model training. Existing datasets such as LexGLUE, LegalBench, and CODE-ACCORD are often English-focused, domain-mismatched, or too coarse-grained for compliance code generation. To fill these gaps, we present Compliance-to-Code, the first large-scale Chinese dataset dedicated to financial regulatory compliance. It covers 1,159 annotated clauses from 361 regulations across ten categories. Each clause is modularly structured with four logical elements (subject, condition, constraint, and contextual information), along with regulation relations. We provide deterministic Python code mappings, detailed code reasoning, and code explanations to facilitate automated auditing. To demonstrate utility, we present FinCheck: a pipeline for regulation structuring, code generation, and report generation.
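
The four-element clause structure described in the abstract can be illustrated with a minimal Python sketch. This is not the dataset's actual schema or code mapping: the class name `ComplianceUnit`, the record fields `holding_ratio` and `sold_ratio_90d`, and the example clause itself are all assumptions invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Record = Dict[str, Any]


@dataclass
class ComplianceUnit:
    """Hypothetical container for one clause split into four elements."""
    subject: Callable[[Record], bool]     # who the rule applies to
    condition: Callable[[Record], bool]   # when the rule is triggered
    constraint: Callable[[Record], bool]  # what must hold once triggered
    context: str                          # contextual information kept for auditing

    def check(self, record: Record) -> str:
        # The constraint is evaluated only when subject and condition both match.
        if not (self.subject(record) and self.condition(record)):
            return "not_applicable"
        return "compliant" if self.constraint(record) else "violation"


# Invented clause (not taken from the dataset): a shareholder holding 5% or
# more who sells shares in a 90-day window may sell at most 1% of total shares.
unit = ComplianceUnit(
    subject=lambda r: r["holding_ratio"] >= 0.05,
    condition=lambda r: r["sold_ratio_90d"] > 0,
    constraint=lambda r: r["sold_ratio_90d"] <= 0.01,
    context="Illustrative 90-day share reduction limit",
)

print(unit.check({"holding_ratio": 0.08, "sold_ratio_90d": 0.02}))  # violation
print(unit.check({"holding_ratio": 0.02, "sold_ratio_90d": 0.02}))  # not_applicable
```

Keeping subject, condition, and constraint as separate predicates means a failed check can be traced back to the specific element that produced it, which is the kind of auditability the abstract emphasizes.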

Paper Structure

This paper contains 33 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Illustration of errors in LLM-generated compliance logic. Code shown in red indicates errors.
  • Figure 2: Distribution of Dataset Difficulty Levels, Average Number of Compliance Units per Clause, and Inter-Unit Relation Types.
  • Figure 3: FinCheck Compliance Checking Pipeline. The framework processes natural language regulations by first using a Structure Predictor to extract key compliance units. These units are then fed into a Code Generator to create verification code. For a specific case, user input triggers an Information Retriever to fetch relevant company data. The generated code is executed with this data, and a Report Generator summarizes the outcome before the final verification result is shown to the user.
  • Figure 4: Prompt of Code Generation.
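
The data flow described in the Figure 3 caption can be sketched as four chained stages. This is only a rough illustration under stated assumptions: every function below is a hypothetical stub standing in for a model or retrieval component, and the regulation text, company names, and `debt_ratio` values are invented rather than drawn from the paper.

```python
from typing import Any, Dict


def structure_predictor(regulation: str) -> Dict[str, str]:
    # Stub for the model that extracts compliance units; returns a fixed unit.
    return {
        "subject": "listed company",
        "constraint": "data['debt_ratio'] <= 0.7",
        "source": regulation,
    }


def code_generator(unit: Dict[str, str]) -> str:
    # Stub for LLM code generation: emits Python source for a check function.
    return f"def check(data):\n    return {unit['constraint']}\n"


def information_retriever(company: str) -> Dict[str, Any]:
    # Stub for company data lookup; the values are invented.
    fake_db = {"ACME": {"debt_ratio": 0.82}, "Globex": {"debt_ratio": 0.41}}
    return fake_db[company]


def report_generator(company: str, passed: bool, unit: Dict[str, str]) -> str:
    verdict = "compliant" if passed else "non-compliant"
    return f"{company}: {verdict} with '{unit['source']}'"


def run_fincheck_sketch(regulation: str, company: str) -> str:
    unit = structure_predictor(regulation)   # regulation -> compliance unit
    source = code_generator(unit)            # compliance unit -> check code
    namespace: Dict[str, Any] = {}
    exec(source, namespace)                  # a real system would sandbox this step
    data = information_retriever(company)    # user case -> company data
    passed = namespace["check"](data)        # run generated code on the data
    return report_generator(company, passed, unit)


print(run_fincheck_sketch("Debt ratio must not exceed 70%", "ACME"))
print(run_fincheck_sketch("Debt ratio must not exceed 70%", "Globex"))
```

Separating code generation from data retrieval, as the caption describes, lets the same generated check be reused across companies; only the retrieved record changes between runs.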