LexPro-1.0 Technical Report

Haotian Chen, Yanyu Xu, Boyan Wang, Chaoyue Zhao, Xiaoyu Han, Fang Wang, Lizhen Cui, Yonghui Xu

TL;DR

LexPro-1.0 targets high-precision Chinese legal reasoning by deploying domain-specific data and a three-pronged learning strategy: supervised fine-tuning on curated legal corpora, reinforcement learning with GRPO to improve readability and reasoning, and retrieval-augmented inference to ground outputs in relevant statutes. It demonstrates substantial gains on basic legal knowledge and legal element extraction after SFT, while RL addresses formatting and interpretability issues, and RAG mitigates input-length challenges. The work compares model variants (14B, 32B, 70B) and shows practical potential for prosecutorial workflows, though challenges like reward design and embedding quality remain. Future directions include expanding legal data sources, building a dedicated legal knowledge base, and broadening tasks such as similar case recommendations to enhance real-world applicability.
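
As a rough illustration of the GRPO step mentioned above, the sketch below computes group-relative advantages: several responses are sampled for one prompt, and each response's reward is normalized against the mean and standard deviation of its own group. The function name, the example rewards, and the use of the population standard deviation are assumptions for illustration; this summary does not specify the report's actual reward design or normalization.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples in its group.

    `rewards` holds one scalar reward per sampled response to the same
    prompt. `eps` (an assumed small constant) avoids division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one legal question.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Responses scoring above the group mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to form the baseline.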

Abstract

In this report, we introduce our first-generation reasoning model, LexPro-1.0, a large language model designed for the highly specialized Chinese legal domain, offering comprehensive capabilities to meet diverse real-world needs. Existing legal LLMs face two primary challenges. First, their design and evaluation are predominantly driven by computer science perspectives, leading to insufficient incorporation of legal expertise and logic, which are crucial for high-precision legal applications such as handling complex prosecutorial tasks. Second, these models often underperform due to a lack of comprehensive training data from the legal domain, limiting their ability to address real-world legal scenarios effectively. To address these challenges, we first compile millions of legal documents covering over 20 types of crimes from 31 provinces in China for model training. From this extensive dataset, we then select a high-quality subset for supervised fine-tuning, ensuring enhanced relevance and precision. The model then undergoes large-scale reinforcement learning without additional supervision, with an emphasis on improving its reasoning capabilities and explainability. To validate its effectiveness in complex legal applications, we also conduct human evaluations with legal experts. We develop fine-tuned models based on DeepSeek-R1-Distilled versions, available in three dense configurations: 14B, 32B, and 70B.

Paper Structure

This paper contains 16 sections, 4 equations, 34 figures, 7 tables.

Figures (34)

  • Figure 1: An example of a processed document: We extract key content from the original (unprocessed) document and structure the essential information into JSON format for subsequent training.
  • Figure 2: We input the same example twice, and the model produced different results each time. This inconsistency arises due to the model's incomplete legal knowledge, making it unable to clearly distinguish between fines and compensation, leading to unstable outputs.
  • Figure 3: The average training progress of 7B, 14B, and 30B models during the SFT process, reporting training loss and gradient norm on the training set.
  • Figure 4: Example of legal extraction task inference results. The base model fails to capture several legal elements, while the fine-tuned model successfully aligns with the ground truth labels.
  • Figure 5: Example of legal extraction task inference results. The base model misses several legal elements and uses inconsistent units (e.g., "15 years" instead of the expected "180 months"). While the fine-tuned model still omits one element (civil litigation), it standardizes the unit representation to months.
  • ...and 29 more figures
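
The unit standardization described in the Figure 5 caption (reporting "15 years" as "180 months") can be sketched as a small normalization helper. This is a minimal sketch, not the report's pipeline: the function name `to_months`, the regular expression, and the English-language input strings are assumptions for illustration, and the underlying documents are Chinese.

```python
import re

# Months contributed by each supported unit (assumed unit vocabulary).
_MONTHS_PER_UNIT = {"year": 12, "month": 1}

# Matches fragments such as "15 years", "6 months", or "1 year".
_TERM = re.compile(r"(\d+)\s*(year|month)s?", re.IGNORECASE)


def to_months(term: str) -> int:
    """Convert a sentence-length string into a total number of months."""
    matches = list(_TERM.finditer(term))
    if not matches:
        raise ValueError(f"no recognizable duration in {term!r}")
    return sum(
        int(m.group(1)) * _MONTHS_PER_UNIT[m.group(2).lower()]
        for m in matches
    )


print(to_months("15 years"))          # 180
print(to_months("3 years 6 months"))  # 42
```

Normalizing to a single unit before comparison means a prediction of "15 years" and a label of "180 months" are scored as the same value rather than as a mismatch.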