GRAPH-GRPO-LEX: Contract Graph Modeling and Reinforcement Learning with Group Relative Policy Optimization

Moriya Dechtiar; Daniel Martin Katz; Mari Sundaresan; Sylvain Jaume; Hongming Wang

TL;DR

The paper tackles contract analysis by converting contracts into structured graphs and training an LLM with reinforcement learning, specifically group relative policy optimization (GRPO), to segment contracts, extract entities and relationships, and reason over clause dependencies. It introduces a legal graph ontology, a contract linter driven by graph metrics, and a full GRAPH-GRPO-LEX NLP pipeline that integrates pre-processing, automatic labeling, supervised fine-tuning, and gated GRPO training. A 1600-clause dataset derived from 43 CUAD contracts is used to demonstrate the approach and to quantify contract complexity via metrics such as density, dependency depth, and articulation points. The work provides a practical contract-to-graph system with a gated learning paradigm that improves extraction fidelity and enables graph-based contract analysis and linting, with potential for automated drafting support and risk assessment.
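The three complexity metrics named above can be illustrated on a toy clause graph. The following is a minimal sketch, not the paper's implementation: the clause names, the edge list, and the helper functions `density`, `dependency_depth`, and `articulation_points` are all hypothetical, and an edge is assumed to mean "clause A depends on clause B".

```python
from collections import defaultdict

# Hypothetical toy contract: each edge (a, b) means "clause a depends on clause b".
edges = [
    ("payment", "definitions"),
    ("termination", "payment"),
    ("indemnity", "termination"),
    ("notices", "definitions"),
]

nodes = sorted({n for e in edges for n in e})
succ = defaultdict(list)        # directed adjacency (dependencies)
undirected = defaultdict(set)   # undirected view, used for articulation points
for a, b in edges:
    succ[a].append(b)
    undirected[a].add(b)
    undirected[b].add(a)


def density(n_nodes, n_edges):
    """Directed density: share of possible ordered clause pairs that are edges."""
    return n_edges / (n_nodes * (n_nodes - 1)) if n_nodes > 1 else 0.0


def dependency_depth(node, memo=None):
    """Longest dependency chain starting at node (assumes the graph is acyclic)."""
    memo = {} if memo is None else memo
    if node not in memo:
        memo[node] = 1 + max(
            (dependency_depth(s, memo) for s in succ[node]), default=-1
        )
    return memo[node]


def articulation_points(adj):
    """Tarjan's algorithm: nodes whose removal disconnects the undirected graph."""
    disc, low, result = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent is not None and low[v] >= disc[u]:
                    result.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])
        if parent is None and children > 1:
            result.add(u)

    for n in list(adj):
        if n not in disc:
            dfs(n, None)
    return result


graph_density = density(len(nodes), len(edges))
max_depth = max(dependency_depth(n) for n in nodes)
cut_clauses = articulation_points(undirected)
```

In this toy graph, a clause that appears in `cut_clauses` is one whose removal would split the contract into disconnected parts, which is the kind of structural signal a graph-based linter could flag for review.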

Abstract

Contracts are complex documents featuring detailed formal structures, explicit and implicit dependencies, and rich semantic content. Given these properties, contract drafting and manual examination of contracts have proven to be both arduous and susceptible to errors. This work aims to simplify and automate the task of contract review and analysis using a novel framework for transforming legal contracts into structured semantic graphs, enabling computational analysis and data-driven insights. We introduce a detailed ontology mapping core legal contract elements to their graph-theoretic equivalents of nodes and edges. We then present a reinforcement-learning-based Large Language Model (LLM) framework for segmentation and extraction of entities and relationships from contracts. Our method, GRAPH-GRPO-LEX, combines LLMs with reinforcement learning via group relative policy optimization (GRPO). By applying a carefully designed reward function built from graph metrics, we demonstrate the ability to automatically identify direct relationships between clauses, and even to uncover hidden dependencies. Our gated GRPO approach shows a strong learning signal and can move contract analysis from a linear, manual reading process to an easily visualized graph. This allows for a more dynamic analysis, including laying the groundwork for contract linting similar to what is now practiced in software engineering.
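The "group relative" part of GRPO can be sketched briefly. In the commonly described formulation, several candidate outputs are sampled for the same prompt, each is scored by a reward function, and each reward is normalized against the mean and standard deviation of its own group. The snippet below is an illustrative sketch under that assumption only: the function name `group_relative_advantages`, the `eps` term, the use of the population standard deviation, and the example reward values are hypothetical, and the paper's gating mechanism and graph-metric reward are not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    rewards: scores for several candidate extractions of the same clause.
    eps: small constant (an assumption here) to avoid division by zero
         when all candidates receive the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate graphs extracted from one clause.
example_rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(example_rewards)

# When every candidate scores the same, no candidate is preferred.
flat = group_relative_advantages([0.5, 0.5, 0.5])
```

Candidates that score above their group's mean receive a positive advantage and are reinforced; those below the mean receive a negative one, so no separate value network is needed to provide a baseline.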
