GraphCompliance: Aligning Policy and Context Graphs for LLM-Based Regulatory Compliance
Jiseong Chung, Ronny Ko, Wonchul Yoo, Makoto Onizuka, Sungmok Kim, Tae-Wan Kim, Won-Yong Shin
TL;DR
GraphCompliance tackles web-scale regulatory compliance by aligning unstructured runtime contexts with normative regulations through dual graphs. It constructs a Policy Graph from statutes and a Context Graph from events, then uses a Compliance Gate to perform deterministic structural analysis before a constrained LLM-based judgment. Empirical results on the GDPR-based GCS-300 benchmark show consistent improvements in micro-F1 and especially in F2, with ablations confirming the contribution of each graph component and of the gating mechanism. The approach yields higher recall and fewer false positives, improving verifiability and enabling more reliable normative reasoning in regulatory automation.
Abstract
Compliance at web scale poses practical challenges: each request may require a regulatory assessment. Regulatory texts (e.g., the General Data Protection Regulation, GDPR) are cross-referential and normative, while runtime contexts are expressed in unstructured natural language. This setting motivates us to align semantic information in unstructured text with the structured, normative elements of regulations. To this end, we introduce GraphCompliance, a framework that represents regulatory texts as a Policy Graph and runtime contexts as a Context Graph, and aligns the two. In this formulation, the Policy Graph encodes normative structure and cross-references, whereas the Context Graph formalizes events as subject-action-object (SAO) and entity-relation triples. This alignment anchors the reasoning of a judge large language model (LLM) in structured information and reduces the burden of regulatory interpretation and event parsing on the judge, allowing it to focus on the core reasoning step. In experiments on 300 GDPR-derived real-world scenarios spanning five evaluation tasks, GraphCompliance yields 4.1 to 7.2 percentage points (pp) higher micro-F1 than LLM-only and RAG baselines, with fewer under- and over-predictions, resulting in higher recall and lower false positive rates. Ablation studies indicate contributions from each graph component, suggesting that structured representations and a judge LLM are complementary for normative reasoning.
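To make the dual-graph idea concrete, the following is a minimal sketch of how SAO triples from a context might be matched against policy nodes, with cross-references followed before anything is handed to a judge LLM. It is an illustration under stated assumptions, not the paper's implementation: the names `Triple`, `PolicyNode`, `expand_refs`, and `candidate_rules`, the exact-match alignment rule, and the toy rules `rule-A` to `rule-C` are all hypothetical, and no LLM call is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One subject-action-object (SAO) triple from a context (hypothetical schema)."""
    subject: str
    action: str
    obj: str


@dataclass(frozen=True)
class PolicyNode:
    """One normative unit in a toy policy graph (hypothetical schema)."""
    rule_id: str
    action: str          # action the rule regulates
    obj: str             # object category the rule regulates
    refs: tuple = ()     # cross-references to other rule ids


def expand_refs(policy, start_ids):
    """Collect start_ids plus every rule reachable through cross-references."""
    seen = set()
    stack = list(start_ids)
    while stack:
        rule_id = stack.pop()
        if rule_id in seen:
            continue
        seen.add(rule_id)
        stack.extend(policy[rule_id].refs)
    return seen


def candidate_rules(policy, context):
    """Deterministic structural step: rules whose (action, obj) pair matches a
    context triple, expanded along cross-references. Exact string matching is
    an assumption made for brevity."""
    direct = [
        node.rule_id
        for node in policy.values()
        if any(t.action == node.action and t.obj == node.obj for t in context)
    ]
    return sorted(expand_refs(policy, direct))


# Toy data, invented for illustration only.
POLICY = {
    "rule-A": PolicyNode("rule-A", "process", "personal data", refs=("rule-B",)),
    "rule-B": PolicyNode("rule-B", "obtain", "consent"),
    "rule-C": PolicyNode("rule-C", "erase", "personal data"),
}
CONTEXT = [Triple("retailer", "process", "personal data")]

print(candidate_rules(POLICY, CONTEXT))  # ['rule-A', 'rule-B']
```

In this sketch, only the rules returned by `candidate_rules` would be passed to the judge, which is one way to read the claim that structured alignment narrows the judge's task to the core reasoning step.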
