
Judging by the Rules: Compliance-Aligned Framework for Modern Slavery Statement Monitoring

Wenhao Xu, Akshatha Arodi, Jian-Yun Nie, Arsene Fansi Tchango

TL;DR

This paper tackles scalable, auditable compliance verification for modern slavery statements by reframing the task as rule alignment rather than purely factual classification. It introduces a two-stage framework: (1) key-rule extraction, which distills regulatory requirements into natural-language rubrics, and (2) CALLM, a rule-aligned LLM trained with Group Relative Policy Optimization (GRPO) using feedback from a domain-specific Compliance Alignment Judge (CA-Judge). CALLM's outputs are not only accurate but explicitly tethered to statutory rules, which improves auditability and human verifiability in high-stakes regulatory settings, and the model generalizes across jurisdictions. Compared with larger models, the approach achieves better task performance and stronger rule adherence in its generated justifications, and human evaluators prefer its outputs, suggesting practical benefits for real-world review of modern slavery disclosures.
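GRPO avoids a learned value network by scoring each sampled completion relative to the other completions drawn for the same input. The sketch below shows only that advantage step, assuming the CA-Judge returns one scalar score per completion; the function name and the scores are hypothetical, and the clipped policy-gradient update and KL penalty of full GRPO are omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(scores: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize judge scores within one group of completions for the same input.

    Each advantage is (score - group mean) / (group std + eps). This uses the
    population standard deviation; some implementations use the sample std instead.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / (sigma + eps) for s in scores]


# Hypothetical CA-Judge scores for four completions of one target sentence.
judge_scores = [0.9, 0.4, 0.7, 0.2]
print([round(a, 3) for a in group_relative_advantages(judge_scores)])
```

Because advantages are centered within each group, only relative differences between completions drive the update: a completion the judge rates above its siblings is reinforced, and one rated below is discouraged, even if every absolute score is high.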

Abstract

Modern slavery affects millions of people worldwide, and regulatory frameworks such as Modern Slavery Acts now require companies to publish detailed disclosures. However, these statements are often vague and inconsistent, making manual review time-consuming and difficult to scale. While NLP offers a promising path forward, high-stakes compliance tasks require more than accurate classification: they demand transparent, rule-aligned outputs that legal experts can verify. Existing applications of large language models (LLMs) often reduce complex regulatory assessments to binary decisions, lacking the necessary structure for robust legal scrutiny. We argue that compliance verification is fundamentally a rule-matching problem: it requires evaluating whether textual statements adhere to well-defined regulatory rules. To this end, we propose a novel framework that harnesses AI for rule-level compliance verification while preserving expert oversight. At its core is the Compliance Alignment Judge (CA-Judge), which evaluates model-generated justifications based on their fidelity to statutory requirements. Using this feedback, we train the Compliance Alignment LLM (CALLM), a model that produces rule-consistent, human-verifiable outputs. CALLM improves predictive performance and generates outputs that are both transparent and legally grounded, offering a more verifiable and actionable solution for real-world compliance analysis.
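Framed as rule matching, each decision pairs one criterion's key rules with a target sentence and its context, and the model answers YES or NO with a rule-citing justification. A minimal sketch of that input/output contract follows, assuming a hypothetical prompt layout and a hypothetical "YES/NO on the first line" answer format; `CriterionRubric`, `build_prompt`, and `parse_decision` are illustrative names, and the rule strings are placeholders rather than the paper's extracted rules.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CriterionRubric:
    """Key rules for one compliance criterion (hypothetical structure)."""
    criterion: str
    key_rules: list[str] = field(default_factory=list)


def build_prompt(rubric: CriterionRubric, target: str, context: str) -> str:
    """Assemble a rule-level verification prompt: numbered rules, then the sentence."""
    rules = "\n".join(f"R{i}. {rule}" for i, rule in enumerate(rubric.key_rules, start=1))
    return (
        f"Criterion: {rubric.criterion}\n"
        f"Key rules:\n{rules}\n\n"
        f"Context: {context}\n"
        f"Target sentence: {target}\n\n"
        "Does the target sentence satisfy the criterion? Answer YES or NO on the "
        "first line, then justify the answer by citing rule IDs (e.g. R1)."
    )


# Assumed output format: the first line starts with YES/NO, optionally after "Answer:".
_ANSWER = re.compile(r"^\s*(?:answer\s*:\s*)?(YES|NO)\b", re.IGNORECASE)


def parse_decision(generation: str) -> Optional[str]:
    """Return 'YES' or 'NO' from the first line of a generation, or None if absent."""
    text = generation.strip()
    if not text:
        return None
    match = _ANSWER.match(text.splitlines()[0])
    return match.group(1).upper() if match else None


rubric = CriterionRubric("C4 Remediation", ["<placeholder key rule 1>", "<placeholder key rule 2>"])
print(build_prompt(rubric, "<target sentence>", "<surrounding context>"))
print(parse_decision("Answer: YES\nThe sentence satisfies R1 because ..."))  # YES
```

Keeping the label on a fixed first line makes predictions machine-checkable, while the rule IDs cited in the justification give a reviewer a direct path back to the statutory requirement being applied.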

Paper Structure

This paper contains 47 sections, 6 equations, 14 figures, 7 tables, and 1 algorithm.

Figures (14)

  • Figure 1: Overview of our framework, which consists of two steps: (1) Key Rule Extraction derives natural language rubrics for a compliance criterion. (2) Alignment trains CALLM to generate rule-aligned outputs using feedback from CA-Judge. The figure shows an example for the C4 Remediation criterion under the Australian Modern Slavery Act, including the corresponding key rules, a target sentence to be classified as compliant (YES) or non-compliant (NO), and CALLM’s rule-aligned generation.
  • Figure 2: Evaluation dimensions used by the CA-Judge. The key rules for the C4 Remediation criterion under the Australian Modern Slavery Act are also shown, along with examples of relevant and irrelevant sentence types.
  • Figure 3: The Compliance Alignment Judge evaluates a model's generated completion against predefined key rules for compliance and produces a decision score with a justification. The score reflects both the degree of rule compliance and the completion's quality, enabling fine-grained, rule-aligned evaluation.
  • Figure 4: Illustrative use cases of CALLM with CA-Judge scoring. Each example shows the target sentence (in bold) with its surrounding context. Top: CALLM correctly predicts compliance for C4 Remediation; CA-Judge assigns a high score, reflecting strong alignment with the key rules (shown in Figure 2) and well-structured reasoning. Bottom: CALLM incorrectly predicts compliance for Approval; CA-Judge detects flaws in the reasoning and correctly lowers the score in line with the rule violations (rules in Figure 1).
  • Figure 5: Human‐preference comparison between our model and the baseline.
  • ...and 9 more figures
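The CA-Judge described in Figures 2 and 3 checks a completion against the criterion's key rules along several evaluation dimensions and returns a score with a justification. The sketch below shows one way such a judge could be prompted and its score turned into a scalar reward in [0, 1] for GRPO. It assumes a hypothetical output ending in "Score: N" on a 0 to 10 scale; the prompt wording, scale, and the names `build_judge_prompt` and `judge_reward` are illustrative assumptions, not the paper's.

```python
import re

# Hypothetical output contract: the judge ends its justification with "Score: N".
_SCORE = re.compile(r"score\s*[:=]\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def build_judge_prompt(key_rules: list[str], dimensions: list[str],
                       sentence: str, completion: str, max_score: int = 10) -> str:
    """Ask the judge to check a completion against key rules along given dimensions."""
    rules = "\n".join(f"- {r}" for r in key_rules)
    dims = "\n".join(f"- {d}" for d in dimensions)
    return (
        f"Key rules:\n{rules}\n\nEvaluation dimensions:\n{dims}\n\n"
        f"Sentence: {sentence}\nModel completion:\n{completion}\n\n"
        "Justify how well the completion follows the key rules, then end with "
        f"'Score: N', where N is an integer from 0 to {max_score}."
    )


def judge_reward(judge_output: str, max_score: int = 10) -> float:
    """Map the last 'Score: N' in the judge output to a reward clipped to [0, 1]."""
    matches = _SCORE.findall(judge_output)
    if not matches:
        return 0.0  # assumption: an unparseable judgment earns no reward
    return min(max(float(matches[-1]) / max_score, 0.0), 1.0)


print(judge_reward("Cites the remediation rule correctly.\nScore: 8"))  # 0.8
```

Taking the last score, clipping, and falling back to zero keep every reward in a fixed range even when the judge's output is malformed, which keeps group-relative normalization well behaved; the paper's actual conversion from CA-Judge output to a training signal may differ.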