Rethinking Legal Compliance Automation: Opportunities with Large Language Models

Shabnam Hassani, Mehrdad Sabetzadeh, Daniel Amyot, Jain Liao

TL;DR

The paper tackles the challenge of automating legal compliance analysis for software-intensive systems, with GDPR as the regulation under study. It critiques prevailing sentence-level analysis and rule-based approaches for lacking justification and scalability, and proposes an end-to-end framework that uses large language models (LLMs) to analyze broader textual context (paragraphs) and to generate explanations for compliance decisions. The authors describe a three-step approach (content chunking, prompt construction, and LLM-driven compliance checking) and provide implementation details with multiple LLMs, releasing RE2024Replication for reproducibility. Preliminary results on data processing agreements (DPAs) indicate substantial accuracy gains from paragraph-level context (improvements of up to around 40%), show that the choice of LLM affects performance, and suggest that runtime costs are feasible. The work lays groundwork for more scalable, explainable, and adaptable compliance automation, with future directions including richer paragraph-level annotations, broader legal-domain validation, and expert evaluation of generated rationales.
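To make the three steps concrete, the following is a minimal Python sketch, not the authors' implementation. All names (`chunk_paragraphs`, `chunk_sentences`, `build_prompt`, `check_compliance`, `PROMPT_TEMPLATE`) and the template wording are assumptions introduced for illustration, and the LLM is passed in as a plain callable so that no particular vendor API is implied.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical template: the paper's actual prompt wording is not reproduced here.
PROMPT_TEMPLATE = (
    "You are checking a data processing agreement against compliance rules.\n"
    "For each rule, say whether the passage satisfies it and justify the answer.\n\n"
    "Passage:\n{passage}\n\n"
    "Compliance rules:\n{rules}\n"
)


def chunk_paragraphs(text: str) -> List[str]:
    """Step 1 (paragraph-level): split on blank lines, dropping empty blocks."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def chunk_sentences(text: str) -> List[str]:
    """Step 1 (sentence-level baseline): naive split after '.', '!' or '?'."""
    flat = " ".join(text.split())  # collapse newlines and repeated spaces
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]


def build_prompt(passage: str, rules: List[str]) -> str:
    """Step 2: combine one passage, the template, and the numbered rules."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return PROMPT_TEMPLATE.format(passage=passage, rules=numbered)


def check_compliance(
    text: str,
    rules: List[str],
    llm: Callable[[str], str],
    chunker: Callable[[str], List[str]] = chunk_paragraphs,
) -> List[Tuple[str, str]]:
    """Step 3: send one prompt per chunk to the LLM and collect its reports."""
    return [(chunk, llm(build_prompt(chunk, rules))) for chunk in chunker(text)]


if __name__ == "__main__":
    # Illustrative DPA excerpt and rule; both are made up for this sketch.
    dpa = (
        "The processor shall act only on documented instructions. "
        "This includes transfers to third countries.\n\n"
        "The processor shall notify the controller of any breach."
    )
    rules = ["Processing happens only on documented instructions of the controller."]

    def stub_llm(prompt: str) -> str:
        # Stand-in for a real model call; returns a fixed placeholder.
        return "stub report"

    for chunk, report in check_compliance(dpa, rules, stub_llm):
        print(repr(chunk), "->", report)
```

Swapping `chunker=chunk_sentences` for the default gives the sentence-level baseline, so the two granularities can be compared with the same prompt and the same model. A production splitter would need to handle abbreviations and numbered clauses, which the naive regular expression here does not.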

Abstract

As software-intensive systems face growing pressure to comply with laws and regulations, providing automated support for compliance analysis has become paramount. Despite advances in the Requirements Engineering (RE) community on legal compliance analysis, important obstacles remain in developing accurate and generalizable compliance automation solutions. This paper highlights some observed limitations of current approaches and examines how adopting new automation strategies that leverage Large Language Models (LLMs) can help address these shortcomings and open up fresh opportunities. Specifically, we argue that the examination of (textual) legal artifacts should, first, employ a broader context than sentences, which have widely been used as the units of analysis in past research. Second, the mode of analysis with legal artifacts needs to shift from classification and information extraction to more end-to-end strategies that are not only accurate but also capable of providing explanation and justification. We present a compliance analysis approach designed to address these limitations. We further outline our evaluation plan for the approach and provide preliminary evaluation results based on data processing agreements (DPAs) that must comply with the General Data Protection Regulation (GDPR). Our initial findings suggest that our approach yields substantial accuracy improvements and, at the same time, provides justification for compliance decisions.

Paper Structure

This paper contains 22 sections, 3 figures, and 1 table.

Figures (3)

  • Figure 1: Approach overview.
  • Figure 2: (a) Illustrative data processing agreement (DPA), (b) Prompt including three parts: Passages from Regulatory Artifact (sentence-level (➊) or paragraph-level (➋) input), Prompt Template, and Compliance Rules.
  • Figure 3: Illustrative Compliance Report generated by GPT-4 Turbo: for sentence-level inputs (➊) without context, and for paragraph-level inputs (➋) with context.