Transparent NLP: Using RAG and LLM Alignment for Privacy Q&A

Anna Leschanowsky, Zahra Kolagar, Erion Çano, Ivan Habernal, Dara Hallinan, Emanuël A. P. Habets, Birgit Popp

TL;DR

The paper tackles GDPR transparency challenges in NLP by evaluating Retrieval Augmented Generation (RAG) systems augmented with alignment modules, specifically Rewindable Auto-regressive Inference (RAIN) and its multidimensional extension MultiRAIN, on a Privacy Q&A dataset (expert_privacy_qa). It sets up an experimental framework with nine systems across three experiments and 21 evaluation metrics, including LLM-as-a-judge and deterministic measures, plus principal component analysis (PCA) to explore relationships between metrics. Results show that alignment-enabled systems outperform vanilla RAG on most metrics, though none fully match human answers; PCA exposes complex, sometimes conflicting, metric relationships and gaps in current measurement approaches. The work provides a foundation for integrating advanced NLP systems into GDPR compliance workflows and outlines directions for improving alignment methods, metric design, and the legal analysis that underpins automated transparency claims.

Abstract

The transparency principle of the General Data Protection Regulation (GDPR) requires data processing information to be clear, precise, and accessible. While language models show promise in this context, their probabilistic nature complicates truthfulness and comprehensibility. This paper examines state-of-the-art Retrieval Augmented Generation (RAG) systems enhanced with alignment techniques to fulfill GDPR obligations. We evaluate RAG systems incorporating an alignment module like Rewindable Auto-regressive Inference (RAIN) and our proposed multidimensional extension, MultiRAIN, using a Privacy Q&A dataset. Responses are optimized for preciseness and comprehensibility and are assessed through 21 metrics, including deterministic and large language model-based evaluations. Our results show that RAG systems with an alignment module outperform baseline RAG systems on most metrics, though none fully match human answers. Principal component analysis of the results reveals complex interactions between metrics, highlighting the need to refine metrics. This study provides a foundation for integrating advanced natural language processing systems into legal compliance frameworks.

Paper Structure

This paper contains 30 sections, 5 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Evaluation metrics presented as subplots: (a) LLM-as-Judge Metrics, (b) Statistical Metrics for Correctness with Excerpt-Baseline, (c) Statistical Metrics for Correctness with Designed-Answers-1-Baseline (DA1), (d) Statistical Metrics for Correctness with Designed-Answers-2-Baseline (DA2), (e) Statistical Metrics for Readability.
  • Figure 2: These 2D PCA projections show relationships between text evaluation metrics. Subplot (a) categorizes metrics as preciseness (dark green) or comprehensibility (yellow). Subplot (b) maps metric names to positions. Subplot (c) distinguishes metrics needing a gold standard (light grey) from those that do not (dark grey) and shows computational costs by dot size.
  • Figure 3: Screenshots of part of a conversation with ChatGPT 4o on December 5th, 2024, about data processing. The conversation started with the user asking "What happens with my data in ChatGPT?" Through repeated inquiries, ChatGPT was shown to present inaccurate information about data processing, which it admitted.
  • Figure 4: Explained variance by principal components. The first and second principal components are projected as the x- and y-axes, respectively, in Figure 2.
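
As a rough illustration of the kind of analysis behind Figures 2 and 4, the sketch below projects a set of metrics onto their first two principal components and reports the explained variance per component. It is a minimal sketch, not the authors' pipeline: the function name `pca_project`, the matrix layout (one row per metric, one column per evaluated response), and the standardization step are all assumptions, and the scores are synthetic placeholders rather than results from the paper.

```python
import numpy as np


def pca_project(scores: np.ndarray, n_components: int = 2):
    """Project each row of `scores` onto the top principal components.

    scores: shape (n_metrics, n_responses). Hypothetical layout in which
    each row holds one metric's values over the same set of responses.
    Returns (coords, explained_variance_ratio).
    """
    # Standardize each metric (row) so differently scaled metrics are comparable.
    mean = scores.mean(axis=1, keepdims=True)
    std = scores.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero for constant metrics
    z = (scores - mean) / std

    # PCA treats rows as observations, so center every column (feature).
    centered = z - z.mean(axis=0, keepdims=True)

    # Singular values give the variance captured along each component.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    explained_variance_ratio = s**2 / np.sum(s**2)
    return coords, explained_variance_ratio


# Synthetic placeholder data: 21 metrics scored on 50 responses (not real results).
rng = np.random.default_rng(0)
demo_scores = rng.normal(size=(21, 50))

coords, ratio = pca_project(demo_scores)
print(coords.shape)   # one 2D point per metric
print(ratio[:2])      # share of variance on the first two components
```

Under these assumptions, metrics whose points lie close together in `coords` behave similarly across responses, and `ratio` corresponds to the quantity an explained-variance plot would show.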