CatRAG: Functor-Guided Structural Debiasing with Retrieval Augmentation for Fair LLMs

Ravi Ranjan, Utkarsh Grover, Mayur Akewar, Xiaomin Lin, Agoritsa Polyzou

Abstract

Large Language Models (LLMs) are deployed in high-stakes settings but can show demographic, gender, and geographic biases that undermine fairness and trust. Prior debiasing methods, including embedding-space projections, prompt-based steering, and causal interventions, often act at a single stage of the pipeline, resulting in incomplete mitigation and brittle utility trade-offs under distribution shifts. We propose CatRAG Debiasing, a dual-pronged framework that integrates functor-guided structural debiasing with Retrieval-Augmented Generation (RAG). The functor component leverages category-theoretic structure to induce a principled, structure-preserving projection that suppresses bias-associated directions in the embedding space while retaining task-relevant semantics. On the Bias Benchmark for Question Answering (BBQ) across three open-source LLMs (Meta Llama-3, OpenAI GPT-OSS, and Google Gemma-3), CatRAG achieves state-of-the-art results, improving accuracy by up to 40% over the corresponding base models and by more than 10% over prior debiasing methods, while reducing bias scores to near zero (from 60% for the base models) across gender, nationality, race, and intersectional subgroups.

Paper Structure (30 sections, 26 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Research challenge: the same job-advice query yields systematically different recommendations when the context implies developed vs. developing countries, reflecting stereotyped associations rather than qualification-based reasoning. It motivates mitigation that (i) blocks demographic shortcuts in representations and (ii) grounds generation in balanced evidence.
  • Figure 2: Overview of the proposed pipeline. The input query is processed along two paths: (1) Functor-guided structural debiasing maps the biased embedding space to an unbiased one via a debiased projection, reducing demographic separability while preserving task-relevant structure; (2) Retrieval augmentation selects a small set of diverse, counter-stereotypical evidence passages from an external corpus. A context fusion module injects retrieved evidence into the prompt, and the LLM generates using the projected embedding layer to produce a grounded, fair output.
  • Figure 3: Gender subset scatter plots: x-axis is the confidence score for the male-coded option and y-axis for the female-coded option. Red points are the base model; colored points show the post-mitigation distribution for each method.
  • Figure 4: Accuracy vs. Bias Score for Functor-only, RAG-only, and full pipeline. Better performance lies toward the upper-left (higher accuracy, lower bias score).
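
The abstract and Figure 2 describe a projection that suppresses bias-associated directions in the embedding space while keeping task-relevant structure. As a rough illustration of that general idea only, the sketch below removes a single bias direction by plain orthogonal projection. It is not the paper's functor construction: the difference-of-group-means estimate of the direction, the function names (`bias_direction`, `project_out`), and the toy 3-dimensional vectors are all assumptions made for this example.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def bias_direction(group_a, group_b):
    """Unit vector along the difference of the two group mean embeddings.

    Assumption: one linear direction separates the groups. The paper's
    functor-guided projection may be defined quite differently.
    """
    dim = len(group_a[0])
    mean_a = [sum(vec[i] for vec in group_a) / len(group_a) for i in range(dim)]
    mean_b = [sum(vec[i] for vec in group_b) / len(group_b) for i in range(dim)]
    diff = [a - b for a, b in zip(mean_a, mean_b)]
    norm = math.sqrt(dot(diff, diff))
    if norm == 0.0:
        raise ValueError("group means coincide; no direction to remove")
    return [d / norm for d in diff]


def project_out(vec, direction):
    """Subtract the component of vec along a unit-length direction."""
    coeff = dot(vec, direction)
    return [x - coeff * d for x, d in zip(vec, direction)]


# Toy embeddings (hypothetical): the first coordinate carries the group signal.
male_coded = [[1.0, 0.2, 0.5], [0.8, 0.1, 0.4]]
female_coded = [[-1.0, 0.2, 0.5], [-0.8, 0.1, 0.4]]

b = bias_direction(male_coded, female_coded)
debiased = project_out([0.9, 0.3, 0.7], b)
```

In this toy setup the estimated direction is the first axis, so the projected vector has no component along it while its other two coordinates are unchanged. Real embeddings rarely isolate a demographic signal in one direction, which is part of why the pipeline in Figure 2 pairs the projection with retrieved evidence.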