IndiCASA: A Dataset and Bias Evaluation Framework in LLMs Using Contrastive Embedding Similarity in the Indian Context

Santhosh G S, Akshay Govind S, Gokul S Krishnan, Balaraman Ravindran, Sriraam Natarajan

TL;DR

The paper tackles the challenge of bias evaluation for LLMs in the Indian context, where Western-centric benchmarks fail to capture caste, religion, gender, disability, and socioeconomic nuances. It introduces IndiCASA, a 2,575-sentence dataset of stereotype and anti-stereotype pairs, and trains a contrastive encoder to learn context-sensitive embeddings that separate biased from neutral content. A bias evaluation pipeline yields two metrics: $\Delta\text{sim}$, a measure of embedding separation, and a Bias Score on a scale from $0$ to $100$ that assesses generation bias. Across open-weight LLMs, the results show persistent disability bias and comparatively lower religion bias. The framework is model-agnostic and supports open-ended bias detection without requiring model logits, offering a scalable tool for culturally aware fairness assessment and debiasing. Future work will extend intersectional coverage and domain applications.

Abstract

Large Language Models (LLMs) have gained significant traction across critical domains owing to their impressive contextual understanding and generative capabilities. However, their increasing deployment in high-stakes applications necessitates rigorous evaluation of embedded biases, particularly in culturally diverse contexts like India, where existing embedding-based bias assessment methods often fall short in capturing nuanced stereotypes. We propose an evaluation framework based on an encoder trained using contrastive learning that captures fine-grained bias through embedding similarity. We also introduce a novel dataset, IndiCASA (IndiBias-based Contextually Aligned Stereotypes and Anti-stereotypes), comprising 2,575 human-validated sentences spanning five demographic axes: caste, gender, religion, disability, and socioeconomic status. Our evaluation of multiple open-weight LLMs reveals that all models exhibit some degree of stereotypical bias, with disability-related biases being notably persistent and religion bias generally lower, likely due to global debiasing efforts. These findings demonstrate the need for fairer model development.
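To make the idea of measuring bias through embedding similarity concrete, the following is a minimal sketch of a separation score in the spirit of $\Delta\text{sim}$. It assumes that positive pairs are sentences sharing a label (stereotype with stereotype, anti-stereotype with anti-stereotype), that negative pairs cross the two labels, and that the score is the mean positive cosine similarity minus the mean negative cosine similarity. The paper's exact definition may differ; the function names and the toy 2-D vectors are hypothetical and stand in for real encoder outputs.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def delta_sim(stereo, anti):
    """Assumed separation score: mean same-label similarity minus
    mean cross-label similarity. Each group needs at least two rows."""
    within = []
    for group in (stereo, anti):
        sims = cosine_similarity_matrix(group, group)
        # Keep each unordered pair once and skip the self-similarity diagonal.
        upper = np.triu_indices(len(group), k=1)
        within.append(sims[upper])
    positive = np.concatenate(within).mean()
    negative = cosine_similarity_matrix(stereo, anti).mean()
    return float(positive - negative)


# Toy embeddings (hypothetical): two perfectly separated clusters.
stereo = np.array([[1.0, 0.0], [2.0, 0.0]])
anti = np.array([[0.0, 1.0], [0.0, 3.0]])

separated = delta_sim(stereo, anti)    # clusters are orthogonal
collapsed = delta_sim(stereo, stereo)  # both groups occupy the same direction
```

Under these assumptions, a larger value means same-label sentences sit closer together than cross-label sentences, which matches the intuition that a well-tuned encoder pulls stereotypes and anti-stereotypes apart.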

Paper Structure

This paper contains 39 sections, 9 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: Illustration of how the embedding space changes from before to after tuning the encoder on the IndiCASA dataset.
  • Figure 2: The overall research workflow, comprising three key phases: (1) Dataset Curation, where we construct a context-rich dataset capturing stereotype–anti-stereotype pairs; (2) Encoder Tuning, where a contrastive encoder is trained on the curated IndiCASA dataset to learn meaningful representations of these pairs; and (3) Bias Evaluation, where the trained encoder is used to assess bias in a given LLM by analyzing its outputs against the existing IndiBias (Sahoo et al., 2024) sentences.
  • Figure 3: End-to-end workflow for IndiCASA dataset preparation, starting from IndiBias analysis and proceeding through sentence generation, expert language validation, and a final review by social scientists to produce the final IndiCASA dataset.
  • Figure 4: Comparison of Validation $\Delta\text{sim}$ for various models across different contrastive loss functions. Higher $\Delta\text{sim}$ values indicate better separation between positive and negative pairs. From left to right: (a) ModernBERT, (b) BERT-base-uncased, and (c) All-Mini-LM-L6-V2. NTXent Loss demonstrates superior performance for ModernBERT and BERT-base-uncased, while NTBXent Loss shows the best results for All-Mini-LM-L6-V2.
  • Figure 5: Two-Component t-SNE plots for embedding vectors of sentences. From left to right: (a) Vanilla Encoder for Caste bias, showing dispersed Stereotype and Anti-Stereotype Embeddings; (b) Finetuned Encoder for Caste bias, demonstrating clear clustering after tuning; (c) Vanilla Encoder for Religion bias, also showing dispersed embeddings; and (d) Finetuned Encoder for Religion bias, exhibiting clear clustering after tuning.
  • ...and 6 more figures