Gender Representation and Bias in Indian Civil Service Mock Interviews

Somonnoy Banerjee; Sujan Dutta; Soumyajit Datta; Ashiqur R. KhudaBukhsh

TL;DR

The study analyzes gender representation and bias in UPSC (Union Public Service Commission) mock interviews using a dataset of 51,278 questions drawn from 888 videos. It finds gender-biased questioning patterns, predominantly male interview panels, and societal biases reflected in LLM explanations for gender inference. The work releases a public dataset and a methodological pipeline for auditing bias in conversational content, with implications for fairness in high-stakes selection processes. The findings underscore the need for bias-aware reforms and systematic audits of AI explanations in educational and public-sector contexts.

Abstract

This paper makes three key contributions. First, using a corpus of 51,278 interview questions drawn from 888 YouTube videos of mock interviews of Indian civil service candidates, we demonstrate stark gender bias in the broad nature of the questions asked to male and female candidates. Second, our experiments with large language models (LLMs) show strong gender bias in the explanations the LLMs provide for the gender inference task. Finally, we present a novel dataset of 51,278 interview questions that can inform future social science studies.

Paper Structure

This paper contains 26 sections, 4 equations, 5 figures, and 9 tables.

Introduction
Dataset
Step 1: Identifying Relevant Videos
Step 2: Obtaining Interview Transcripts
Step 3: Gender Inference of Interview Candidates
Gender Inference from Names Only.
Step 4: Sets of Interview Questions
Related Work
Results and Discussion
Representation
Bias in Discourse and Questions
Unigram Differential Analysis.
Word Embedding Association Tests.
Semantic Clustering of Questions by Topic.
Separability Tests
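The outline names Word Embedding Association Tests (WEAT) among the bias analyses without spelling them out. For orientation, the following is a minimal sketch of the standard WEAT effect size of Caliskan et al. (2017), not the paper's own implementation. For target word sets X and Y and attribute word sets A and B, each word w gets an association score s(w, A, B), the mean cosine similarity of w to A minus its mean cosine similarity to B. The effect size is the difference between the mean scores over X and over Y, divided by the standard deviation of the scores over X ∪ Y. The word sets, the embedding model, and the choice of sample (rather than population) standard deviation are assumptions here; the 2-d vectors are hypothetical toy stand-ins for real word embeddings.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w, A, B):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to attribute set B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """Standardized difference in association between target sets X and Y.

    Divides by the sample standard deviation over X and Y combined;
    implementations differ on sample vs. population standard deviation.
    """
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical 2-d "embeddings", for illustration only.
X = [[1.0, 0.0], [0.9, 0.1]]   # target set X
Y = [[0.0, 1.0], [0.1, 0.9]]   # target set Y
A = [[1.0, 0.0]]               # attribute set A
B = [[0.0, 1.0]]               # attribute set B

print(round(weat_effect_size(X, Y, A, B), 3))
```

A positive effect size means X is more strongly associated with A (relative to B) than Y is; swapping X and Y flips the sign. In practice a permutation test over re-partitions of X ∪ Y is usually reported alongside the effect size.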

Figures (5)

Figure 1: Distribution of records based on rank and gender. Rank information is obtained from the video title.
Figure 2: t-SNE [van2008visualizing] visualization of the top eight question topics. For better visualization, 1000 questions were randomly sampled from each cluster. Topic explanations: 2: history and mythology, 6: agriculture and environment, 8: science, 9: foreign policy, 11: economics, 14: gender related, 15: law and order, 16: engineering and technology. Relevant keywords are listed in Table \ref{table:clusters}.
Figure 3: Prompt designed to infer gender using LLMs.
Figure 4: Word clouds highlighting the top words found by the differential analysis of the unigram distributions of LLM explanations. The top images feature words such as engineering, technical, civil, and knowledge, while the bottom images feature words such as empathy, gender, social, issue, and awareness, indicating bias ingrained in the reasoning process of LLMs.
Figure 5: Distribution of mock interview videos across channels
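Figure 4 rests on a differential analysis of the unigram distributions of LLM explanations. The exact scoring used in the paper is not stated here; the sketch below assumes one common choice, the log-ratio of smoothed relative word frequencies between two groups of explanations, with add-alpha smoothing (the default alpha = 1.0 and the function names are hypothetical). The explanation strings are invented toy inputs that echo words from the caption, not data from the paper.

```python
import math
import re
from collections import Counter


def unigram_counts(texts):
    """Lower-cased alphabetic word counts over a list of documents."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def differential_unigrams(group_a, group_b, alpha=1.0, top_k=10):
    """Rank words by the add-alpha smoothed log-ratio of relative frequency in A vs. B.

    Returns (most A-leaning words, most B-leaning words) as (word, score) pairs.
    Positive scores mean over-represented in group_a; negative, in group_b.
    """
    ca, cb = unigram_counts(group_a), unigram_counts(group_b)
    vocab = set(ca) | set(cb)
    # Smoothed denominators: every vocabulary word receives alpha pseudo-counts.
    total_a = sum(ca.values()) + alpha * len(vocab)
    total_b = sum(cb.values()) + alpha * len(vocab)
    scores = {
        w: math.log((ca[w] + alpha) / total_a) - math.log((cb[w] + alpha) / total_b)
        for w in vocab
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k], ranked[-top_k:][::-1]


# Hypothetical toy explanations, for illustration only.
group_a = ["technical knowledge of engineering", "engineering background"]
group_b = ["empathy and social awareness", "social issue awareness"]

top_a, top_b = differential_unigrams(group_a, group_b, top_k=3)
print(top_a)
print(top_b)
```

Smoothing keeps words that appear in only one group from producing infinite log-ratios; a larger alpha pulls rare words toward zero. The top-ranked words on each side are what a word cloud like Figure 4 would display.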
