Gender Representation and Bias in Indian Civil Service Mock Interviews

Somonnoy Banerjee, Sujan Dutta, Soumyajit Datta, Ashiqur R. KhudaBukhsh

TL;DR

The study analyzes gender representation and bias in UPSC mock interviews by constructing a large dataset of 51,278 questions from 888 videos. It demonstrates gender-biased questioning patterns, a predominantly male interviewer panel, and societal biases reflected in LLM explanations for gender inference. The work introduces a public dataset and a robust methodological pipeline for auditing bias in conversational content, with implications for fairness in high-stakes selection processes. The findings underscore the need for bias-aware reforms and systematic AI-explanation audits in educational and public-sector contexts.

Abstract

This paper makes three key contributions. First, via a substantial corpus of 51,278 interview questions sourced from 888 YouTube videos of mock interviews of Indian civil service candidates, we demonstrate stark gender bias in the broad nature of questions asked to male and female candidates. Second, our experiments with large language models show a strong presence of gender bias in explanations provided by the LLMs on the gender inference task. Finally, we present a novel dataset of 51,278 interview questions that can inform future social science studies.

Paper Structure

This paper contains 26 sections, 4 equations, 5 figures, and 9 tables.

Figures (5)

  • Figure 1: Distribution of records based on rank and gender. Rank information is obtained from the video title.
  • Figure 2: t-SNE (van der Maaten and Hinton, 2008) visualization of the top eight question topics. For better visualization, 1000 questions were randomly sampled from each cluster. Topic explanations: 2 (history and mythology), 6 (agriculture and environment), 8 (science), 9 (foreign policy), 11 (economics), 14 (gender related), 15 (law and order), 16 (engineering and technology). Relevant keywords are listed in the paper's table of clusters.
  • Figure 3: Prompt designed to infer gender using LLMs.
  • Figure 4: Word clouds highlighting the top words found in the differential analysis of the unigram distributions of LLM explanations. The top images show words such as engineering, technical, civil, and knowledge, while the bottom images feature words such as empathy, gender, social, issue, and awareness, indicating the bias ingrained in the LLMs' reasoning process.
  • Figure 5: Distribution of mock interview videos across channels.