Foundation Models at Work: Fine-Tuning for Fairness in Algorithmic Hiring

Buse Sibel Korkmaz, Rahul Nair, Elizabeth M. Daly, Evangelos Anagnostopoulos, Christos Varytimidis, Antonio del Rio Chanona

TL;DR

This work introduces AutoRefine, an offline reinforcement-learning-based framework that fine-tunes foundation models for task-focused outputs without human feedback, targeting fairness in algorithmic hiring. By coupling a fine-tuned base model with a perturbation regulator and a task-specific evaluator, AutoRefine optimizes outputs through a three-stage process (align, perturb, deploy) using ILQL to maximize diversity-oriented rewards. Applied to job-description rewriting, the approach reduces bias in candidate matching while preserving or improving recommendation quality, as demonstrated on Hacker News, Bias in Bios, and a live hiring platform dataset. The method offers a scalable pathway for governance-aligned language generation and yields measurable improvements in fairness metrics with real-world hiring impact. It also incurs computational costs and requires careful attention to evaluation proxies and potential hallucinations.

Abstract

Foundation models require fine-tuning to ensure their generative outputs align with intended results for specific tasks. Automating this fine-tuning process is challenging, as it typically needs human feedback that can be expensive to acquire. We present AutoRefine, a method that leverages reinforcement learning for targeted fine-tuning, utilizing direct feedback from measurable performance improvements in specific downstream tasks. We demonstrate the method for a problem arising in algorithmic hiring platforms where linguistic biases influence a recommendation system. In this setting, a generative model seeks to rewrite given job specifications to receive more diverse candidate matches from a recommendation engine that matches jobs to candidates. Our model detects and regulates biases in job descriptions to meet diversity and fairness criteria. The experiments on a public hiring dataset and a real-world hiring platform showcase how large language models can assist in identifying and mitigating biases in the real world.

Paper Structure (44 sections, 11 equations, 2 figures, 14 tables)

Figures (2)

  • Figure 1: Our methodology, AutoRefine, works by building a perturbation model that assesses the alignment of the generated content with task-specific goals. Evaluations serve as computational feedback that iteratively updates the perturbation model. During generation, both the original and perturbation models are used to generate tokens.
  • Figure 2: Differential impact on selection probabilities across job titles by gender. This figure visualizes the changes in selection probabilities for various job titles when gender identification is incorporated into candidate profiles. The depicted titles are those experiencing the most pronounced shifts in probabilities. Negative values indicate a reduction in selection probability.
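
The generation step described in the Figure 1 caption, where both the original and the perturbation model contribute to each token, can be illustrated with a small sketch. It assumes the usual ILQL decoding rule, in which the base model's logits are shifted by a scaled advantage, beta * (Q - V), before sampling. Whether AutoRefine combines the two models in exactly this way is an assumption here, and every name, token, and number below is hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(base_logits, q_values, state_value, beta=1.0):
    """Shift base-model logits by beta * (Q - V).

    Assumed ILQL-style perturbation: tokens whose estimated value Q exceeds
    the state value V are boosted, the others are suppressed.
    """
    if len(base_logits) != len(q_values):
        raise ValueError("base_logits and q_values must have the same length")
    return [b + beta * (q - state_value) for b, q in zip(base_logits, q_values)]


def next_token_distribution(base_logits, q_values, state_value, beta=1.0):
    """Next-token probabilities after applying the perturbation."""
    return softmax(combine_logits(base_logits, q_values, state_value, beta))


# Toy vocabulary and scores (hypothetical values, not taken from the paper).
vocab = ["rockstar", "engineer", "ninja"]
base_logits = [2.0, 1.0, 0.5]   # scores from the base language model
q_values = [-1.0, 1.0, -0.5]    # per-token values from the perturbation model
state_value = 0.0               # value estimate for the current prefix

probs_base = softmax(base_logits)
probs_perturbed = next_token_distribution(
    base_logits, q_values, state_value, beta=2.0
)

best_base = vocab[probs_base.index(max(probs_base))]
best_perturbed = vocab[probs_perturbed.index(max(probs_perturbed))]
print(best_base, "->", best_perturbed)
```

With beta set to 0 the perturbation vanishes and the base model's distribution is recovered, so beta acts as a dial for how strongly the learned values steer the rewrite.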