Mitigating Language Bias in Cross-Lingual Job Retrieval: A Recruitment Platform Perspective

Napat Laosaengpha, Thanit Tativannarat, Attapol Rutherford, Ekapol Chuangsuwanich

TL;DR

This work tackles cross-lingual job retrieval on recruitment platforms by mitigating language bias in bilingual sentence representations. It introduces a Thai–English multi-task dual-encoder that jointly learns representations for job titles, descriptions, and fields via three tasks, leveraging label-free postings. A novel Language Bias Kullback–Leibler Divergence (LBKL) metric is proposed to quantify bias in retrieval, and the model achieves state-of-the-art cross-lingual performance with a smaller footprint. Empirical results on JTG-Synonym and JTG-Occupation demonstrate significant bias reduction and stronger cross-lingual retrieval, suggesting practical improvements for multilingual recruitment systems and bias-aware evaluation in retrieval models. The approach offers a scalable framework for bias-aware cross-lingual information extraction in domain-specific, low-resource languages.

Abstract

Understanding the textual components of resumes and job postings is critical for improving job-matching accuracy and optimizing job search systems in online recruitment platforms. However, existing works primarily focus on analyzing individual components within this information, requiring multiple specialized tools to analyze each aspect. Such disjointed methods could hinder overall generalizability in recruitment-related text processing. Therefore, we propose a unified sentence encoder that uses a multi-task dual-encoder framework to jointly learn multiple components. The results show that our method outperforms other state-of-the-art models despite its smaller model size. Moreover, we propose a novel metric, Language Bias Kullback-Leibler Divergence (LBKL), to evaluate language bias in the encoder, demonstrating significant bias reduction and superior cross-lingual performance.
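The abstract names LBKL but this page does not reproduce its formula. As a rough illustration only, and not the paper's definition, the sketch below shows how a Kullback-Leibler divergence can compare the language mix of retrieved candidates against a reference mix. The function names, the language codes "en" and "th", and the uniform reference distribution are all assumptions made for this example.

```python
import math
from collections import Counter


def language_distribution(candidate_langs, languages=("en", "th")):
    """Turn a list of per-candidate language tags into a probability vector.

    `languages` fixes the order of the output; both the tags and the order
    are hypothetical choices for this sketch.
    """
    counts = Counter(candidate_langs)
    total = sum(counts[lang] for lang in languages)
    if total == 0:
        raise ValueError("no candidates in the given languages")
    return [counts[lang] / total for lang in languages]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Assumed reference: an unbiased retriever returns both languages equally often.
reference = [0.5, 0.5]

balanced = language_distribution(["en", "th", "th", "en"])
skewed = language_distribution(["en", "en", "en", "en"])

balanced_score = kl_divergence(balanced, reference)
skewed_score = kl_divergence(skewed, reference)
```

Under these assumptions, a retriever whose results match the reference mix scores zero, and the score grows as the results concentrate in one language.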

Paper Structure

This paper contains 25 sections, 3 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: An overview of our proposed multi-task dual-encoder framework used to train our sentence encoder. It illustrates the job title translation ranking task on the left, job description-title matching in the middle, and job field classification on the right.
  • Figure 2: A language frequency histogram of LaBSE
  • Figure 3: A language frequency histogram of BGE-M3
  • Figure 4: A language frequency histogram of mUSE
  • Figure 5: A language frequency histogram of mUSE (ours), with the left, middle, and right sections showing histograms for English, Thai, and code-switching queries, respectively. The orange and blue histograms represent the number of candidate results in Thai and English, respectively.
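
The captions above describe language frequency histograms over retrieved candidates, grouped by query type. This page does not say how the language of each text is identified, so the sketch below assumes a simple script check based on the Thai Unicode block (U+0E00 to U+0E7F) and ASCII letters. The helper names and the sample job titles are hypothetical.

```python
from collections import Counter


def detect_script(text):
    """Label a text as 'th', 'en', 'code-switching', or 'other' by script.

    Assumption for this sketch: Thai is detected by the Thai Unicode block
    and English by ASCII letters; a text containing both is code-switching.
    """
    has_thai = any("\u0e00" <= ch <= "\u0e7f" for ch in text)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in text)
    if has_thai and has_latin:
        return "code-switching"
    if has_thai:
        return "th"
    if has_latin:
        return "en"
    return "other"


def language_histogram(candidates):
    """Count how many retrieved candidates fall under each script label."""
    return Counter(detect_script(c) for c in candidates)


# Hypothetical top-3 results for a single query.
retrieved = ["นักบัญชี", "accountant", "data analyst"]
histogram = language_histogram(retrieved)
```

Collecting such counts separately for English, Thai, and code-switching queries would give one histogram per query type, which is the layout the Figure 5 caption describes.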