RankPO: Preference Optimization for Job-Talent Matching

Yafei Zhang, Murray Wang, Yu Wang, Xiaohui Wang

TL;DR

The paper tackles job description (JD)-talent matching in academic hiring, where geographic location, seniority, and research alignment matter beyond textual similarity. It introduces a two-stage approach. First, contrastive learning on rule-based JD-talent pairs builds robust embeddings. Then RankPO, a ranking-focused extension of Direct Preference Optimization (DPO), aligns those embeddings with AI-curated pairwise preferences while preserving knowledge acquired in the contrastive phase. A large JD-talent dataset is constructed from academic job postings and researcher profiles, enabling both rule-based and semantic evaluation. RankPO is shown to balance alignment with AI preferences against retention of first-stage capabilities better than standard supervised fine-tuning (SFT), reducing catastrophic forgetting. The framework demonstrates practical potential for scalable, context-aware academic recruitment and offers insights into balancing alignment objectives with knowledge retention; code and data are publicly available.
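The first stage can be illustrated with a common contrastive objective, InfoNCE with in-batch negatives. This is a minimal sketch under assumptions not stated in this summary: each rule-matched JD-talent pair is treated as a positive, the other talents in the same batch serve as negatives, similarity is cosine, and the temperature of 0.05 is an illustrative value. The paper's actual loss and negative-sampling scheme may differ.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(
    jd_embs: List[List[float]],
    talent_embs: List[List[float]],
    temperature: float = 0.05,  # illustrative value, an assumption
) -> float:
    """Mean InfoNCE loss over a batch.

    Assumption: jd_embs[i] and talent_embs[i] form a positive (rule-matched)
    pair, and every other talent in the batch acts as a negative for JD i.
    """
    n = len(jd_embs)
    total = 0.0
    for i in range(n):
        # Similarity of JD i to every talent in the batch, scaled by temperature.
        logits = [cosine(jd_embs[i], talent_embs[j]) / temperature for j in range(n)]
        # Subtract the max before exponentiating for a numerically stable log-sum-exp.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log softmax probability assigned to the true (positive) talent.
        total += log_denom - logits[i]
    return total / n
```

In practice the embeddings would come from the LLM encoder; plain lists keep the sketch dependency-free. A batch whose JDs are closest to their own matched talents yields a loss near zero, while a batch with swapped matches yields a large loss.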

Abstract

Matching job descriptions (JDs) with suitable talent requires models capable of understanding not only textual similarities between JDs and candidate resumes but also contextual factors such as geographical location and academic seniority. To address this challenge, we propose a two-stage training framework for large language models (LLMs). In the first stage, a contrastive learning approach is used to train the model on a dataset constructed from real-world matching rules, such as geographical alignment and research area overlap. While effective, this model primarily learns patterns defined by the matching rules. In the second stage, we introduce a novel preference-based fine-tuning method inspired by Direct Preference Optimization (DPO), termed Rank Preference Optimization (RankPO), to align the model with AI-curated pairwise preferences emphasizing textual understanding. Our experiments show that while the first-stage model achieves strong performance on rule-based data (nDCG@20 = 0.706), it lacks robust textual understanding (alignment with AI annotations = 0.46). By fine-tuning with RankPO, we achieve a balanced model that retains relatively good performance on the original tasks while significantly improving alignment with AI preferences. The code and data are available at https://github.com/yflyzhang/RankPO.
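The abstract describes RankPO only as DPO-inspired, so the sketch below shows the standard DPO pairwise loss, -log sigmoid(beta * margin), with one hypothetical substitution: the policy and reference log-likelihoods are replaced by JD-talent matching scores from the model being fine-tuned and from a frozen copy of the first-stage model. The function names and beta = 0.1 are illustrative assumptions; this is not the paper's exact RankPO formulation.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_style_pair_loss(
    policy_w: float,
    policy_l: float,
    ref_w: float,
    ref_l: float,
    beta: float = 0.1,  # illustrative value, an assumption
) -> float:
    """DPO-style loss for one AI-annotated preference pair (hypothetical adaptation).

    policy_w / policy_l: scores of the preferred / dispreferred talent for a JD
        under the model being fine-tuned.
    ref_w / ref_l: the same scores under the frozen first-stage (reference) model.
    """
    # How much more the tuned model favors the preferred talent than the reference does.
    margin = (policy_w - ref_w) - (policy_l - ref_l)
    return -log_sigmoid(beta * margin)


def batch_loss(pairs: list, beta: float = 0.1) -> float:
    """Mean loss over (policy_w, policy_l, ref_w, ref_l) tuples."""
    return sum(dpo_style_pair_loss(*p, beta=beta) for p in pairs) / len(pairs)
```

Because the loss depends only on how far the tuned scores move relative to the reference scores, the reference term acts as an anchor to the first-stage model, which is one plausible reading of how a DPO-style objective could limit forgetting. When the tuned model matches the reference exactly, the loss equals log 2.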

Paper Structure

This paper contains 35 sections, 11 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Illustration of the proposed two-stage training framework for job-talent matching. The framework consists of two stages: (1) a contrastive learning stage to acquire foundational capabilities, and (2) a fine-tuning stage using RankPO to improve alignment with AI preferences while striving to retain capabilities learned from the first stage.
  • Figure 2: Comparison of RankPO and SFT across different learning rates in terms of Agreement with AI and nDCG@20. Lines represent the average results over two random seeds. Higher values indicate better adaptation to AI preferences and better retention of previously learned capabilities. It highlights RankPO's superior ability to maintain previously learned capabilities at higher alignment levels compared to SFT.
  • Figure 3: Prompt used for AI annotation.
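Figure 2 reports two metrics: nDCG@20 on the rule-based data and agreement with AI annotations. The sketch below shows how such metrics are commonly computed. It assumes graded relevance labels with linear gain, an ideal ranking built from the same candidate list, and agreement defined as the fraction of AI-annotated pairs whose order the model's scores reproduce. These definitions are assumptions; the paper's exact formulations may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items, in ranked order (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 20) -> float:
    """DCG of the model's ranking divided by DCG of the ideal ordering of the same list."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


def pairwise_agreement(scores: Dict[str, float], preferences: List[Tuple[str, str]]) -> float:
    """Fraction of (preferred, other) talent pairs that the model scores in the same order."""
    hits = sum(1 for preferred, other in preferences if scores[preferred] > scores[other])
    return hits / len(preferences)
```

Under these definitions, an agreement of 0.5 corresponds to chance-level ordering of the annotated pairs.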