RankPO: Preference Optimization for Job-Talent Matching
Yafei Zhang, Murray Wang, Yu Wang, Xiaohui Wang
TL;DR
The paper tackles JD-Talent matching in academia, where geographic location, seniority, and research alignment matter beyond textual similarity. It introduces a two-stage approach: first, contrastive learning on rule-based JD-Talent pairs builds robust embeddings; then RankPO, a ranking-focused extension of Direct Preference Optimization, aligns those embeddings with AI-curated pairwise preferences while preserving knowledge from the contrastive phase. A large JD-Talent dataset is constructed from academic postings and researcher profiles, enabling both rule-based and semantic evaluation. RankPO is shown to balance AI alignment with knowledge retention better than standard supervised fine-tuning, reducing catastrophic forgetting. The framework demonstrates practical potential for scalable, context-aware academic recruitment and provides insights into balancing alignment objectives with knowledge retention, supported by publicly available code and data.
Abstract
Matching job descriptions (JDs) with suitable talent requires models capable of understanding not only textual similarities between JDs and candidate resumes but also contextual factors such as geographical location and academic seniority. To address this challenge, we propose a two-stage training framework for large language models (LLMs). In the first stage, a contrastive learning approach is used to train the model on a dataset constructed from real-world matching rules, such as geographical alignment and research area overlap. While effective, this model primarily learns patterns that are defined by the matching rules. In the second stage, we introduce a novel preference-based fine-tuning method inspired by Direct Preference Optimization (DPO), termed Rank Preference Optimization (RankPO), to align the model with AI-curated pairwise preferences emphasizing textual understanding. Our experiments show that while the first-stage model achieves strong performance on rule-based data (nDCG@20 = 0.706), it lacks robust textual understanding (alignment with AI annotations = 0.46). By fine-tuning with RankPO, we achieve a balanced model that retains relatively good performance on the original task while significantly improving alignment with AI preferences. The code and data are available at https://github.com/yflyzhang/RankPO.
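
For readers unfamiliar with the nDCG@20 metric reported above, the following is a minimal sketch of how nDCG@k is commonly computed for a single ranked list. It is not taken from the authors' repository: the function names `dcg_at_k` and `ndcg_at_k`, the linear gain, and the use of graded or binary relevance labels are assumptions made for illustration, and the paper's exact evaluation protocol may differ.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items of a ranked list.

    `relevances` holds the relevance label of each item in ranked order.
    Linear gain (rel, not 2**rel - 1) is an assumption of this sketch.
    """
    # rank is 0-based, so the discount for the first item is log2(2) = 1.
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=20):
    """nDCG@k: DCG of the given ranking divided by DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        # No relevant items at all: define the score as 0 by convention.
        return 0.0
    return dcg_at_k(relevances, k) / ideal_dcg
```

As a usage example under the same assumptions, a JD whose ranked talent list carries labels `[1, 1, 0, 0]` is already ideally ordered and scores 1.0, while `[0, 1]` scores lower because the only relevant candidate sits in second place. A dataset-level figure would typically be the mean of this score over all JDs.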
