ConFit v2: Improving Resume-Job Matching using Hypothetical Resume Embedding and Runner-Up Hard-Negative Mining
Xiao Yu, Ruize Xu, Chengyuan Xue, Jinzhong Zhang, Xu Ma, Zhou Yu
TL;DR
This work tackles the sparsity of interaction labels in resume-job matching by proposing ConFit v2, which combines a simplified transformer encoder with two novel techniques: Hypothetical Reference Resume Embedding (HyRe) and Runner-Up Mining (RUM). HyRe augments job posts with LLM-generated hypothetical resumes to stabilize representation learning, while RUM mines high-quality hard negatives from unlabeled resume-job pairs to strengthen contrastive training. Results on two real-world datasets show substantial improvements over ConFit and strong baselines, with average absolute gains of 13.8% in recall and 17.5% in nDCG, and the gains hold across encoder backbones. The paper also analyzes errors and biases, discusses limitations and ethical considerations, and describes plans for an open-source release to advance research on dense retrieval for person-job fit.
Abstract
A reliable resume-job matching system helps a company recommend suitable candidates from a pool of resumes and helps a job seeker find relevant jobs from a list of job posts. However, since job seekers apply only to a few jobs, interaction labels in resume-job datasets are sparse. We introduce ConFit v2, an improvement over ConFit that tackles this sparsity problem. We propose two techniques to enhance the encoder's contrastive training process: augmenting job data with hypothetical reference resumes generated by a large language model, and creating high-quality hard negatives from unlabeled resume-job pairs using a novel hard-negative mining strategy. We evaluate ConFit v2 on two real-world datasets and demonstrate that it outperforms ConFit and prior methods (including BM25 and OpenAI text-embedding-003), achieving an average absolute improvement of 13.8% in recall and 17.5% in nDCG across job-ranking and resume-ranking tasks.
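
As a rough illustration of the hard-negative mining idea, the sketch below ranks unlabeled resumes by cosine similarity to a job embedding and keeps the "runner-up" candidates just below the top of the ranking as negatives. The abstract does not specify the selection rule, so the skip-the-top heuristic, the function name `runner_up_negatives`, and the parameters `skip_top` and `num_negatives` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def runner_up_negatives(job_vec, resume_vecs, labeled_ids,
                        skip_top=1, num_negatives=2):
    """Pick hard negatives for one job from unlabeled resumes.

    Assumption (not stated in the abstract): the most similar unlabeled
    resumes may be unlabeled positives, so the first `skip_top` are
    skipped and the next `num_negatives` are used as hard negatives.
    """
    # Score only resumes with no recorded interaction with this job.
    scored = [
        (cosine(job_vec, vec), rid)
        for rid, vec in resume_vecs.items()
        if rid not in labeled_ids
    ]
    # Most similar first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rid for _, rid in scored[skip_top:skip_top + num_negatives]]


# Toy embeddings (hypothetical data for illustration only).
job = [1.0, 0.0]
resumes = {
    "r1": [1.0, 0.0],  # labeled positive, excluded from mining
    "r2": [0.9, 0.1],  # most similar unlabeled resume, skipped
    "r3": [0.7, 0.3],
    "r4": [0.5, 0.5],
    "r5": [0.0, 1.0],
}
negatives = runner_up_negatives(job, resumes, labeled_ids={"r1"})
print(negatives)
```

In this toy setup the closest unlabeled resume (`r2`) is skipped and the next two (`r3`, `r4`) are returned. In practice the embeddings would come from the trained encoder, and the cutoffs would likely be tuned as fractions of the candidate pool rather than fixed counts.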
