SkillMatch: Evaluating Self-supervised Learning of Skill Relatedness
Jens-Joris Decorte, Jeroen Van Hautte, Thomas Demeester, Chris Develder
TL;DR
SkillMatch is a public intrinsic benchmark for skill relatedness, derived from expert knowledge mined from millions of job ads. The paper also proposes a self-supervised adaptation of Sentence-BERT that leverages skill co-occurrence via adjacent skill spans and an InfoNCE loss to learn domain-specific relations. A comparative study against static vectors (Word2Vec, fastText) shows the domain-specific SBERT achieving the highest AUC-PR and MRR on SkillMatch. This work provides a scalable, transparent resource for evaluating skill representations and motivates future extensions to multi-language coverage and a continuum of relatedness.
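To make the InfoNCE objective mentioned above more concrete, the following is a minimal NumPy sketch of an in-batch InfoNCE loss. It is an illustration under stated assumptions, not the authors' implementation: the function name `info_nce_loss`, the temperature value, and the use of in-batch negatives are all hypothetical choices, and the input arrays stand in for Sentence-BERT embeddings of two adjacent skill spans taken from the same job ad.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.05):
    """In-batch InfoNCE: row i of `anchors` is paired with row i of `positives`.

    All other rows of `positives` act as negatives for anchor i.
    The temperature of 0.05 is an assumed value, not taken from the paper.
    """
    # L2-normalise so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Similarity of every anchor to every candidate, shape (batch, batch).
    logits = (a @ p.T) / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The true pairs sit on the diagonal; average their negative log-likelihood.
    return float(-np.mean(np.diag(log_probs)))


# Toy embeddings (hypothetical): four skill spans, each paired with its neighbour.
aligned = info_nce_loss(np.eye(4), np.eye(4))
mismatched = info_nce_loss(np.eye(4), np.roll(np.eye(4), 1, axis=0))
print(aligned < mismatched)
```

In this toy setup the loss is close to zero when each anchor is most similar to its own pair and grows when the pairs are shuffled, which is the behaviour a co-occurrence-based training signal relies on.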
Abstract
Accurately modeling the relationships between skills is a crucial part of human resources processes such as recruitment and employee development. Yet, no benchmarks exist to evaluate such methods directly. We construct and release SkillMatch, a benchmark for the task of skill relatedness, based on expert knowledge mining from millions of job ads. Additionally, we propose a scalable self-supervised learning technique to adapt a Sentence-BERT model based on skill co-occurrence in job ads. This new method greatly surpasses traditional models for skill relatedness as measured on SkillMatch. By releasing SkillMatch publicly, we aim to contribute a foundation for research towards increased accuracy and transparency of skill-based recommendation systems.
