KARRIEREWEGE: A Large Scale Career Path Prediction Dataset

Elena Senger, Yuri Campbell, Rob van der Goot, Barbara Plank

TL;DR

This work introduces Karrierewege, a large-scale, ESCO-mapped career path dataset with over 500k career paths, and Karrierewege+, which adds synthesized, paraphrased free-text job titles and descriptions to handle the free-text inputs found in resumes. By mapping German resume data to ESCO and applying filtering, the authors create a standardized resource and demonstrate improved robustness of state-of-the-art models on free-text data. Benchmarking on both Karrierewege and the prior DECORTE dataset, they show that training on Karrierewege, especially the K+cp variant, improves prediction of subsequent occupations and generalizes across datasets. The dataset enables multilingual, interoperable career trajectory modeling while highlighting ethical considerations and the need for bias-aware deployment in HR contexts.

Abstract

Accurate career path prediction can support many stakeholders, such as job seekers, recruiters, HR, and project managers. However, publicly available data and tools for career path prediction are scarce. In this work, we introduce KARRIEREWEGE, a comprehensive, publicly available dataset containing over 500k career paths, significantly surpassing the size of previously available datasets. We link the dataset to the ESCO taxonomy to offer a valuable resource for predicting career trajectories. To tackle the problem of free-text inputs typically found in resumes, we enhance it by synthesizing job titles and descriptions, resulting in KARRIEREWEGE+. This allows for accurate predictions from unstructured data, closely aligning with real-world application challenges. We benchmark existing state-of-the-art (SOTA) models on our dataset and a prior benchmark and observe improved performance and robustness, particularly for free-text use cases, due to the synthesized data.

Paper Structure

This paper contains 33 sections, 5 figures, 13 tables.

Figures (5)

  • Figure 1: Steps necessary to create Karrierewege and Karrierewege+ datasets. Titles and descriptions with the _oc suffix are synthesized per occupation, while those with _cp are synthesized per career path.
  • Figure 2: Work experiences per resume for the Karrierewege and DECORTE datasets.
  • Figure 3: Tree maps of one-digit ESCO codes.
  • Figure 4: Rank plot of normalized frequencies of full-digit ESCO codes.
  • Figure 5: Length of generated job titles and job descriptions with both strategies K+oc and K+cp in comparison with ESCO.