Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking

Xi Chen, Chuan Qin, Chuyu Fang, Chao Wang, Chen Zhu, Fuzhen Zhuang, Hengshu Zhu, Hui Xiong

TL;DR

The paper addresses the problem of forecasting skill demand in a changing job market by introducing Job-SDF, a large public dataset of monthly demand time series for 2,324 skills across 52 occupations, 521 companies, and 7 regions. Skill requirements are extracted from public job ads via named entity recognition (NER), which supports forecasting tasks at multiple granularities, evaluated with metrics such as MAE, RMSE, SMAPE, and RRMSE. The authors benchmark a broad set of models (ARIMA, Prophet, RNNs, Transformers, MLPs, GNNs, and Fourier-based methods) across granularities and under structural breaks; PatchTST and FiLM emerge as robust performers, while low-frequency skills and structural breaks remain persistent challenges. The released dataset and code support reproducible research, with practical implications for workforce planning, training program alignment, and regional policy development.
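
As a rough illustration of the four metrics named above, the following sketch computes them for a single demand series. The function names are hypothetical, and the RRMSE normalization used here (RMSE divided by the mean of the actual values) is only one common convention; the paper's exact definitions may differ.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def smape(actual, predicted):
    """Symmetric MAPE in percent; a term with actual = predicted = 0 counts as 0."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        if denom > 0:
            total += abs(a - p) / denom
    return 100 * total / len(actual)


def rrmse(actual, predicted):
    """Relative RMSE. ASSUMPTION: normalized by the mean of the actual values."""
    mean_actual = sum(actual) / len(actual)
    return rmse(actual, predicted) / mean_actual


# Toy monthly demand counts for one skill (made-up numbers, not from Job-SDF).
actual = [10, 0, 4]
predicted = [8, 0, 6]
print(mae(actual, predicted), rmse(actual, predicted))
print(smape(actual, predicted), rrmse(actual, predicted))
```

Relative metrics such as SMAPE and RRMSE matter for this kind of data because absolute errors on rarely requested skills are small in magnitude even when the forecast is proportionally far off.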

Abstract

In a rapidly evolving job market, skill demand forecasting is crucial as it enables policymakers and businesses to anticipate and adapt to changes, ensuring that workforce skills align with market needs, thereby enhancing productivity and competitiveness. Additionally, by identifying emerging skill requirements, it directs individuals towards relevant training and education opportunities, promoting continuous self-learning and development. However, the absence of comprehensive datasets presents a significant challenge, impeding research and the advancement of this field. To bridge this gap, we present Job-SDF, a dataset designed to train and benchmark job-skill demand forecasting models. Based on 10.35 million public job advertisements collected from major online recruitment platforms in China between 2021 and 2023, this dataset encompasses monthly recruitment demand for 2,324 types of skills across 521 companies. Our dataset uniquely enables evaluating skill demand forecasting models at various granularities, including occupation, company, and regional levels. We benchmark a range of models on this dataset, evaluating their performance in standard scenarios, in predictions focused on lower value ranges, and in the presence of structural breaks, providing new insights for further research. Our code and dataset are publicly accessible at https://github.com/Job-SDF/benchmark.
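
To make the multi-granularity idea concrete, the sketch below shows one way monthly skill demand series could be built from job ads and split by occupation, company, or region. The record layout (keys `month`, `skills`, `occupation`, `region`), the function name, and the choice to measure demand as the number of ads mentioning a skill are assumptions made for illustration, not the dataset's actual schema or pipeline.

```python
from collections import defaultdict


def monthly_skill_demand(ads, granularity=None):
    """Count ads mentioning each skill per month.

    ads: iterable of dicts with hypothetical keys 'month' and 'skills',
         plus grouping fields such as 'occupation', 'company', 'region'.
    granularity: name of the grouping field, or None for one overall series.
    Returns {(group, skill): {month: count}}.
    """
    demand = defaultdict(lambda: defaultdict(int))
    for ad in ads:
        group = ad[granularity] if granularity else "all"
        # set() so a skill repeated within one ad is counted once
        for skill in set(ad["skills"]):
            demand[(group, skill)][ad["month"]] += 1
    return {key: dict(series) for key, series in demand.items()}


# Made-up ads for illustration only.
ads = [
    {"month": "2021-01", "occupation": "product manager", "region": "R1",
     "skills": ["SQL", "Axure"]},
    {"month": "2021-01", "occupation": "doctor", "region": "R1",
     "skills": ["SQL"]},
    {"month": "2021-02", "occupation": "product manager", "region": "R2",
     "skills": ["SQL"]},
]

overall = monthly_skill_demand(ads)
by_occupation = monthly_skill_demand(ads, granularity="occupation")
print(overall[("all", "SQL")])
print(by_occupation[("product manager", "SQL")])
```

The same ads yield a different set of series at each granularity, which is why a model can be evaluated separately at the occupation, company, and regional levels.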

Paper Structure (25 sections, 6 equations, 2 figures, 12 tables)

Figures (2)

  • Figure 1: Data analysis on Job-SDF. (a) illustrates the long-tail phenomenon of skill demands under the product manager and doctor occupations. (b) illustrates the results under the Chow test for the absence (left) and presence (right) of structural breaks.
  • Figure 2: Pearson Correlation Coefficients.

Theorems & Definitions (1)

  • Definition 1: Job Skill Demand Forecasting