Patterns of co-occurrent skills in UK job adverts

Zhaolu Liu, Jonathan M. Clarke, Bertha Rohenkohl, Mauricio Barahona

TL;DR

The paper builds a data-driven, multiscale portrait of UK skill demand. It constructs a skill co-occurrence network from 65 million Adzuna job adverts, then applies an MCA-based embedding, Continuous kNN sparsification, and Markov Stability to detect robust skill clusters at multiple resolutions (MS21 and MS7). Analysing centrality, containment, semantic similarity, salaries, and regional distributions, the study finds that skill clusters differ in their network roles and wage premia, and that cross-cluster co-occurrence increased from 2016 to 2022. Comparison with Lightcast expert categories shows only partial agreement, indicating that the data-driven clusters capture relationships that the expert taxonomy does not fully capture. The work also highlights regional industrial differences and a broadening of skill requirements over time, and offers a foundation for workforce planning and policy, with data-driven cluster labels and interpretable metrics.
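To make the sparsification step more concrete, the following is a minimal sketch of the Continuous kNN rule in its standard form: two points i and j are connected when d(i, j) < delta * sqrt(d_k(i) * d_k(j)), where d_k is the distance to the k-th nearest neighbour. The function name `cknn_adjacency`, the toy 2-D points, and the values of `k` and `delta` are assumptions for illustration; the paper applies the rule to skill embeddings derived via MCA, which is not reproduced here.

```python
import numpy as np

def cknn_adjacency(points, k=2, delta=1.0):
    """Hypothetical helper: boolean adjacency matrix from the Continuous kNN rule.

    Points i and j are linked when d(i, j) < delta * sqrt(d_k(i) * d_k(j)).
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all points.
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    # Distance from each point to its k-th nearest neighbour
    # (index 0 of each sorted row is the point itself, at distance 0).
    kth = np.sort(dist, axis=1)[:, k]
    # Local, density-adaptive threshold for every pair of points.
    threshold = delta * np.sqrt(np.outer(kth, kth))
    adj = dist < threshold
    np.fill_diagonal(adj, False)  # no self-loops
    return adj

# Toy data (assumed): two well-separated groups of three points each.
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
adjacency = cknn_adjacency(toy_points, k=2, delta=1.0)
```

Because the threshold scales with the local neighbour distances, dense and sparse regions of the embedding are sparsified on comparable terms, which a single global distance cut-off would not achieve.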

Abstract

A job usually involves the application of several complementary or synergistic skills to perform its required tasks. Such relationships are implicitly recognised by employers in the skills they demand when recruiting new employees. Here we construct a skills network based on their co-occurrence in a national level data set of 65 million job postings from the UK spanning 2016 to 2022. We then apply multiscale graph-based community detection to obtain data-driven skill clusters at different levels of resolution that reveal a modular structure across scales. Skill clusters display diverse levels of demand and occupy varying roles within the skills network: some have broad reach across the network (high closeness centrality) while others have higher levels of within-cluster containment, yet with high interconnection across clusters and no skill silos. The skill clusters also display varying levels of semantic similarity, highlighting the difference between co-occurrence in adverts and intrinsic thematic consistency. Clear geographic variation is evident in the demand for each skill cluster across the UK, broadly reflecting the industrial characteristics of each region, e.g., London appears as an outlier as an international hub for finance, education and business. Comparison of data from 2016 and 2022 reveals employers are demanding a broader range of skills over time, with more adverts featuring skills spanning different clusters. We also show that our data-driven clusters differ from expert-authored categorisations of skills, indicating that important relationships between skills are not captured by expert assessment alone.
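As a rough illustration of two quantities mentioned in the abstract, the sketch below counts pairwise skill co-occurrences across adverts and computes the share of adverts whose skills span more than one cluster. The adverts, skill names, cluster labels, and function names are all hypothetical toy values; the paper's actual weighting and clustering are not reproduced.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(adverts):
    """Count how many adverts mention each unordered pair of skills."""
    counts = Counter()
    for skills in adverts:
        # Deduplicate and sort so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(skills)), 2):
            counts[(a, b)] += 1
    return counts

def cross_cluster_share(adverts, cluster_of):
    """Fraction of adverts whose skills fall into more than one cluster."""
    spanning = sum(
        1 for skills in adverts if len({cluster_of[s] for s in skills}) > 1
    )
    return spanning / len(adverts)

# Toy adverts (assumed), each represented as a list of skills.
adverts = [
    ["python", "sql", "communication"],
    ["python", "sql"],
    ["nursing", "communication"],
    ["nursing"],
]
# Toy cluster assignment (assumed), standing in for a data-driven partition.
cluster_of = {
    "python": "data",
    "sql": "data",
    "nursing": "care",
    "communication": "general",
}

pair_counts = cooccurrence_counts(adverts)
share = cross_cluster_share(adverts, cluster_of)
print(pair_counts[("python", "sql")], share)
```

Computing the same share separately on adverts from two different years would give one simple way to express the broadening of skill requirements over time.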

Paper Structure (14 sections, 9 equations, 16 figures, 2 tables)


Figures (16)

  • Figure 1: Centrality in the skills network. Skills network with nodes coloured by (a) closeness centrality rank and (b) betweenness centrality rank. Yellow indicates high centrality rank and green indicates low centrality rank. (c) Scatter plot of the two centralities, showing a moderate correlation between them. Some frequently mentioned skills with high (red dots) and low (green dots) centrality are indicated.
  • Figure 2: Optimal clusterings from Markov Stability on the sparsified graph $\mathcal{G}$. Five optimal highly robust partitions are identified based on minima of the Block NVI. For further details, see text, Refs. [arnaudon2023pygenstability, lambiotte2014random, schaub2014structure, schindler2023multiscale], and Appendix \ref{app:MS}.
  • Figure 3: Sankey diagram capturing the multiscale clustering of skills at different levels of resolution. The quasi-hierarchical structure of skill co-occurrences is not imposed by the method but emerges naturally from the intrinsic co-occurrence patterns in the data.
  • Figure 4: Co-occurrence skill clusters (MS21). (a) Skills network coloured according to the 21 skill clusters. (b) Summary heatmap of skill cluster properties. Each row is normalised by its maximum. (c) For each of the 21 clusters, a word cloud in which font size represents skill eigenvector centrality, and a list of the top 5 most frequent skills.
  • Figure 5: Boxplots for the distributions of (a) closeness centrality, (b) containment, and (c) within-cluster semantic similarity for each cluster. The scatter plots compare, for each cluster: (d) median closeness centrality and containment, (e) semantic similarity and containment, and (f) semantic similarity and closeness centrality.
  • ...and 11 more figures