Labor Space: A Unifying Representation of the Labor Market via Large Language Models

Seongwoon Kim, Yong-Yeol Ahn, Jaehyuk Park

TL;DR

Labor Space provides a unifying, multi-type embedding of labor market entities: a contextual language model is fine-tuned on NAICS, O*NET, ESCO, and Crunchbase descriptions to produce a shared vector space. The space supports cross-type proximity measures, positioning of entities along economic axes, and vector arithmetic for modeling economic shocks and technology exposure, giving policymakers and business leaders a tool for reasoning about ripple effects within the labor ecosystem. The approach combines contextual representations with relation-aware training (triplet, cosine-similarity, and multiple-negatives ranking losses) to connect industries, occupations, skills, and firms, and it is validated through axis projections and correlations with AI-exposure measures. The framework could inform skill development, industry strategy, and policy interventions by offering a coherent, scalable view of labor market dynamics.
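The three relation-aware objectives named above can be illustrated with their standard formulations. This is a minimal, pure-Python sketch, not the authors' training code: the names match losses offered by the sentence-transformers library, but whether the authors used that library is not stated here, and the Euclidean distance in the triplet loss, the margin of 1.0, and the similarity scale of 20.0 are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Zero once the positive is closer to the anchor than the negative by at least `margin`."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


def cosine_similarity_loss(u: Vector, v: Vector, target: float) -> float:
    """Squared error between a pair's cosine similarity and a gold relatedness score."""
    return (cosine(u, v) - target) ** 2


def multiple_negatives_ranking_loss(anchors: List[Vector], positives: List[Vector],
                                    scale: float = 20.0) -> float:
    """In-batch softmax loss: positives[i] matches anchors[i]; every other positive is a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_partition = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_partition - logits[i]  # cross-entropy with the correct index i
    return total / len(anchors)


# Hypothetical 2-D toy vectors (real Labor Space embeddings are 768-dimensional).
industry, occupation, firm = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
print(triplet_loss(industry, occupation, firm))         # 0.0: already well separated
print(cosine_similarity_loss(industry, industry, 1.0))  # 0.0: perfect agreement
print(multiple_negatives_ranking_loss([industry, firm], [industry, firm]))  # near zero
```

In actual training, each loss would be computed on encoder outputs for paired entity descriptions (for example, an occupation and a skill it requires) and minimized by gradient descent; the toy version only shows what each objective rewards.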

Abstract

The labor market is a complex ecosystem comprising diverse, interconnected entities such as industries, occupations, skills, and firms. Because there has been no systematic method for mapping these heterogeneous entities together, each has been analyzed in isolation or only through pairwise relationships, which inhibits a comprehensive understanding of the whole ecosystem. Here, we introduce Labor Space, a vector-space embedding of heterogeneous labor market entities derived by fine-tuning a large language model. Labor Space exposes the complex relational fabric of labor market constituents, facilitating coherent, integrative analysis of industries, occupations, skills, and firms while retaining type-specific clustering. We demonstrate its unprecedented analytical capabilities, including positioning heterogeneous entities on economic axes such as 'Manufacturing–Healthcare'. Furthermore, by allowing vector arithmetic on these entities, Labor Space enables the exploration of complex inter-unit relations and, subsequently, the estimation of the ramifications of economic shocks on individual units and their ripple effects across the labor market. We posit that Labor Space provides policymakers and business leaders with a comprehensive, unifying framework for labor market analysis and simulation, fostering more nuanced and effective strategic decision-making.
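Positioning entities on an axis and vector arithmetic both reduce to simple operations once every entity has an embedding. The sketch below assumes, following the Figure 2 captions, that an axis such as V(Manufacturing → Healthcare and Social Assistance) is the difference between the two pole vectors and that an entity's position is its cosine similarity with that axis; the analogy helper uses the familiar word2vec-style "b - a + c, then nearest neighbor" query. All entity names and 2-D vectors are hypothetical toy values, not Labor Space embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def axis(start: Vector, end: Vector) -> List[float]:
    """V(start -> end): the difference vector pointing from one pole to the other."""
    return [e - s for s, e in zip(start, end)]


def position_on_axis(entities: Dict[str, Vector], direction: Vector) -> List[Tuple[str, float]]:
    """Score each entity by cosine similarity with the axis, ordered from start pole to end pole."""
    scores = [(name, cosine(vec, direction)) for name, vec in entities.items()]
    return sorted(scores, key=lambda item: item[1])


def analogy(entities: Dict[str, Vector], a: str, b: str, c: str) -> str:
    """Answer "a is to b as c is to ?" with the entity closest to v[b] - v[a] + v[c]."""
    target = [vb - va + vc for va, vb, vc in zip(entities[a], entities[b], entities[c])]
    candidates = [name for name in entities if name not in (a, b, c)]
    return max(candidates, key=lambda name: cosine(entities[name], target))


# Hypothetical toy embeddings: first coordinate ~ manufacturing, second ~ healthcare.
entities = {
    "Manufacturing": [1.0, 0.0],
    "Healthcare and Social Assistance": [0.0, 1.0],
    "Machinists": [0.95, 0.1],
    "Assemblers": [0.9, 0.2],
    "Nurses": [0.1, 0.95],
}
direction = axis(entities["Manufacturing"], entities["Healthcare and Social Assistance"])
for name, score in position_on_axis(entities, direction):
    print(f"{score:+.3f}  {name}")
print(analogy(entities, "Manufacturing", "Machinists", "Healthcare and Social Assistance"))
```

Cosine similarity ranks entities by direction only; a scalar projection onto the unit axis would also reflect vector length. Which of the two underlies the spectrum plots is not specified in this section.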

Paper Structure

This paper contains 25 sections, 5 figures, and 7 tables.

Figures (5)

  • Figure 1: Constructing the Labor Space. (A) A sample entity description, one of the 2,120 available. (B) Google's BERT, fine-tuned on descriptions from NAICS, O*NET, ESCO, and Crunchbase, predicts the [Mask] token from its context and thereby learns labor market nuances. (C) Inter-relations between Labor Space entities are trained using paired datasets, as magnified in the right-side panel. (D) Both contextual and relational information are captured in BERT's final hidden layer, from which word vectors are extracted. (E) The vector for a full description is the average of its word vectors. (F) Each vector is then labeled with its corresponding title.
  • Figure 2: Visualizing Labor Space. (A) Labor entities, originally 768-dimensional, are mapped to 2D using UMAP. (A1) Highlighted values on the Tradable–Nontradable dimension show close ties to real estate. (A2, A3) Construction-related entities cluster together, reflecting the industry's blend of manufacturing and tradability. (A4) Emphasized values on the Manufacturing–Healthcare and Social Assistance dimension show deep ties to healthcare. (B) The map colored by cosine similarity between V(Tradable → Nontradable) and the labor vectors; black rectangles mark the locations from A1, A2, and A3. (C) Distribution of cosine similarity between V(Manufacturing → Healthcare and Social Assistance) and the labor vectors; the black rectangle marks the location in A4.
  • Figure 3: Spectrum Plot of Labor Market Units. All labor entities are projected onto the V(Manufacturing → Healthcare and Social Assistance) axis. (A) Vertical lines within the industry spectrum box show the projections of industry embeddings; representative industry titles are annotated, and the NAICS 2-digit classification is used to plot sub-spectra. (B–D) The same projection is applied to firms (grouped by the General Industry Classification System), occupations (Standard Occupational Classification), and skills (level two of the ESCO skill hierarchy).
  • Figure 4: Vector analogy of firm and industry entities.
  • Figure 5: Correlation between industry-level (A and C) and occupation-level (B and D) exposure to AI (A and B) and to language models (C and D), as measured by Felten et al. (2021) (X-axis), and the cosine similarity between the AI or Language Model description vector and the industry or occupation embedding vectors (Y-axis).
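The Figure 5 comparison can be sketched end to end: mean-pool word vectors into a description vector (Figure 1E), score each occupation by cosine similarity with that vector, and correlate the scores with an external exposure measure. This sketch assumes Pearson correlation, since the caption says only "correlation"; the token vectors, occupation embeddings, and exposure values are hypothetical toy numbers, not BERT outputs or the published exposure data.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def description_vector(word_vectors: List[Vector]) -> List[float]:
    """Mean-pool token-level vectors into a single description vector."""
    n = len(word_vectors)
    return [sum(column) / n for column in zip(*word_vectors)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical token vectors for an "AI" description (real ones come from BERT's final layer).
ai_vector = description_vector([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]])

# Hypothetical occupation embeddings and exposure scores.
occupations: Dict[str, Vector] = {
    "Data scientists": [0.9, 0.2, 0.1],
    "Paralegals": [0.5, 0.5, 0.2],
    "Roofers": [0.1, 0.2, 0.9],
}
exposure = {"Data scientists": 1.2, "Paralegals": 0.6, "Roofers": -0.8}

names = list(occupations)
similarity = [cosine(ai_vector, occupations[n]) for n in names]
r = pearson([exposure[n] for n in names], similarity)
print(f"Pearson r = {r:.3f}")
```

With real data, a rank-based coefficient such as Spearman's would be a reasonable alternative if the relationship looks monotonic but not linear.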