NLP Occupational Emergence Analysis: How Occupations Form and Evolve in Real Time -- A Zero-Assumption Method Demonstrated on AI in the US Technology Workforce, 2022-2026

David Nordfors

Abstract

Occupations form and evolve faster than classification systems can track. We propose that a genuine occupation is a self-reinforcing structure (a bipartite co-attractor) in which a shared professional vocabulary makes practitioners cohesive as a group, and the cohesive group sustains the vocabulary. This co-attractor concept enables a zero-assumption method for detecting occupational emergence from resume data, requiring no predefined taxonomy or job titles: we test vocabulary cohesion and population cohesion independently, with ablation to test whether the vocabulary is the mechanism binding the population. Applied to 8.2 million US resumes (2022-2026), the method correctly identifies established occupations and reveals a striking asymmetry for AI: a cohesive professional vocabulary formed rapidly in early 2024, but the practitioner population never cohered. The pre-existing AI community dissolved as the tools went mainstream, and the new vocabulary was absorbed into existing careers rather than binding a new occupation. AI appears to be a diffusing technology, not an emerging occupation. We discuss whether introducing an "AI Engineer" occupational category could catalyze population cohesion around the already-formed vocabulary, completing the co-attractor.
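As a rough illustration of the vocabulary-cohesion test, the sketch below computes a co-occurrence density for a candidate term group and compares it with a background density, in the spirit of the density ratio (group / background) plotted in Figure 1. It is a minimal sketch under stated assumptions, not the paper's implementation. Each resume is assumed to be a set of normalized terms. A term pair is assumed to count as linked when it co-occurs in at least `min_count` resumes. The background is assumed to be the same density computed over every observed term. The function names `cooccurrence_density` and `density_ratio` are hypothetical.

```python
from itertools import combinations


def cooccurrence_density(terms, resumes, min_count=1):
    """Fraction of term pairs from `terms` that co-occur in >= min_count resumes.

    Hypothetical definition: each resume is a set of normalized terms, and a
    pair is "linked" if both terms appear together in at least min_count resumes.
    """
    pairs = list(combinations(sorted(set(terms)), 2))
    if not pairs:
        return 0.0  # a single term (or none) has no pairs to link
    linked = sum(
        1
        for a, b in pairs
        if sum(1 for r in resumes if a in r and b in r) >= min_count
    )
    return linked / len(pairs)


def density_ratio(group, resumes, min_count=1):
    """Group density divided by an assumed background: all observed terms."""
    vocabulary = set().union(*resumes)
    background = cooccurrence_density(vocabulary, resumes, min_count)
    if background == 0:
        return float("nan")  # ratio undefined when no terms co-occur at all
    return cooccurrence_density(group, resumes, min_count) / background


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    resumes = [
        {"llm", "rag", "python"},
        {"llm", "rag", "sql"},
        {"excel", "sql"},
        {"python", "excel"},
    ]
    print(round(density_ratio({"llm", "rag", "python"}, resumes), 2))  # 1.43
```

Under this assumed definition, a ratio above 1.0 means the candidate terms co-occur more densely than the vocabulary as a whole. In the toy data, every pair in the group co-occurs (density 1.0), while 7 of the 10 pairs in the full vocabulary do (density 0.7).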

Paper Structure (74 sections, 2 equations, 5 figures, 18 tables)

Figures (5)

  • Figure 1: Vocabulary cohesion across half-year windows. (A) Within-group co-occurrence density. (B) Density ratio (group / background). (C) Cross-group co-occurrence density heatmaps.
  • Figure 2: Anchor analysis clustermap for 2025-H2 (k=14, with "ai" removed). 20 of 21 seeds concentrate in a single cluster. Term categories are indicated by the leftmost column and by label color (red = ai_core, orange = ai_related, blue = IT; bold = seed). Heatmap intensity: cluster membership percentage.
  • Figure 3: Population cohesion ablation. (A) Broad AI population: full vocabulary (dark blue), minus technical core (purple dashed), minus all ai_core (medium blue), minus generic core (light blue). (B) Specialist kernel (ai_technical_core users): same ablation structure.
  • Figure 4: Specialist kernel dissolution. Left axis (green): population cohesion resilience under ablation. Right axis (grey bars): population size N.
  • Figure 5: Dual emergence trajectory. Red lines: vocabulary cohesion (ai_core ratio, tech core ratio). Blue line: broad population resilience (minus ai_core). Green line: specialist resilience (minus ai_core). Grey dashed line: background (1.0×).
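The ablation logic behind Figures 3 to 5 can be sketched as follows. Again, this is a minimal illustration under assumptions rather than the paper's code. Population cohesion is assumed to be the mean pairwise Jaccard similarity of practitioners' term sets divided by the same quantity over all resumes, so a value of 1.0 corresponds to the background level marked in Figure 5. Practitioners are assumed to be resumes containing at least one group term. Membership is fixed before ablation, so removing terms changes how similar the members look, not who the members are. The helper names `jaccard`, `mean_pairwise_jaccard`, and `population_cohesion` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_jaccard(resumes):
    """Average Jaccard similarity over all unordered pairs of resumes."""
    pairs = list(combinations(resumes, 2))
    return mean(jaccard(a, b) for a, b in pairs) if pairs else 0.0


def population_cohesion(resumes, group_terms, ablate=frozenset()):
    """Cohesion of the group's practitioners relative to all resumes.

    Hypothetical definition: members are resumes using any group term
    (decided on the unablated data); the `ablate` terms are then removed
    from every resume before similarities are computed.
    """
    group_terms, ablate = set(group_terms), set(ablate)
    members = [i for i, r in enumerate(resumes) if r & group_terms]
    ablated = [r - ablate for r in resumes]
    background = mean_pairwise_jaccard(ablated)
    if background == 0:
        return float("nan")  # undefined when no two resumes share any term
    return mean_pairwise_jaccard([ablated[i] for i in members]) / background


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    resumes = [
        {"llm", "rag", "python"},
        {"llm", "rag", "java"},
        {"excel", "sql"},
        {"excel", "python"},
    ]
    ai_terms = {"llm", "rag"}
    print(round(population_cohesion(resumes, ai_terms), 2))  # 2.77
    print(population_cohesion(resumes, ai_terms, ablate=ai_terms))  # 0.0
```

Read against Figure 5, a ratio that stays well above 1.0 after ablation would suggest that something other than the removed vocabulary holds the population together. A collapse toward 1.0, or below it as in this toy case, would suggest that the removed terms were doing the binding.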