Unified Work Embeddings: Contrastive Learning of a Bidirectional Multi-task Ranker

Matthias De Lange, Jens-Joris Decorte, Jeroen Van Hautte

TL;DR

This work addresses work-domain NLP, where labels are long-tailed and high-cardinality, by introducing WorkBench, a unified six-task ranking benchmark grounded in ESCO. It proposes Unified Work Embeddings (UWE), a task-agnostic bi-encoder trained with a many-to-many InfoNCE loss over bipartite graphs and equipped with a task-agnostic Soft Late Interaction module, enabling zero-shot ranking for unseen target spaces. The approach combines real-world vacancy data with synthetic enrichment to produce structured, multi-relational training data. Empirical results show that UWE outperforms task-specific baselines and generalist embeddings across the WorkBench tasks, with substantial gains in macro-averaged MAP and RP@10 at lower parameter counts, demonstrating strong cross-task transfer and practical viability for industry-scale applications. The work paves the way for multi-task evaluation and unified models in work-domain NLP, with avenues for multilingual expansion and bias analysis.

Abstract

Workforce transformation across diverse industries has driven an increased demand for specialized natural language processing capabilities. Nevertheless, tasks derived from work-related contexts inherently reflect real-world complexities, characterized by long-tailed distributions, extreme multi-label target spaces, and scarce data availability. The rise of generalist embedding models prompts the question of their performance in the work domain, especially as progress in the field has focused mainly on individual tasks. To this end, we introduce WorkBench, the first unified evaluation suite spanning six work-related tasks formulated explicitly as ranking problems, establishing a common ground for multi-task progress. Based on this benchmark, we find significant positive cross-task transfer, and use this insight to compose task-specific bipartite graphs from real-world data, synthetically enriched through grounding. This leads to Unified Work Embeddings (UWE), a task-agnostic bi-encoder that exploits our training-data structure with a many-to-many InfoNCE objective, and leverages token-level embeddings with task-agnostic soft late interaction. UWE demonstrates zero-shot ranking performance on unseen target spaces in the work domain, enables low-latency inference by caching the task target space embeddings, and shows significant gains in macro-averaged MAP and RP@10 over generalist embedding models.
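
The abstract names a many-to-many InfoNCE objective over bipartite graphs but this summary does not give its exact form. The following is a minimal sketch of one common multi-positive variant, not necessarily the authors' formulation: each node averages the negative log-probability of its linked positives under a softmax over all in-batch candidates, and the loss is symmetrized over both ranking directions. The function names, the `edges` representation, and the temperature value of 0.05 are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def directional_loss(logits, positives):
    """Average, over rows with at least one positive, of the mean
    negative log-probability assigned to that row's positives."""
    losses = []
    for row, pos in zip(logits, positives):
        if not pos:  # nodes without any edge contribute nothing
            continue
        logp = log_softmax(row)
        losses.append(-sum(logp[j] for j in pos) / len(pos))
    return sum(losses) / len(losses)  # assumes at least one edge exists


def many_to_many_info_nce(E_q, E_y, edges, temperature=0.05):
    """Hypothetical multi-positive InfoNCE over a bipartite graph.

    E_q, E_y: lists of embedding vectors for the two node sets.
    edges: set of (i, j) pairs linking E_q[i] to E_y[j]; a node may
           appear in many edges, which makes the objective many-to-many.
    """
    logits = [[cosine(q, y) / temperature for y in E_y] for q in E_q]
    logits_t = [list(col) for col in zip(*logits)]
    pos_q = [[j for (i, j) in edges if i == qi] for qi in range(len(E_q))]
    pos_y = [[i for (i, j) in edges if j == yj] for yj in range(len(E_y))]
    # Symmetrize so both ranking directions are trained.
    return 0.5 * (directional_loss(logits, pos_q)
                  + directional_loss(logits_t, pos_y))


if __name__ == "__main__":
    skills = [[1.0, 0.0], [0.0, 1.0]]
    titles = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9]]
    # Skill 0 is linked to two job titles, skill 1 to one.
    graph = {(0, 0), (0, 1), (1, 2)}
    print(many_to_many_info_nce(skills, titles, graph))
```

Averaging over each node's positives keeps nodes with many edges from dominating the loss. Other weightings are possible and may differ from what the paper uses.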

Paper Structure

This paper contains 36 sections, 6 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Overview of the six WorkBench tasks, showing samples $q \in \mathcal{Q}$ and $y \in \mathcal{Y}$ from the query and target input spaces, respectively. Unified Work Embeddings (UWE) encodes samples independently to create embeddings $E_q$ and $E_y$, and maintains a bi-directional ranking structure based on embedding similarity, enabling both tasks $T(\mathcal{Q},\mathcal{Y})$ and $T(\mathcal{Y},\mathcal{Q})$. This query-target order-agnostic setup supports all WorkBench tasks, including zero-shot performance for unseen target spaces.
  • Figure 2: (a) Data dependency graph of the training data, showing the many-to-many and one-to-many edges from the skill space $\mathcal{S}$ to the job title space $\mathcal{J}$, the vacancy sentence space $\mathcal{V}$, and the skill alternative space $\mathcal{A}$. (b) Acquisition overview of structured data with job titles and grounded skills. Blue boxes indicate structured fields. (c) Log-frequencies of the skills in the raw vacancy dataset, before and after synthetic enrichment; the red-highlighted background indicates the gap in represented skills.
  • Figure 3: Task transfer experiment showing our three partial MTM-loss objectives $\mathcal{L}_{S,J}$, $\mathcal{L}_{S,V}$, $\mathcal{L}_{S,A}$ trained in isolation, starting from the base MPNet model. Knowledge Gain is the delta-increase in MAP, averaged over 5 runs ($\pm$ SE).
  • Figure 4: Overview of the influence of temperature on late interaction, including SoftMax with token-level embeddings (token) and with target-averaged embeddings (mean). MaxSim is depicted as having temperature zero. The final results after the grid search are shown with $95\%$ CI.
  • Figure 5: Job title similarity threshold over the number of included jobs. Synthetic job title skills are merged with real-world job titles when above the similarity threshold.
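
The Figure 4 caption relates SoftMax late interaction to MaxSim through a temperature. The sketch below illustrates that relationship under stated assumptions: each query token takes a softmax-weighted average of its similarities to the target tokens, the per-token scores are averaged, and a temperature of zero falls back to the hard maximum (MaxSim). The function names, the dot-product similarity, and the mean aggregation over query tokens are illustrative choices, not details confirmed by this summary.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def soft_max_sim(sims, temperature):
    """Softmax-weighted average of similarities.

    As temperature approaches zero the weights concentrate on the largest
    similarity; temperature == 0 is handled as the hard max (MaxSim).
    """
    if temperature == 0:
        return max(sims)
    m = max(sims)  # shift for numerical stability
    weights = [math.exp((s - m) / temperature) for s in sims]
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)


def soft_late_interaction(query_tokens, target_tokens, temperature=0.1):
    """Hypothetical token-level score between a query and a target.

    query_tokens, target_tokens: lists of token embedding vectors.
    Returns the mean over query tokens of their soft-max similarity
    to the target tokens.
    """
    per_query = [
        soft_max_sim([dot(q, t) for t in target_tokens], temperature)
        for q in query_tokens
    ]
    return sum(per_query) / len(per_query)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    target = [[1.0, 0.0], [0.5, 0.5]]
    for tau in (0, 0.1, 1.0):
        print(tau, soft_late_interaction(query, target, tau))
```

Because each soft score is a weighted average, it never exceeds the MaxSim score, and a very large temperature approaches a plain mean over target-token similarities.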