Understanding and Modeling Job Marketplace with Pretrained Language Models

Yaochen Zhu, Liang Wu, Binchi Zhang, Song Wang, Qi Guo, Liangjie Hong, Luke Simon, Jundong Li

TL;DR

This work tackles the challenge of modeling a text-rich, heterogeneous job marketplace by treating it as a text-attributed heterogeneous graph and introducing PLM4Job, a graph-oriented pretrained language model. PLM4Job tightly couples a pretrained language model with the marketplace topology through heterogeneous ego-graph prompting, which tokenizes and embeds center nodes, entity types, and graph distances, and uses metapath-based structural prompts to aggregate information along multiple relational patterns. A proximity-aware attention mechanism aligns the PLM's attention with the marketplace's heterogeneous proximity relations, while task-specific finetuning enables robust node- and link-level predictions; node predictions employ class-token embeddings to avoid hallucinations. Experiments on a LinkedIn dataset show PLM4Job outperforms diverse baselines, and deploying PLM4Job embeddings in two-tower systems further boosts online retrieval metrics, demonstrating the practical impact of a foundation model for job marketplaces.

Abstract

The job marketplace is a heterogeneous graph composed of interactions among members (job seekers), companies, and jobs. Understanding and modeling the job marketplace can benefit both job seekers and employers, ultimately contributing to the greater good of society. However, existing graph neural network (GNN)-based methods have only a shallow understanding of the associated textual features and heterogeneous relations. To address these challenges, we propose PLM4Job, a job marketplace foundation model that tightly couples pretrained language models (PLMs) with the job marketplace graph, aiming to fully utilize their pretrained knowledge and reasoning ability to model member/job textual features and the various member-job relations simultaneously. In the pretraining phase, we propose a heterogeneous ego-graph-based prompting strategy that models and aggregates member/job textual features based on the topological structure around the target member/job node; entity type embeddings and graph positional embeddings are introduced to model the different entities and their heterogeneous relations. Meanwhile, a proximity-aware attention alignment strategy dynamically adjusts the PLM's attention on ego-graph node tokens in the prompt so that the attention is better aligned with job marketplace semantics. Extensive experiments at LinkedIn demonstrate the effectiveness of PLM4Job.
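
To make the ego-graph idea concrete, the following is a minimal sketch of how the nodes around a target member or job could be collected together with their entity types and hop distances, which are the three ingredients the summary says the prompt encodes. It is an illustration under stated assumptions, not the paper's implementation: the function name `ego_graph_prompt`, the toy graph, the `max_hops` parameter, and the plain breadth-first search are all hypothetical choices made here.

```python
from collections import deque


def ego_graph_prompt(adj, node_type, center, max_hops=2):
    """Collect (node, entity type, hop distance) triples around `center`.

    Hypothetical helper: a plain breadth-first search stands in for whatever
    neighborhood sampling the actual model uses.
    """
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand nodes already at the hop limit
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Sort by distance, then by node name, so the output is deterministic.
    ordered = sorted(dist.items(), key=lambda item: (item[1], item[0]))
    return [(node, node_type[node], d) for node, d in ordered]


# Toy marketplace graph (assumed data, for illustration only).
adj = {
    "member_1": ["job_1", "company_1"],
    "job_1": ["member_1", "company_1", "member_2"],
    "company_1": ["member_1", "job_1"],
    "member_2": ["job_1", "job_2"],
    "job_2": ["member_2"],
}
node_type = {
    "member_1": "member",
    "member_2": "member",
    "job_1": "job",
    "job_2": "job",
    "company_1": "company",
}

prompt = ego_graph_prompt(adj, node_type, "member_1", max_hops=2)
for node, etype, hops in prompt:
    print(f"{node}\ttype={etype}\thops={hops}")
```

In a full model, each triple would presumably be mapped to the node's text tokens plus an entity type embedding and a graph positional embedding indexed by the hop distance; the sketch stops at the structural step and leaves the embedding lookup out.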

Paper Structure (24 sections, 7 equations, 1 figure, 6 tables)

This paper contains 24 sections, 7 equations, 1 figure, 6 tables.

Figures (1)

  • Figure 1: The job marketplace heterogeneous ego-graph and the corresponding ego-graph-based prompt.

Theorems & Definitions (1)

  • Definition 1