Understanding and Modeling Job Marketplace with Pretrained Language Models
Yaochen Zhu, Liang Wu, Binchi Zhang, Song Wang, Qi Guo, Liangjie Hong, Luke Simon, Jundong Li
TL;DR
This work tackles the challenge of modeling a text-rich, heterogeneous job marketplace by treating it as a text-attributed heterogeneous graph and introducing PLM4Job, a graph-oriented pretrained language model (PLM). PLM4Job tightly couples the PLM with the marketplace topology through heterogeneous ego-graph prompting, which tokenizes and embeds center nodes, entity types, and graph distances, together with metapath-based structural prompts that aggregate information along multiple relational patterns. A proximity-aware attention mechanism aligns the PLM's attention with the marketplace's heterogeneous proximity relations, and task-specific finetuning supports robust node- and link-level predictions, where node-level predictions use class-token embeddings to avoid hallucinations. Experiments on a LinkedIn dataset show that PLM4Job outperforms diverse baselines, and deploying PLM4Job embeddings in two-tower retrieval systems further improves online retrieval metrics, demonstrating the practical impact of a foundation model for job marketplaces.
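The heterogeneous ego-graph prompting described above can be illustrated with a small sketch. It assumes a toy marketplace stored as an adjacency list. The node ids, the `NODE_TYPES` table, and the helpers `build_adjacency`, `ego_graph_prompt`, `metapath_neighbors`, and `prompt_embeddings` are all hypothetical. The sketch collects the nodes within a hop budget of a center node by breadth-first search and records each node's entity type and hop distance. It also gathers metapath neighbors by following an entity-type sequence. Summing text, type, and distance vectors per node token is an assumption borrowed from BERT-style input embeddings, not necessarily how PLM4Job combines them.

```python
from collections import deque

# Hypothetical toy marketplace: node id -> entity type, plus undirected edges.
NODE_TYPES = {
    "m1": "member", "m2": "member",
    "j1": "job", "j2": "job",
    "c1": "company",
}
EDGES = [("m1", "j1"), ("m2", "j1"), ("m2", "j2"), ("j1", "c1"), ("j2", "c1")]


def build_adjacency(edges):
    """Build an undirected adjacency list from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def ego_graph_prompt(adj, node_types, center, max_hops=2):
    """Return (node, entity_type, hop_distance) triples for the ego-graph
    around `center` (center first), discovered by breadth-first search."""
    dist = {center: 0}
    order = [center]
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond the hop budget
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                order.append(v)
                queue.append(v)
    return [(n, node_types[n], dist[n]) for n in order]


def metapath_neighbors(adj, node_types, start, metapath):
    """Nodes reachable from `start` by following the entity-type sequence
    `metapath` (e.g. member, job, member), excluding `start` itself."""
    assert node_types[start] == metapath[0], "start node must match metapath[0]"
    frontier = {start}
    for next_type in metapath[1:]:
        frontier = {v for u in frontier for v in adj.get(u, [])
                    if node_types[v] == next_type}
    return sorted(frontier - {start})


def prompt_embeddings(prompt, text_emb, type_emb, hop_emb):
    """Assumed combination: elementwise sum of a node's text vector, its
    entity-type vector, and its hop-distance vector, one per node token."""
    return [
        [a + b + c for a, b, c in zip(text_emb[n], type_emb[t], hop_emb[h])]
        for n, t, h in prompt
    ]


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    print(ego_graph_prompt(adj, NODE_TYPES, "m1", max_hops=2))
    print(metapath_neighbors(adj, NODE_TYPES, "m1", ["member", "job", "member"]))
```

For member `m1`, the ego-graph prompt lists `m1` at distance 0, `j1` at distance 1, and `m2` and `c1` at distance 2, while the member, job, member metapath returns `m2`. In a full system the per-node text vectors would presumably be derived from each member's or job's text rather than hand-written.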
Abstract
A job marketplace is a heterogeneous graph composed of interactions among members (job seekers), companies, and jobs. Understanding and modeling the job marketplace can benefit both job seekers and employers, ultimately contributing to the greater good of society. However, existing graph neural network (GNN)-based methods have only a shallow understanding of the associated textual features and heterogeneous relations. To address these limitations, we propose PLM4Job, a job marketplace foundation model that tightly couples pretrained language models (PLMs) with the job marketplace graph, aiming to fully exploit their pretrained knowledge and reasoning ability to model member/job textual features and various member-job relations simultaneously. In the pretraining phase, we propose a heterogeneous ego-graph-based prompting strategy that models and aggregates member/job textual features based on the topological structure around the target member/job node, where entity type embeddings and graph positional embeddings are introduced to model different entities and their heterogeneous relations. Meanwhile, a proximity-aware attention alignment strategy is designed to dynamically adjust the PLM's attention on ego-graph node tokens in the prompt, so that the attention better aligns with job marketplace semantics. Extensive experiments at LinkedIn demonstrate the effectiveness of PLM4Job.
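The proximity-aware attention alignment can be pictured with a single-head toy example. This is one simple way to express the idea and may differ from PLM4Job's actual mechanism. An additive bias indexed by hop distance is added to scaled dot-product logits before the softmax, so that attention over ego-graph node tokens shifts toward structurally closer nodes. The `hop_bias` table and the function names are hypothetical, and in a trained model such biases would be learned rather than fixed.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def proximity_aware_attention(query, keys, hops, hop_bias):
    """Attention weights of one query over ego-graph node tokens.

    Scaled dot-product logits are shifted by hop_bias[h] for a token at
    hop distance h (hypothetical fixed table standing in for learned
    parameters), so closer nodes can receive more attention."""
    d = len(query)
    logits = [dot(query, k) / math.sqrt(d) + hop_bias[h]
              for k, h in zip(keys, hops)]
    return softmax(logits)


if __name__ == "__main__":
    keys = [[1.0, 0.0]] * 3   # identical keys isolate the effect of the bias
    hops = [0, 1, 2]          # center node, 1-hop neighbor, 2-hop neighbor
    print(proximity_aware_attention([1.0, 0.0], keys, hops, [0.0, 0.0, 0.0]))
    print(proximity_aware_attention([1.0, 0.0], keys, hops, [0.0, -1.0, -2.0]))
```

With identical keys, a zero bias gives uniform weights, whereas a bias that decreases with hop distance yields weights that decrease from the center node outward. Content similarity between query and keys still enters through the dot-product term, so the bias reweights rather than replaces the PLM's own attention.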
