Skill matching at scale: freelancer-project alignment for efficient multilingual candidate retrieval

Warren Jouanneau; Marc Palyart; Emma Jouffroy

TL;DR

This work tackles scalable multilingual skill matching between job proposals and freelancer profiles by introducing a structure-aware two-tower retriever built on a frozen multilingual backbone. It encodes documents at the section level, incorporates section-type awareness, and adds a document-level transformer head to produce aligned embeddings trained with contrastive losses, including InfoNCE and adjacency-based variants. The approach outperforms baselines on retrieval-quality metrics while preserving language alignment, and its production deployment in a vector-store-based pipeline yields lower latency and higher conversion for effective matches. The results demonstrate that preserving document structure and leveraging historical interactions in a multilingual setting can significantly improve scalable candidate retrieval in global marketplaces.
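The TL;DR describes a structure-aware document tower. Each section is embedded separately by a frozen multilingual backbone, a section-type signal is added, a transformer head mixes the sections, and pooling yields one document vector. The numpy sketch below illustrates that data flow under explicit assumptions. Random vectors replace the backbone outputs. The section-type vocabulary `SECTION_TYPES` and the class `DocumentTower` are hypothetical. The transformer head is reduced to a single untrained self-attention layer, and mean pooling is assumed. `retrieve_top_k` is a brute-force stand-in for the vector store used in production, not the paper's implementation.

```python
import numpy as np

# Hypothetical section-type vocabulary; the paper's actual section taxonomy is not listed here.
SECTION_TYPES = ["title", "description", "skills", "experience"]


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors to unit length so that dot products equal cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


class DocumentTower:
    """Toy tower (hypothetical): section vectors + section-type embeddings
    -> one self-attention layer (document-level context) -> mean pooling."""

    def __init__(self, dim: int, n_types: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        # In a trained model these weights are learned; random values stand in here.
        self.type_emb = rng.normal(0.0, scale, size=(n_types, dim))
        self.w_q, self.w_k, self.w_v = rng.normal(0.0, scale, size=(3, dim, dim))

    def __call__(self, sections: np.ndarray, types: list) -> np.ndarray:
        # sections: (n_sections, dim), one frozen-backbone embedding per section.
        h = sections + self.type_emb[types]  # add section-type information
        q, k, v = h @ self.w_q, h @ self.w_k, h @ self.w_v
        scores = q @ k.T / np.sqrt(h.shape[-1])
        scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
        attn = np.exp(scores)
        attn /= attn.sum(axis=-1, keepdims=True)
        h = h + attn @ v  # residual attention: every section attends to the others
        return l2_normalize(h.mean(axis=0))  # pool sections into one document vector


def retrieve_top_k(query: np.ndarray, index: np.ndarray, k: int) -> np.ndarray:
    """Brute-force stand-in for a vector-store lookup: row ids of the k best matches."""
    return np.argsort(-(index @ query))[:k]


# Usage: two towers with separate weights, fed with random stand-in section vectors.
rng = np.random.default_rng(42)
dim = 16
project_tower = DocumentTower(dim, len(SECTION_TYPES), seed=1)
freelancer_tower = DocumentTower(dim, len(SECTION_TYPES), seed=2)
freelancers = np.stack(
    [freelancer_tower(rng.normal(size=(3, dim)), [0, 2, 3]) for _ in range(5)]
)
project = project_tower(rng.normal(size=(2, dim)), [0, 1])
print(retrieve_top_k(project, freelancers, k=3))  # ids of the 3 closest profiles
```

Because the weights are untrained, the retrieved ids here carry no meaning. The sketch only shows how a variable number of typed sections collapses into one fixed-size, unit-norm vector that a vector index can search.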

Abstract

Finding the perfect match between a job proposal and a set of freelancers is not an easy task to perform at scale, especially across multiple languages. In this paper, we propose a novel neural retriever architecture that tackles this problem in a multilingual setting. Our method encodes project descriptions and freelancer profiles by leveraging pre-trained multilingual language models. These models serve as the backbone of a custom transformer architecture designed to preserve the structure of profiles and projects. The model is trained with a contrastive loss on historical data. Through several experiments, we show that this approach effectively captures skill-matching similarity and enables efficient matching, outperforming traditional methods.
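The abstract states that the model is trained with a contrastive loss on historical data, and the TL;DR names InfoNCE. The following is a minimal sketch of InfoNCE with in-batch negatives. It assumes L2-normalized embeddings, treats row i of each batch as a historical positive (project, freelancer) pair, and uses a hypothetical temperature of 0.05. The paper's exact batching and hyperparameters are not reproduced here.

```python
import numpy as np


def info_nce(projects: np.ndarray, freelancers: np.ndarray, temperature: float = 0.05) -> float:
    """InfoNCE with in-batch negatives.

    Row i of `projects` and row i of `freelancers` form a positive pair; every
    other freelancer in the batch serves as a negative for project i.
    Both inputs are assumed L2-normalized, with shape (batch, dim).
    """
    logits = projects @ freelancers.T / temperature  # (batch, batch) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # positives sit on the diagonal


batch = np.eye(4)  # four perfectly aligned (project, freelancer) pairs
print(info_nce(batch, batch))  # near 0
print(info_nce(batch, batch[[1, 2, 3, 0]]))  # about 20: every positive is mismatched
```

With a low temperature, the loss is near zero when each project is closest to its own freelancer and grows sharply when positives are mismatched. This pressure pulls matched documents together in the shared latent space illustrated in Figure 1.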

Paper Structure (23 sections, 22 equations, 4 figures, 2 tables)

Introduction
Related work
Approach
Architecture
Leveraging a pre-trained multilingual model: section-level context
Differentiating between sections: positional encoding
Introducing a transformer head: document-level context
From embedding tokens to vector representation
Training objective
The contrastive loss approach
Generalizing to n-tuples
Adding weak negatives
Adjacency matrix based contrastive losses
Experiment
Models
...and 8 more sections

Figures (4)

Figure 1: Illustration of matching interactions and latent space projection. (a) Examples of positive and negative interactions between a project proposal and a set of freelancers. (b) Illustration of document projection and retrieval within the latent space.
Figure 2: Illustration of the proposed architecture for the freelancer model. Similar sections across profiles are projected into a latent space (shown in the first three squares). Positional encodings and the resulting embeddings are then combined to retain section-type information before entering a transformer head that processes document-level context. A final pooling step produces the document embedding.
Figure 3: Bipartite graph and sub-adjacency matrices of six triplets: two freelancer triplets, ($f_{b1}, f_{b1+}, f_{b1-}$) and ($f_{b2}, f_{b2+}, f_{b2-}$); and four project-freelancer triplets, ($p_1, f_{b1}, f_{b1-}$), ($p_1, f_{b1+}, f_{b1-}$), ($p_2, f_{b2}, f_{b2-}$) and ($p_2, f_{b2+}, f_{b2-}$).
Figure 4: KDE density plots of the profile embeddings projected into two dimensions using t-SNE. Color encodes the family or job category associated with each profile embedding. On the right, we zoom in on the web, graphic and design family.
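Figure 3 encodes positives through a bipartite graph and its adjacency matrices rather than through isolated pairs. One way to turn such a matrix into a loss is a multi-positive generalization of InfoNCE. Each anchor's positives are read from the adjacency matrix, and every other document in the batch enters the softmax denominator, so both the explicit triplet negatives and the weak in-batch negatives are penalized. The sketch below assumes this formulation. The helper names, the symmetric 0/1 adjacency, the node ordering and the temperature are illustrative choices, not the paper's definitions. The usage code rebuilds the six triplets of Figure 3.

```python
import numpy as np


def adjacency_from_triplets(triplets, n_nodes):
    """Hypothetical helper: symmetric 0/1 matrix marking (anchor, positive) pairs.
    Negatives stay 0, so they only ever appear in the softmax denominator."""
    adj = np.zeros((n_nodes, n_nodes))
    for anchor, positive, _negative in triplets:
        adj[anchor, positive] = adj[positive, anchor] = 1.0
    return adj


def adjacency_contrastive_loss(emb, adj, temperature=0.05):
    """Multi-positive InfoNCE (assumed form). `emb` is (n, dim) and L2-normalized;
    rows of `adj` without any positive are not used as anchors."""
    n = emb.shape[0]
    logits = emb @ emb.T / temperature
    logits = np.where(np.eye(n, dtype=bool), -np.inf, logits)  # exclude self-pairs
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)  # softmax over all other documents
    pos = adj > 0
    has_pos = pos.any(axis=1)  # rows acting as anchors
    pos_mass = np.where(pos, probs, 0.0).sum(axis=1)  # probability mass on positives
    return float(-np.mean(np.log(pos_mass[has_pos])))


# Node ids following Figure 3: p1, p2, f_b1, f_b1+, f_b1-, f_b2, f_b2+, f_b2-
P1, P2, FB1, FB1P, FB1N, FB2, FB2P, FB2N = range(8)
triplets = [
    (FB1, FB1P, FB1N), (FB2, FB2P, FB2N),  # freelancer triplets
    (P1, FB1, FB1N), (P1, FB1P, FB1N),     # project-freelancer triplets
    (P2, FB2, FB2N), (P2, FB2P, FB2N),
]
adj = adjacency_from_triplets(triplets, n_nodes=8)
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
print(adjacency_contrastive_loss(emb, adj))
```

Unlike plain InfoNCE, an anchor such as $p_1$ here has two positives ($f_{b1}$ and $f_{b1+}$), and the loss rewards placing probability mass on either of them.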
