Smart-Hiring: An Explainable end-to-end Pipeline for CV Information Extraction and Job Matching
Kenza Khelkhal, Dihia Lanasri
TL;DR
Smart-Hiring addresses the inefficiency and bias inherent in manual resume screening by providing an end-to-end NLP pipeline that extracts structured candidate data and semantically matches it to job descriptions. The approach combines layout-aware resume parsing, hybrid extraction (rule-based, ML, heuristics), and transformer-based semantic matching with an explainability layer that reveals which factors drive rankings. Experiments on a real-world, multilingual dataset (French and English) demonstrate competitive matching accuracy while maintaining interpretability and auditable decision rationales. The work highlights practical implications for recruitment analytics, including bias mitigation considerations and the potential for large-scale deployment, and suggests future work with multilingual support and LLM-enhanced reasoning.
Abstract
Hiring processes often involve the manual screening of hundreds of resumes for each job, a task that is time-consuming, labor-intensive, error-prone, and subject to human bias. This paper presents Smart-Hiring, an end-to-end Natural Language Processing (NLP) pipeline designed to automatically extract structured information from unstructured resumes and to semantically match candidates with job descriptions. The proposed system combines document parsing, named-entity recognition, and contextual text embedding techniques to capture skills, experience, and qualifications. Smart-Hiring encodes both resumes and job descriptions in a shared vector space and computes similarity scores between candidates and job postings. The pipeline is modular and explainable, allowing users to inspect extracted entities and matching rationales. Experiments were conducted on a real-world dataset of resumes and job descriptions spanning multiple professional domains, demonstrating the robustness and feasibility of the proposed approach. The system achieves competitive matching accuracy while preserving a high degree of interpretability and transparency in its decision process. This work introduces a scalable and practical NLP framework for recruitment analytics and outlines promising directions for bias mitigation, fairness-aware modeling, and large-scale deployment of data-driven hiring solutions.
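To make the shared-vector-space matching idea concrete, the following is a minimal sketch and not the system's actual implementation. It assumes a toy bag-of-words encoder as a stand-in for the contextual embedding model, ranks candidates by cosine similarity to the job description, and reports the overlapping terms as a very simple matching rationale. All names (`embed`, `cosine`, `rank_candidates`, `STOPWORDS`) and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for this illustration only.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "with"}


def embed(text):
    """Toy encoder: token counts (an assumed stand-in for a transformer embedding)."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(job_text, resumes):
    """Return (candidate, score, shared_terms) tuples, best match first."""
    job_vec = embed(job_text)
    results = []
    for name, text in resumes.items():
        cv_vec = embed(text)
        # Shared terms act as a crude, inspectable rationale for the score.
        shared = sorted(job_vec.keys() & cv_vec.keys())
        results.append((name, cosine(job_vec, cv_vec), shared))
    return sorted(results, key=lambda r: r[1], reverse=True)


job = "Python developer with NLP and machine learning experience"
resumes = {
    "candidate_a": "Python engineer, NLP research, machine learning projects",
    "candidate_b": "Accountant with auditing and payroll experience",
}
ranking = rank_candidates(job, resumes)
for name, score, shared in ranking:
    print(f"{name}: score={score:.2f}, shared terms={shared}")
```

In a pipeline like the one described, `embed` would presumably be replaced by a contextual sentence encoder and the rationale by a richer explanation over extracted entities; the ranking step by similarity in a common space would keep the same shape.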
