Tec-Habilidad: Skill Classification for Bridging Education and Employment

Sabur Butt, Hector G. Ceballos, Diana P. Madera

TL;DR

This work tackles the lack of Spanish-language resources for skill extraction and KSAO-based classification, addressing education–employment alignment. It builds a Spanish skill dataset from Indeed Mexico automotive postings, annotated for Knowledge, Skills, Abilities, and Other (KSAO) categories, with two annotation tasks and inter-annotator agreement metrics. Deep-learning baselines (BETO and mBERT) show strong performance when definitions are provided, while zero-shot LLMs underperform, highlighting the dataset's usefulness for domain-specific skill analysis. The dataset comprises 8,484 unique skills across KSAO categories and is publicly accessible for research, enabling further development of Spanish NLP tools for job-market analytics and curriculum alignment.

Abstract

Job application and assessment processes have evolved significantly in recent years, largely due to advancements in technology and changes in the way companies operate. Skill extraction and classification remain an important component of the modern hiring process because they provide a more objective way to evaluate candidates and automatically align their skills with job requirements. However, to evaluate skills effectively, skill extraction tools must recognize varied mentions of skills on resumes, including direct mentions, implications, synonyms, acronyms, phrases, and proficiency levels, and must differentiate between hard and soft skills. While tools like LLMs (Large Language Models) help extract and categorize skills from job applications, there is a lack of comprehensive datasets for evaluating how accurately these models identify and classify skills in Spanish-language job applications. This gap hinders our ability to assess the reliability and precision of the models, which is crucial for ensuring that the selected candidates truly possess the required skills for the job. In this paper, we develop a Spanish-language dataset for skill extraction and classification, provide an annotation methodology to distinguish between knowledge, skills, and abilities, and provide deep learning baselines to advance robust solutions for skill classification.

Paper Structure

This paper contains 22 sections, 1 figure, 7 tables.

Figures (1)

  • Figure 1: High-level overview of the methodology