SkillGPT: a RESTful API service for skill extraction and standardization using a Large Language Model

Nan Li; Bo Kang; Tijl De Bie

TL;DR

SkillGPT tackles the challenge of extracting and standardizing skills from free-text job descriptions and resumes without heavy supervision or preprocessing. It combines an open-source LLM (Vicuna-13B) with a two-stage workflow. First, the LLM summarizes the input text. Then a vector similarity search against precomputed embeddings of the ESCO (European Skills, Competences, Qualifications and Occupations) taxonomy retrieves the matching codes. The system supports multiple ESCO concept types and three languages. It is delivered through a RESTful API or a GUI, enabling efficient skill extraction and standardization (SES) for academic research. The approach avoids the cost and latency of prompting an LLM directly for standard skills, while embedding-based retrieval keeps the results accurate. The work provides an open-source SES prototype that can support job matching, career planning, and downstream HR analytics.
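The retrieval stage can be illustrated with a minimal cosine-similarity search over a precomputed index. This is a sketch under stated assumptions, not SkillGPT's implementation. The codes `code-A`, `code-B`, `code-C` and the 3-dimensional vectors are hypothetical stand-ins for real ESCO concepts and their embeddings. The helper names `normalize` and `top_k_matches` are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k_matches(
    query: Sequence[float],
    index: Dict[str, List[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k codes whose pre-normalized embeddings are most similar to the query."""
    q = normalize(query)
    scored = [(code, sum(a * b for a, b in zip(q, emb))) for code, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical index: the label embeddings are normalized once, ahead of any query.
esco_index = {
    code: normalize(vec)
    for code, vec in {
        "code-A": [1.0, 0.0, 0.0],
        "code-B": [0.7, 0.7, 0.0],
        "code-C": [0.0, 0.0, 1.0],
    }.items()
}

# The query stands in for the embedding of an LLM-generated summary.
print(top_k_matches([0.9, 0.1, 0.0], esco_index, k=2))  # code-A ranks first, then code-B
```

The catalog is normalized once, so each query costs a single pass of dot products. A production system would more likely use a vector library or approximate nearest-neighbour index than a Python loop.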

Abstract

We present SkillGPT, a tool for skill extraction and standardization (SES) from free-style job descriptions and user profiles, built on an open-source Large Language Model (LLM) as its backbone. Most previous methods for similar tasks either need supervision or rely on heavy data preprocessing and feature engineering. Directly prompting the latest conversational LLMs for standard skills, however, is slow, costly and inaccurate. In contrast, SkillGPT uses an LLM to perform its task in steps, via summarization and vector similarity search, balancing speed with precision. The backbone LLM of SkillGPT is based on Llama and is free for academic use, which makes it useful for exploratory research and prototype development. Hence, our cost-free SkillGPT gives users the convenience of conversational SES, efficiently and reliably.
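The "in steps" design can be sketched as a small pipeline: summarize, embed the summary, then retrieve by similarity. This is a minimal sketch, assuming the LLM and the embedding model are exposed as two plain functions. `SkillMatcher`, `toy_summarize`, `toy_embed`, `VOCAB` and the label ids are all hypothetical. The toy functions only stand in for calls to a backbone model such as Vicuna-13B. The call counter shows the cost argument. The label catalog is embedded once up front, so each query needs one summarization call and one embedding call, rather than a prompt that carries the whole taxonomy.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


class SkillMatcher:
    """Hypothetical two-step SES pipeline: LLM summary, then vector similarity search."""

    def __init__(self, summarize: Callable[[str], str], embed: Callable[[str], Vector],
                 labels: Dict[str, str]) -> None:
        self.summarize = summarize
        self.embed = embed
        # Embed every label once, up front; queries never re-embed the catalog.
        self.index = {code: self._unit(embed(text)) for code, text in labels.items()}

    @staticmethod
    def _unit(vec: Sequence[float]) -> Vector:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # leave zero vectors as zeros
        return [x / norm for x in vec]

    def match(self, document: str, k: int = 1) -> List[Tuple[str, float]]:
        summary = self.summarize(document)       # step 1: one summarization call
        query = self._unit(self.embed(summary))  # step 2: one embedding call
        scored = [(code, sum(a * b for a, b in zip(query, emb)))
                  for code, emb in self.index.items()]
        return sorted(scored, key=lambda p: p[1], reverse=True)[:k]


# Toy stand-ins for model calls (hypothetical), with a counter to show the call budget.
VOCAB = ["python", "sql", "nursing", "patient"]
calls = {"summarize": 0, "embed": 0}


def toy_summarize(text: str) -> str:
    # Pretend summary: keep only the words that appear in VOCAB.
    calls["summarize"] += 1
    tokens = [w.strip(".,") for w in text.lower().split()]
    return " ".join(t for t in tokens if t in VOCAB)


def toy_embed(text: str) -> Vector:
    # Pretend embedding: word counts over VOCAB.
    calls["embed"] += 1
    words = text.lower().split()
    return [float(words.count(v)) for v in VOCAB]


matcher = SkillMatcher(toy_summarize, toy_embed,
                       {"label-1": "python sql", "label-2": "nursing patient"})
result = matcher.match("We need a Python developer who writes SQL daily.")
print(result, calls)  # label-1 wins; 1 summarize call, 3 embed calls (2 labels + 1 query)
```

Swapping the toy functions for real model calls leaves the structure unchanged. The expensive generation step runs once per document, and matching against the taxonomy is a cheap vector operation.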
