Semantic Synergy: Unlocking Policy Insights and Learning Pathways Through Advanced Skill Mapping
Phoebe Koundouri, Conrad Landis, Georgios Feretzakis
TL;DR
This work tackles the challenge of extracting skills from heterogeneous policy and CV texts and mapping them to standardized occupations and learning pathways. It introduces an end-to-end pipeline that combines NLP preprocessing, semantic embeddings, and FAISS-based similarity search to produce structured outputs linking skills to ESCO occupations and SDSN/AE4RIA courses, with SDG alignment integrated into the workflow. The system reports near-human performance in skill detection (F1 scores above $0.95$ for explicit mentions and above $0.93$ for implicit ones) and provides an interactive Dash-based dashboard for real-time decision support across policymaking, workforce development, and education. Demonstrations on synthetic and real-world documents show robust, scalable outputs, including skill distributions, occupation rankings, course recommendations, and SDG relevance, underscoring the framework's potential to inform targeted policy interventions, curriculum design, and talent management. As future work, the authors suggest domain-specific vocabulary extensions, larger training datasets, real-time data integration, and adaptation to further domains to increase precision, recall, and practical impact.
Abstract
This research introduces a comprehensive system that uses state-of-the-art natural language processing, semantic embeddings, and efficient similarity search to generate actionable insights from raw textual information. The system automatically extracts and aggregates normalized competencies from multiple documents (such as policy files and curricula vitae) and links the recognized competencies to occupation profiles and related learning courses. To validate its performance, we conducted a multi-tier evaluation covering both explicit and implicit skill references in synthetic and real-world documents. The results showed near-human-level accuracy, with F1 scores exceeding 0.95 for explicit skill detection and above 0.93 for implicit mentions. The system thereby establishes a sound foundation for supporting in-depth collaboration across the AE4RIA network. The methodology is a multi-stage pipeline: extensive preprocessing and data cleaning, segmentation and semantic embedding via SentenceTransformer, and skill extraction using a FAISS-based search method. The extracted skills are mapped to occupation frameworks (as formulated in the ESCO ontology) and to learning paths offered through the Sustainable Development Goals Academy. In addition, an interactive visualization interface, implemented with Dash and Plotly, presents graphs and tables for real-time exploration and informed decision-making by those involved in policymaking, training and learning supply, career transitions, and recruitment. Overall, this system, backed by rigorous validation, offers promising prospects for improved policymaking, human resource development, and lifelong learning by turning raw, complex textual information into structured and actionable insights.
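The segment, embed, and search steps named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the `embed` function is a toy bag-of-words stand-in for a SentenceTransformer encoder, the brute-force loop stands in for a FAISS index (an exact inner-product index such as FAISS's `IndexFlatIP` performs the same comparison on normalized vectors), and the `SKILLS` list, the `threshold` value, and all function names are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical mini-vocabulary of normalized skills (illustrative only, not ESCO).
SKILLS = [
    "data analysis",
    "project management",
    "renewable energy policy",
    "machine learning",
]


def embed(text):
    """Toy stand-in for a sentence encoder: L2-normalized bag-of-words counts."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def inner_product(u, v):
    """Inner product of two sparse vectors (cosine similarity for unit vectors)."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())


def segment(document):
    """Split a document into sentence-like segments on '.', ';', '!' and '?'."""
    return [s.strip() for s in re.split(r"[.;!?]+", document) if s.strip()]


def extract_skills(document, threshold=0.5):
    """Return {skill: best similarity} for skills matched by at least one segment.

    The threshold is an assumed value; the source does not state one.
    """
    skill_vectors = [(skill, embed(skill)) for skill in SKILLS]
    matches = {}
    for seg in segment(document):
        seg_vec = embed(seg)
        # Exact nearest-neighbour search: compare the segment with every skill.
        best_skill, best_score = max(
            ((skill, inner_product(seg_vec, vec)) for skill, vec in skill_vectors),
            key=lambda pair: pair[1],
        )
        # Keep the match only if it is similar enough and improves on earlier segments.
        if best_score >= threshold and best_score > matches.get(best_skill, 0.0):
            matches[best_skill] = best_score
    return matches


if __name__ == "__main__":
    cv_text = (
        "Experienced in data analysis. "
        "Led project management for solar farms. "
        "Enjoys hiking."
    )
    for skill, score in extract_skills(cv_text).items():
        print(f"{skill}: {score:.3f}")
```

A bag-of-words embedding only matches skills that share surface tokens with the text, so it can illustrate explicit mentions but not the implicit ones the paper evaluates; capturing those is the reason for using a learned sentence encoder in the real pipeline.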
