Automated Skill Decomposition Meets Expert Ontologies: Bridging the Granularity Gap with LLMs

Le Ngoc Luyen, Marie-Hélène Abel

TL;DR

The paper tackles the granularity gap between broad skill labels in authoritative ontologies and the finer sub-skills needed for adaptive learning and workforce mapping. It proposes an ontology-grounded evaluation framework and introduces the ROME-ESCO-DecompSkill benchmark to assess LLM-based skill decomposition under zero-shot and leakage-safe few-shot prompting. Through semantic F1 and hierarchy-aware F1 metrics, the study shows zero-shot provides solid baselines while few-shot prompts improve phrasing stability and taxonomic placement, with exemplar choice influencing latency and coverage for different model sizes. The findings offer a reproducible foundation for ontology-faithful skill decomposition with practical implications for curriculum design, personalized learning, and employment services, and point to future work in retrieval-augmented grounding and multilingual deployment.

Abstract

This paper investigates automated skill decomposition using Large Language Models (LLMs) and proposes a rigorous, ontology-grounded evaluation framework. Our framework standardizes the pipeline from prompting and generation to normalization and alignment with ontology nodes. To evaluate outputs, we introduce two metrics: a semantic F1-score that uses optimal embedding-based matching to assess content accuracy, and a hierarchy-aware F1-score that credits structurally correct placements to assess granularity. We conduct experiments on ROME-ESCO-DecompSkill, a curated subset of parents, comparing two prompting strategies: zero-shot and leakage-safe few-shot with exemplars. Across diverse LLMs, zero-shot offers a strong baseline, while few-shot consistently stabilizes phrasing and granularity and improves hierarchy-aware alignment. A latency analysis further shows that exemplar-guided prompts are competitive with, and sometimes faster than, unguided zero-shot prompts because their completions are more schema-compliant. Together, the framework, benchmark, and metrics provide a reproducible foundation for developing ontology-faithful skill decomposition systems.
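
To make the semantic F1-score more concrete, the following is a minimal sketch of one way such a metric could be computed. It is not the authors' implementation: the cosine threshold of 0.8, the function names, and the choice of a maximum bipartite matching over above-threshold pairs as the reading of "optimal embedding-based matching" are all assumptions made for illustration. Embeddings are passed in as plain lists of floats, so any sentence encoder could supply them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def max_matching(edges, n_pred, n_gold):
    """Size of a maximum one-to-one matching (augmenting-path search).

    edges[p][g] is True when prediction p may be matched to gold node g.
    """
    match_gold = [-1] * n_gold  # gold index -> matched prediction index

    def try_assign(p, seen):
        for g in range(n_gold):
            if edges[p][g] and not seen[g]:
                seen[g] = True
                # Take g if it is free, or if its current partner can move elsewhere.
                if match_gold[g] == -1 or try_assign(match_gold[g], seen):
                    match_gold[g] = p
                    return True
        return False

    return sum(try_assign(p, [False] * n_gold) for p in range(n_pred))


def semantic_f1(pred_vecs, gold_vecs, threshold=0.8):
    """Return (precision, recall, f1) under one-to-one matching.

    threshold is a hypothetical cutoff; the source does not state one here.
    """
    if not pred_vecs or not gold_vecs:
        return 0.0, 0.0, 0.0
    edges = [[cosine(p, g) >= threshold for g in gold_vecs] for p in pred_vecs]
    tp = max_matching(edges, len(pred_vecs), len(gold_vecs))
    precision = tp / len(pred_vecs)
    recall = tp / len(gold_vecs)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

The one-to-one constraint matters: without it, several near-duplicate predictions could all claim credit for the same gold sub-skill and inflate precision. A hierarchy-aware variant would additionally need the ontology's parent-child structure, which this sketch does not model.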

Paper Structure

This paper contains 21 sections, 12 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Example of ontology slice and LLM-based predictions with alignment results.
  • Figure 2: End-to-end framework for ontology-grounded skill decomposition: from $(S_{\text{broad}},C)$ the generator (optionally with exemplars) decodes exactly $k$ candidates, normalizes/deduplicates, aligns them to ontology nodes, and computes depth-aware scores. The ontology is used solely as external ground truth for alignment and evaluation.
  • Figure 3: Prompt templates by strategy (Context, Instruction, Input Text, Formatting Indicator).
  • Figure 4: Per-parent average wall time (seconds, log scale) for Zero-shot and Few-shot prompting across LLMs.