Rate-Distortion Guided Knowledge Graph Construction from Lecture Notes Using Gromov-Wasserstein Optimal Transport

Yuan An, Ruhma Hashmi, Michelle Rogers, Jane Greenberg, Brian K. Smith

TL;DR

The paper tackles the challenge of converting unstructured lecture materials into high-quality, task-oriented knowledge graphs (KGs) for AI-assisted education. It introduces a rate-distortion (RD) framework that balances KG complexity (rate) with fidelity to source content (distortion), guided by a fused Gromov-Wasserstein (FGW) distance between metric-measure spaces representing lectures and KGs. By iteratively refining KGs with local operations (add, merge, split, remove, rewire) and optimizing the Lagrangian $L = R + \beta D$, the approach identifies knee points that yield compact yet informative graphs. A data science lecture case study demonstrates that RD-guided refinement improves content coverage and multiple-choice question (MCQ) quality relative to raw notes, offering a principled, interpretable method for information-theoretic KG optimization in education. The work bridges information theory, optimal transport, and KG engineering to enhance AI-powered learning support and personalized curricula.
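As a minimal sketch of how the Lagrangian $L = R + \beta D$ trades rate against distortion, the snippet below scores a few candidate KGs and picks the minimizer for several values of $\beta$. The function names and the (rate, distortion) pairs are hypothetical illustrations, not values or code from the paper.

```python
def lagrangian(rate, distortion, beta):
    """Rate-distortion Lagrangian L = R + beta * D."""
    return rate + beta * distortion


def best_candidate(candidates, beta):
    """Return the (rate, distortion) pair with the smallest Lagrangian."""
    return min(candidates, key=lambda rd: lagrangian(rd[0], rd[1], beta))


# Hypothetical (rate, distortion) pairs for four candidate KGs,
# e.g. rate = KG size and distortion = an FGW-style mismatch score.
candidates = [(10, 0.90), (20, 0.40), (40, 0.30), (80, 0.28)]

# A small beta favors compact graphs; a large beta favors low distortion.
for beta in (1, 50, 500, 5000):
    print(beta, best_candidate(candidates, beta))
```

In this toy data, doubling the rate from 40 to 80 lowers distortion by only 0.02, so the larger graph wins only at a very large $\beta$; that flattening of the curve is the kind of behavior a knee point is meant to capture.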

Abstract

Task-oriented knowledge graphs (KGs) enable AI-powered learning assistant systems to automatically generate high-quality multiple-choice questions (MCQs). Yet converting unstructured educational materials, such as lecture notes and slides, into KGs that capture key pedagogical content remains difficult. We propose a framework for knowledge graph construction and refinement grounded in rate-distortion (RD) theory and optimal transport geometry. In the framework, lecture content is modeled as a metric-measure space, capturing semantic and relational structure, while candidate KGs are aligned using Fused Gromov-Wasserstein (FGW) couplings to quantify semantic distortion. The rate term, expressed via the size of the KG, reflects complexity and compactness. Refinement operators (add, merge, split, remove, rewire) minimize the rate-distortion Lagrangian, yielding compact, information-preserving KGs. Our prototype applied to data science lectures yields interpretable RD curves and shows that MCQs generated from refined KGs consistently surpass those from raw notes on fifteen quality criteria. This study establishes a principled foundation for information-theoretic KG optimization in personalized and AI-assisted education.
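To make the distortion term concrete, the sketch below evaluates a fused Gromov-Wasserstein objective for a fixed coupling, using the commonly cited form $(1-\alpha)\sum_{ij} M_{ij}\Pi_{ij} + \alpha\sum_{ijkl} (C^{1}_{ik} - C^{2}_{jl})^2\,\Pi_{ij}\Pi_{kl}$. This is an assumption about the general shape of the objective; the paper's exact weighting and loss may differ. The function name `fgw_cost`, the toy matrices, and the choice $\alpha = 0.5$ are all hypothetical, and the code only scores a given coupling rather than solving for the optimal one.

```python
def fgw_cost(M, C1, C2, P, alpha=0.5):
    """Evaluate an FGW-style objective for a fixed coupling P.

    M  : feature cost between lecture segments (rows) and KG nodes (columns)
    C1 : pairwise distances among lecture segments
    C2 : pairwise distances among KG nodes
    P  : coupling matrix with the same shape as M
    """
    n, m = len(C1), len(C2)
    # Feature (Wasserstein) term: cost of matching segment i to node j.
    feature = sum(M[i][j] * P[i][j] for i in range(n) for j in range(m))
    # Structure (Gromov-Wasserstein) term: mismatch between pairwise distances.
    structure = sum(
        (C1[i][k] - C2[j][l]) ** 2 * P[i][j] * P[k][l]
        for i in range(n) for j in range(m)
        for k in range(n) for l in range(m)
    )
    return (1 - alpha) * feature + alpha * structure


# Toy example: two lecture segments and two KG nodes with identical geometry.
C1 = [[0.0, 1.0], [1.0, 0.0]]
C2 = [[0.0, 1.0], [1.0, 0.0]]
M = [[0.0, 1.0], [1.0, 0.0]]        # matching i to i is free, i to j costs 1
identity = [[0.5, 0.0], [0.0, 0.5]]  # each segment mapped to its own node
swapped = [[0.0, 0.5], [0.5, 0.0]]   # each segment mapped to the other node

print(fgw_cost(M, C1, C2, identity))  # perfect alignment
print(fgw_cost(M, C1, C2, swapped))   # structure matches, features do not
```

The swapped coupling preserves pairwise distances, so only the feature term penalizes it; this illustrates why the fused objective needs both terms to separate the two alignments.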

Paper Structure

This paper contains 23 sections, 12 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: A Rate–Distortion Framework for Knowledge Graph Construction
  • Figure 2: Balance between Rate and Distortion
  • Figure 3: Mapping the lecture notes on the left-hand side to the knowledge graph (KG) on the right-hand side via the FGW coupling $\Pi$ in the middle. The coupling matrix $\Pi$ is computed by minimizing both structural and feature mismatches.
  • Figure 4: Iterative rate-distortion guided refinement of a knowledge graph representing lecture notes. The process begins with an initial graph and applies local edit operations (add, merge, split, remove, rewire) to minimize the combined objective. The resulting graph achieves an optimal tradeoff between rate (R) and distortion (D).
  • Figure 5: (a) Lecture notes in markdown are flattened into elementary segments encoding hierarchical context. Text embeddings are computed, and chronological, logical, and semantic distances are combined into a distance matrix $d_Z(i, j)$; (b) An initial knowledge graph (KG) is extracted from the same notes by an LLM under a predefined set of task-oriented relations. Graph structural and node-semantic distances are integrated into a corresponding matrix $d_V(i, j)$.
  • ...and 3 more figures