Predicting Contextual Informativeness for Vocabulary Learning using Deep Learning

Tao Wu, Adam Kapelner

TL;DR

We demonstrate that a modern embedding model built on a neural network architecture, when guided by human supervision, yields a low-cost, large supply of near-perfect contexts for teaching vocabulary across a variety of target words.

Abstract

We describe a modern deep learning system that automatically identifies informative contextual examples ("contexts") for first-language vocabulary instruction for high school students. Our paper compares three modeling approaches: (i) an unsupervised similarity-based strategy using MPNet's uniformly contextualized embeddings, (ii) a supervised framework built on instruction-aware, fine-tuned Qwen3 embeddings with a nonlinear regression head, and (iii) model (ii) plus handcrafted context features. We introduce a novel metric called the Retention Competency Curve (RCC) to visualize the trade-off between the proportion of good contexts discarded and the "good-to-bad" context ratio, providing a compact, unified lens on model performance. Model (iii) delivers the most dramatic gains, achieving a good-to-bad ratio of 440 while discarding only 70% of the good contexts. In summary, we demonstrate that a modern embedding model built on a neural network architecture, when guided by human supervision, yields a low-cost, large supply of near-perfect contexts for teaching vocabulary across a variety of target words.
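The abstract defines the RCC only informally. The following is a minimal sketch, assuming binary human labels (good vs. bad), one scalar informativeness score per context, and a sweep over score thresholds in which contexts scoring at or above the threshold are retained. The function name `retention_competency_curve` is hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def retention_competency_curve(scores, is_good):
    """Hypothetical RCC sketch: one point per score threshold.

    For each threshold t, contexts with score >= t are retained, and we record
      x: proportion of good contexts discarded (thrown out)
      y: good-to-bad ratio among the retained contexts
    """
    scores = np.asarray(scores, dtype=float)
    is_good = np.asarray(is_good, dtype=bool)
    n_good = int(is_good.sum())
    if n_good == 0:
        raise ValueError("need at least one good context")

    discarded, ratio = [], []
    for t in np.unique(scores):  # thresholds in ascending order
        kept = scores >= t
        good_kept = int(np.sum(kept & is_good))
        bad_kept = int(np.sum(kept & ~is_good))
        discarded.append(1.0 - good_kept / n_good)
        # With no bad contexts retained, the ratio is unbounded.
        ratio.append(good_kept / bad_kept if bad_kept > 0 else float("inf"))
    return np.array(discarded), np.array(ratio)
```

Each threshold yields one (proportion discarded, good-to-bad ratio) pair; plotting the pairs traces the curve. Raising the threshold moves right along the x-axis while, for a useful model, the ratio climbs.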

Paper Structure

This paper contains 20 sections, 1 equation, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Qwen3 multi-head causal-attention mechanism (zhang2025qwen3), followed by mean-pooling of the token vectors at the positions corresponding to the target word (the positions shown are arbitrary in this illustration).
  • Figure 2: RCC plots for all three models with AUC metrics (see legend).
  • Figure 3: An example image in which a student can learn the target word supercilious along with an associated caption such as: "Argyle held his crested head with such localized gravity that one might suspect the rest of the garden was simply revolving around him. As the common pheasants scurried for fallen seed, Argyle offered them nothing but the heavy-lidded appraisal of a bored monarch, his supercilious gaze suggesting that their very existence was a mere clerical error in mother nature's ledger."
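Figure 1 describes mean-pooling the token vectors at the target word's positions. Below is a minimal NumPy sketch of that pooling step, assuming the per-token hidden states are already available as a (seq_len, hidden_dim) array and the target word's subword positions are known. The function name and input layout are hypothetical, not the paper's code.

```python
import numpy as np

def target_word_embedding(token_vectors, target_positions):
    """Hypothetical sketch: mean-pool hidden states at the target word's tokens.

    token_vectors: array of shape (seq_len, hidden_dim), e.g. an encoder's
        final hidden states (which layer the paper uses is an assumption here).
    target_positions: indices of the subword tokens forming the target word.
    """
    token_vectors = np.asarray(token_vectors, dtype=float)
    positions = np.asarray(target_positions, dtype=int)
    if positions.size == 0:
        raise ValueError("target word must map to at least one token")
    # Average only the rows belonging to the target word.
    return token_vectors[positions].mean(axis=0)
```

Pooling over only the target word's positions yields a single fixed-size vector even when a word such as "supercilious" is split into several subword tokens.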