Predicting Contextual Informativeness for Vocabulary Learning using Deep Learning

Tao Wu; Adam Kapelner

TL;DR

A modern embedding model with a neural network architecture, when guided by human supervision, yields a low-cost, large supply of near-perfect contexts for teaching vocabulary across a variety of target words.

Abstract

We describe a modern deep learning system that automatically identifies informative contextual examples ("contexts") for first-language vocabulary instruction for high school students. We compare three modeling approaches: (i) an unsupervised similarity-based strategy using MPNet's uniformly contextualized embeddings, (ii) a supervised framework built on instruction-aware, fine-tuned Qwen3 embeddings with a nonlinear regression head, and (iii) model (ii) plus handcrafted context features. We introduce a novel metric, the Retention Competency Curve, which visualizes the trade-off between the proportion of good contexts discarded and the "good-to-bad" context ratio, providing a compact, unified lens on model performance. Model (iii) delivers the largest gains, achieving a good-to-bad ratio of 440 while discarding only 70% of the good contexts. In summary, we demonstrate that a modern embedding model with a neural network architecture, when guided by human supervision, yields a low-cost, large supply of near-perfect contexts for teaching vocabulary across a variety of target words.
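
The trade-off behind the Retention Competency Curve can be illustrated with a small sketch. The paper's exact definition is not given in this summary, so the code below rests on assumptions: each context has a model score and a binary human label (good or bad), and contexts scoring at or above a threshold are kept. The function name `rcc_points` and the toy data are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def rcc_points(scores: List[float], is_good: List[bool]) -> List[Tuple[float, float, float]]:
    """Sweep a score threshold and report the trade-off at each step.

    Assumption (not from the paper): a context is kept when its score is
    greater than or equal to the threshold. Each returned tuple is
    (threshold, proportion of good contexts discarded, good-to-bad ratio kept).
    """
    total_good = sum(is_good)
    if total_good == 0:
        raise ValueError("at least one good context is required")

    points = []
    for threshold in sorted(set(scores)):
        # Labels of the contexts that survive this threshold.
        kept = [good for score, good in zip(scores, is_good) if score >= threshold]
        good_kept = sum(kept)
        bad_kept = len(kept) - good_kept
        discarded_good = 1.0 - good_kept / total_good
        # With no bad contexts left, the ratio is unbounded.
        ratio = good_kept / bad_kept if bad_kept else float("inf")
        points.append((threshold, discarded_good, ratio))
    return points


# Hypothetical toy data: six contexts with model scores and human labels.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [True, True, False, True, False, False]
points = rcc_points(toy_scores, toy_labels)
```

Raising the threshold discards more good contexts but generally improves the good-to-bad ratio among the contexts that remain, which is the trade-off the abstract describes.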

Paper Structure (20 sections, 1 equation, 3 figures, 4 tables)


Introduction
Methods
Raw Data
Human Labels
Feature Engineering
The Unsupervised Model
MPNet
Qwen3
The Supervised Learning Model
The Supervised Learning + Handcrafted Features Model
Model Prediction and Model Errors
Performance Validation
Results
Discussion
...and 5 more sections

Figures (3)

Figure 1: Qwen3 multi-head causal-attention mechanism [zhang2025qwen3], followed by mean-pooling of the token vectors at the positions corresponding to the target word; the positions shown are arbitrary in this illustration.
Figure 2: RCC plots for all three models with AUC metrics (see legend).
Figure 3: An example image in which a student can learn the target word supercilious along with an associated caption such as: "Argyle held his crested head with such localized gravity that one might suspect the rest of the garden was simply revolving around him. As the common pheasants scurried for fallen seed, Argyle offered them nothing but the heavy-lidded appraisal of a bored monarch, his supercilious gaze suggesting that their very existence was a mere clerical error in mother nature's ledger."
