Large Language Model Augmented Exercise Retrieval for Personalized Language Learning

Austin Xu; Will Monroe; Klinton Bicknell

TL;DR

This work tackles zero-shot exercise retrieval for learner-directed language learning and identifies a fundamental referential similarity gap between how learners describe their learning objectives and the actual content of exercises. It introduces mHyER, which combines multilingual contrastive pretraining with LLM-generated hypothetical retrieval candidates to bridge this gap, and then performs nearest-neighbor search over a fixed exercise catalog. The authors create two novel benchmarks, DuoRD and Tatoeba Tags, and show that mHyER substantially outperforms strong baselines across both datasets and settings. Ablations confirm that contrastive training and candidate synthesis provide complementary benefits. The approach gives learners explicit control over content, offering a practical path toward more self-directed, personalized language learning at scale.
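The TL;DR credits part of mHyER's gains to multilingual contrastive training, but this section does not state the training objective. A common choice for this kind of training is the InfoNCE loss with in-batch negatives. The pure-Python sketch below implements that loss under stated assumptions: row i of `anchors` and row i of `positives` form a hypothetical pair of embeddings of the same exercise (for instance, a sentence and its translation), and the temperature of 0.05 is illustrative, not a value taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(anchors: List[Vector], positives: List[Vector],
                  temperature: float = 0.05) -> float:
    """Mean InfoNCE loss with in-batch negatives.

    Assumption: row i of `anchors` and row i of `positives` embed two views of
    the same exercise; every other row of `positives` is a negative for anchor i.
    """
    if len(anchors) != len(positives):
        raise ValueError("anchors and positives must have the same batch size")
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Temperature-scaled similarity of anchor i to every candidate in the batch.
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits)
        log_partition = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Cross-entropy term: negative log-probability of the matching pair.
        total += log_partition - logits[i]
    return total / len(anchors)


# Usage: matched pairs on the diagonal should give a lower loss than mismatched ones.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(anchors, [[0.9, 0.1], [0.1, 0.9]])
mismatched = info_nce_loss(anchors, [[0.1, 0.9], [0.9, 0.1]])
print(aligned < mismatched)
```

A real training loop would compute these logits as one batched matrix product in a deep learning framework; the explicit loop here only makes the arithmetic visible.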

Abstract

We study the problem of zero-shot exercise retrieval in the context of online language learning, to give learners the ability to explicitly request personalized exercises via natural language. Using real-world data collected from language learners, we observe that vector similarity approaches poorly capture the relationship between exercise content and the language that learners use to express what they want to learn. This semantic gap between queries and content dramatically reduces the effectiveness of general-purpose retrieval models pretrained on large-scale information retrieval datasets like MS MARCO. We leverage the generative capabilities of large language models to bridge the gap by synthesizing hypothetical exercises based on the learner's input, which are then used to search for relevant exercises. Our approach, which we call mHyER, overcomes three challenges: (1) lack of relevance labels for training, (2) unrestricted learner input content, and (3) low semantic similarity between input and retrieval candidates. mHyER outperforms several strong baselines on two novel benchmarks created from crowdsourced data and publicly available data.
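Searching with LLM-synthesized hypothetical exercises instead of the raw learner input resembles the Hypothetical Document Embeddings (HyDE) pattern. The sketch below contrasts direct similarity search with that style of search under heavy simplifications: the bag-of-words `embed` is a toy stand-in for the trained multilingual encoder, `fake_llm` is a hypothetical placeholder for the LLM call, and averaging the candidate embeddings into a single query vector is an assumption borrowed from HyDE, not a detail confirmed by this section.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def normalize(v: Vector) -> Vector:
    """Scale a sparse vector to unit L2 norm (the zero vector is left unchanged)."""
    norm = math.sqrt(sum(w * w for w in v.values())) or 1.0
    return {tok: w / norm for tok, w in v.items()}


def embed(text: str) -> Vector:
    """Toy bag-of-words encoder standing in for a trained multilingual encoder."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    return normalize({tok: float(c) for tok, c in counts.items()})


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two unit-norm sparse vectors, i.e. their dot product."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def rank(query_vec: Vector, catalog: List[str], k: int) -> List[str]:
    """Return the k catalog exercises most similar to the query vector (ties keep catalog order)."""
    return sorted(catalog, key=lambda ex: cosine(query_vec, embed(ex)), reverse=True)[:k]


def direct_search(learner_input: str, catalog: List[str], k: int = 1) -> List[str]:
    """Baseline: embed the learner input itself and search the catalog."""
    return rank(embed(learner_input), catalog, k)


def hypothetical_search(learner_input: str, catalog: List[str],
                        generate: Callable[[str], List[str]], k: int = 1) -> List[str]:
    """Search with the mean embedding of synthesized hypothetical exercises."""
    candidates = [embed(c) for c in generate(learner_input)]
    mean: Vector = {}
    for vec in candidates:
        for tok, w in vec.items():
            mean[tok] = mean.get(tok, 0.0) + w / len(candidates)
    return rank(normalize(mean), catalog, k)


def fake_llm(learner_input: str) -> List[str]:
    """Hypothetical stand-in for prompting an LLM to write exercises matching the request."""
    return ["Je voudrais une table pour deux.", "L'addition, s'il vous plaît."]


CATALOG = [
    "Le chat dort sur le canapé.",
    "Je voudrais un café, s'il vous plaît.",
    "Il pleut depuis ce matin.",
]
request = "I want to practice ordering at a restaurant"
print(direct_search(request, CATALOG))
print(hypothetical_search(request, CATALOG, fake_llm))
```

In this toy setup the request shares no words with any exercise, so direct search scores every exercise at zero and simply returns the first one: a miniature version of the gap the abstract describes. The synthesized candidates are written in the same register as exercises, so their averaged vector lands nearest the café sentence.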

Paper Structure (16 sections, 3 equations, 4 figures, 5 tables)

Introduction
Related work
Problem setup
Learner inputs.
Method
Baseline: direct search with similarity spaces.
mHyER: augmenting direct search with generative capabilities.
Bridging the referential similarity gap with mHyER.
Datasets and experimental results
Datasets
Evaluation procedure and metrics
Baselines
Direct similarity search vs. mHyER: A qualitative case study
Experimental results
Ablation study
...and 1 more section

Figures (4)

Figure 1: Exercise retrieval for learner-directed language learning and our proposed solution, the multilingual Hypothetical Exercise Retriever (mHyER). At a high level, learners may provide any natural-language input, and the goal is to retrieve exercises relevant to that input. Our method uses large language models to perform zero-shot retrieval.
Figure 2: mHyER consists of two stages. Contrastive finetuning (left) is employed as a training stage to optimize our semantic similarity space for multilingual exercises. Then at retrieval time (right), a large language model is employed to synthesize hypothetical retrieval candidates. These retrieval candidates are then used in direct similarity search to retrieve exercises.
Figure 3: t-SNE visualization of exercise, learner input, and GPT-4-synthesized retrieval candidate representations in the representation space of a trained mBERT encoder (left). Learner inputs concentrate in the "language about language" region (top right), which makes direct similarity search sub-optimal. Retrieval candidates bridge the referential similarity gap between learner inputs and exercise text and lie close to exercises that meet the learner's specifications (bottom right).
Figure 4: Length of the top 3 retrieved exercise sentences, measured in characters, for direct similarity search and mHyER. Exercises retrieved via direct similarity search are inherently biased in length: most of them are relatively short. mHyER retrieves exercises of more varied length, and this variation aligns well with the global distribution of exercises, showing that mHyER effectively translates learner inputs into in-distribution exercises.
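Figure 4 compares the character lengths of the top 3 retrieved exercises with the global distribution of exercise lengths. A minimal sketch of that measurement follows, assuming retrieval results are available as a hypothetical mapping from each learner input to its ranked exercises. The summary statistics chosen here (mean, median, population standard deviation) and the toy data are illustrative, not necessarily what underlies the figure.

```python
import statistics
from typing import Dict, List


def top_k_lengths(retrieved: Dict[str, List[str]], k: int = 3) -> List[int]:
    """Character lengths of the top-k exercises retrieved for every learner input."""
    return [len(ex) for ranked in retrieved.values() for ex in ranked[:k]]


def length_summary(lengths: List[int]) -> Dict[str, float]:
    """Summary statistics for comparing a length distribution with the catalog's."""
    return {
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "stdev": statistics.pstdev(lengths),
    }


# Hypothetical toy data: a small exercise catalog and per-input rankings over it.
catalog = ["Oui.", "Merci beaucoup.", "Il pleut depuis ce matin.",
           "Je voudrais un café, s'il vous plaît."]
retrieved = {
    "greetings": ["Oui.", "Merci beaucoup.", "Il pleut depuis ce matin.",
                  "Je voudrais un café, s'il vous plaît."],
    "restaurant": ["Je voudrais un café, s'il vous plaît.", "Merci beaucoup."],
}
print(length_summary(top_k_lengths(retrieved)))      # lengths of retrieved exercises
print(length_summary([len(ex) for ex in catalog]))   # global catalog lengths
```

Comparing the two summaries (or their full histograms) indicates whether a retriever skews toward short exercises or tracks the catalog's overall length distribution.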
