Towards Understanding Ambiguity Resolution in Multimodal Inference of Meaning
Yufei Wang, Adriana Kovashka, Loretta Fernández, Marc N. Coutanche, Seth Wiener
TL;DR
This study tackles ambiguity resolution in multimodal foreign-language learning by examining how image and text features, along with learners' language backgrounds, influence the inference of unfamiliar word meanings. It reports two participant studies and analyzes data- and participant-driven predictors, finding few robust predictive features overall but some language-specific effects. It then evaluates AI systems' ability to predict participant performance, showing that incorporating strategy information and image-derived descriptions improves predictive power, which suggests the potential for AI-assisted dynamic curation of increasingly challenging learning examples. The work points toward adaptive, multimodal learning environments and emphasizes the need for larger datasets and safeguards when applying AI to pedagogy.
Abstract
We investigate a new setting for foreign language learning, where learners infer the meaning of unfamiliar words in a multimodal context of a sentence describing a paired image. We conduct studies with human participants using different image-text pairs. We analyze the features of the data (i.e., images and texts) that make it easier for participants to infer the meaning of a masked or unfamiliar word, and which language backgrounds of the participants correlate with success. We find that only some intuitive features correlate strongly with participant performance, prompting the need for further investigation of predictive features for success in these tasks. We also analyze the ability of AI systems to reason about participant performance, and discover promising future directions for improving this reasoning ability.
