Towards Understanding Ambiguity Resolution in Multimodal Inference of Meaning

Yufei Wang, Adriana Kovashka, Loretta Fernández, Marc N. Coutanche, Seth Wiener

TL;DR

This study tackles ambiguity resolution in multimodal foreign-language learning by examining how image and text features, along with learners' language backgrounds, influence the inference of unfamiliar word meanings. It reports two participant studies and analyzes data- and participant-driven predictors, finding few robust features overall but language-specific effects. It then evaluates AI systems' ability to predict participant performance, showing that incorporating strategy information and image-derived descriptions enhances predictive power and demonstrating the potential for AI-assisted dynamic curation of increasingly challenging learning examples. The work points toward adaptive, multimodal learning environments and emphasizes the need for larger datasets and safeguards when applying AI to pedagogy.

Abstract

We investigate a new setting for foreign language learning, where learners infer the meaning of unfamiliar words in a multimodal context of a sentence describing a paired image. We conduct studies with human participants using different image-text pairs. We analyze the features of the data (i.e., images and texts) that make it easier for participants to infer the meaning of a masked or unfamiliar word, and which language backgrounds of the participants correlate with success. We find that only some intuitive features have strong correlations with participant performance, prompting the need for further investigation of predictive features for success in these tasks. We also analyze the ability of AI systems to reason about participant performance, and discover promising future directions for improving this reasoning ability.

Paper Structure

This paper contains 11 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: An example meaning inference task.
  • Figure 2: Five image-text pairs from our preliminary word meaning inference experiment.