Implicit assessment of language learning during practice as accurate as explicit testing

Jue Hou, Anisia Katinskaia, Anh-Duc Vu, Roman Yangarber

TL;DR

This work trains an IRT model on learner data collected from exhaustive tests under imperfect conditions, uses the model to guide adaptive tests, and investigates whether learner ability can be estimated accurately from practice with exercises alone, without testing.

Abstract

Assessing learner proficiency is an essential part of Intelligent Tutoring Systems (ITS). We use Item Response Theory (IRT) in computer-aided language learning to assess student ability in two contexts: in test sessions, and in exercises during practice sessions. Exhaustive testing across a wide range of skills can provide a detailed picture of proficiency, but may be undesirable for a number of reasons. Therefore, we first aim to replace exhaustive tests with efficient but accurate adaptive tests. We use learner data collected from exhaustive tests under imperfect conditions to train an IRT model that guides adaptive tests. Simulations and experiments with real learner data confirm that this approach is efficient and accurate. Second, we explore whether we can accurately estimate learner ability directly from the context of practice with exercises, without testing. We transform learner data collected from exercise sessions into a form that can be used for IRT modeling. This is done by linking the exercises to linguistic constructs; the constructs are then treated as "items" within IRT. We present results from large-scale studies with thousands of learners. Using teacher assessments of student ability as "ground truth," we compare the estimates obtained from tests with those obtained from exercises. The experiments confirm that the IRT models can produce accurate ability estimates based on exercises.
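
The abstract does not say which IRT variant or item-selection rule the authors use. Purely as an illustration of how an IRT model can guide an adaptive test, the sketch below assumes a two-parameter logistic (2PL) model with discrimination a and difficulty b, a grid-search maximum-likelihood ability estimate, and selection of the next item by maximum Fisher information. All function names, the item bank, and the parameter values are hypothetical and are not taken from the paper. An "item" here could equally be a test question or a linguistic construct linked to exercises.

```python
import math


def p_correct(theta, a, b):
    """2PL item response function: probability of a correct answer
    for ability theta, discrimination a, and difficulty b (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def estimate_ability(responses, items, grid=None):
    """Maximum-likelihood ability estimate over a coarse grid.

    responses: list of (item_id, correct) pairs, correct being a bool.
    items: dict mapping item_id -> (a, b).
    """
    if grid is None:
        # Hypothetical ability range: -4.0 to 4.0 in steps of 0.1.
        grid = [x / 10.0 for x in range(-40, 41)]

    def log_likelihood(theta):
        total = 0.0
        for item_id, correct in responses:
            a, b = items[item_id]
            p = p_correct(theta, a, b)
            total += math.log(p if correct else 1.0 - p)
        return total

    return max(grid, key=log_likelihood)


def next_item(theta, items, asked):
    """Pick the unasked item that is most informative at the current estimate."""
    candidates = [item_id for item_id in items if item_id not in asked]
    return max(candidates, key=lambda i: item_information(theta, *items[i]))


# Hypothetical item bank: three items of increasing difficulty.
bank = {"easy": (1.0, -2.0), "mid": (1.0, 0.0), "hard": (1.0, 2.0)}
first = next_item(0.0, bank, asked=set())
theta_hat = estimate_ability(
    [("easy", True), ("mid", True), ("hard", False)], bank
)
```

In an adaptive loop, one would alternate `next_item` and `estimate_ability` until a stopping criterion is met, for example a fixed test length or a sufficiently small change in the estimate. Which criterion the authors use is not stated in this section.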

Paper Structure (32 sections, 3 equations, 15 figures)

Figures (15)

  • Figure 1: Zone of proximal development: the blue area marks tasks that the learner can perform with some assistance, which are the tasks the learner is most prepared to learn next.
  • Figure 2: Simulation on real learner data with manually labelled CEFR levels of learners. X-axis: the 6 CEFR levels; Y-axis: ability estimate.
  • Figure 3: Distribution of adaptive test lengths in the real-learner simulations of Figure 2. X-axis: length of test; Y-axis: number of tests in each bin.
  • Figure 4: Adaptive simulation on real learner data, using the manually judged CEFR level of each question as its difficulty estimate ($b_{i}$). X-axis: the 6 CEFR levels; Y-axis: ability estimate.
  • Figure 5: Distribution of adaptive test lengths in the real-learner simulations of Figure 4. X-axis: length of test; Y-axis: number of tests in each bin.
  • ...and 10 more figures