Chained computerized adaptive testing for the Force Concept Inventory

Jun-ichiro Yasuda, Michael M. Hull, Naohiro Mae, Kentaro Kojima

TL;DR

This paper presents Chain-CAT, a chained computerized adaptive testing approach that uses collateral information from prior test administrations to repeatedly assess student understanding of Newtonian mechanics via the Force Concept Inventory (FCI). Numerical simulations show that collateral information significantly improves test efficiency, so the total item burden can fall below the conventional 60-item pre-post scheme (two administrations of the full 30-item FCI) without sacrificing accuracy or precision: for example, nine administrations of length L=5 total 45 items. The study also highlights practical constraints. With the 30-item FCI as the item bank and no content balancing or exposure control, Chain-CAT performs comparably to the pre-post method, but adding content balancing and exposure limits reduces efficiency unless the item bank is expanded with highly discriminating items from the same or other inventories. Overall, Chain-CAT is a promising formative assessment tool for tracking conceptual change over a course, contingent on expanding item banks and validating the approach in real classroom settings.

Abstract

Although conceptual assessment tests are commonly administered at the beginning and end of a semester, this pre-post approach has inherent limitations. Specifically, education researchers and instructors have limited ability to observe how student conceptual understanding progresses throughout the course. Furthermore, instructors are limited in the usefulness of the feedback they can give to the students involved. To address these challenges, we propose an alternative approach that uses computerized adaptive testing (CAT): CAT-based assessments are administered more frequently during the course while the test length per administration is reduced, so that the total number of test items administered throughout the course stays the same or decreases. The feasibility of this idea depends on how far the test length per administration can be reduced without compromising test accuracy and precision. Ideally, the overall test length should be shorter than when the full assessment is administered as a pretest and again as a post-test. To achieve this goal, we developed a CAT algorithm that we call Chain-CAT. This algorithm sequentially links the results of each CAT administration using collateral information. We developed the Chain-CAT algorithm using the items of the Force Concept Inventory (FCI) and analyzed its efficiency by numerical simulations. We found that collateral information significantly improved the test efficiency, and that the overall test length could be shorter than in the pre-post method. Without constraints for item balancing and exposure control, simulation results indicated that the efficiency of Chain-CAT is comparable to that of the pre-post method even if the length of each CAT administration is only 5 items and the CAT is administered 9 times throughout the semester. (To continue, see text.)
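The chaining idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a 2PL response model, maximum-information item selection, EAP estimation on a grid, and a normal prior centered on the previous administration's estimate with a fixed standard deviation of 0.5. The item parameters `a` and `b` are hypothetical random values, not the FCI's calibrated parameters, and all function names are illustrative.

```python
import numpy as np

GRID = np.linspace(-4.0, 4.0, 161)  # proficiency grid for numerical integration

def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def normal_prior(mean, sd):
    """Discretized normal prior over GRID, normalized to sum to 1."""
    w = np.exp(-0.5 * ((GRID - mean) / sd) ** 2)
    return w / w.sum()

def run_cat(true_theta, a, b, prior, length, rng):
    """One simulated CAT administration of `length` items."""
    posterior = prior.copy()
    available = np.ones(len(a), dtype=bool)
    used = []
    for _ in range(length):
        est = float(np.sum(GRID * posterior))        # current EAP estimate
        p = p_correct(est, a, b)
        info = a ** 2 * p * (1.0 - p)                # 2PL Fisher information
        item = int(np.argmax(np.where(available, info, -np.inf)))
        available[item] = False                      # no repeats within a test
        used.append(item)
        # Simulate the response from the true proficiency.
        correct = rng.random() < p_correct(true_theta, a[item], b[item])
        like = p_correct(GRID, a[item], b[item])
        posterior *= like if correct else (1.0 - like)
        posterior /= posterior.sum()
    return float(np.sum(GRID * posterior)), used

def chain_cat(true_thetas, a, b, length, prior_sd=0.5, seed=0):
    """Chain the administrations: each estimate becomes the next prior mean."""
    rng = np.random.default_rng(seed)
    prior_mean = 0.0                                 # assumed starting point
    estimates, administered = [], []
    for theta in true_thetas:
        prior = normal_prior(prior_mean, prior_sd)
        est, used = run_cat(theta, a, b, prior, length, rng)
        estimates.append(est)
        administered.append(used)
        prior_mean = est                             # collateral information
    return estimates, administered

# Hypothetical 30-item bank and a linear progression from 0 to 1 over nine tests.
bank_rng = np.random.default_rng(1)
a = bank_rng.uniform(0.5, 2.0, size=30)
b = bank_rng.normal(0.0, 1.0, size=30)
estimates, administered = chain_cat(np.linspace(0.0, 1.0, 9), a, b, length=5)
```

With nine administrations of five items each, the simulated student answers 45 items in total, below the 60 items of a pre-post scheme with the full 30-item inventory. Setting `prior_mean = 0.0` for every administration instead of carrying the estimate forward would give an unchained baseline for comparison.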

Paper Structure

This paper contains 19 sections, 6 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Illustration of Chain-CAT algorithm for a student taking the tests for a course with seven administrations of the Chain-CAT.
  • Figure 2: Typical numerical simulation results for a linear progression model. The test length of each administration is set to $L=10$. The standard deviation of the prior distribution was fixed at $\sigma^i=0.5$. (a) Left: without collateral information (basic CAT). (b) Right: with collateral information (Chain-CAT). The values of "pre" and "pst" show the results based on the pre-post method where the full FCI (30 items) is used in the proficiency estimation without collateral information.
  • Figure 3: Histograms of the number of times an item was administered in the nine tests when $L=5$ in the Linear model ($0\leq \theta \leq 1$). If an item appears in all nine tests, it contributes one count to the bin labeled "9" in the histogram of the number of times administered. The left (right) graph shows the result without (with) content balancing and item exposure control. The sum of the products of each $x$ value (number of times administered) and its corresponding $y$ value (frequency) equals the total number of item administrations in the simulation, which is 450 000.
  • Figure 4: Histogram of the likelihood that an item is administered on a given test when there are nine tests of $L=5$ in the Linear model ($0\leq \theta \leq 1$). The red lines show the discrimination parameters for the FCI items. The graphs are divided by subgroup of the FCI: Kinematics (KN), First law (FL), Second law (SL), Third law (TL), and Kinds of forces (KF). The top (bottom) graph shows the result without (with) content balancing and item exposure control.
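
The bookkeeping described in the Figure 3 caption can be checked with a short sketch. The histogram below is a made-up toy example, not data from the paper. The figure of 10 000 simulated students is an inference that assumes every simulated student takes all nine 5-item tests and that each (student, item) pair contributes one count to the histogram. It is not stated in this overview.

```python
def total_administrations(histogram):
    """Sum of x * y over bins: x = times administered, y = frequency."""
    return sum(times * freq for times, freq in histogram.items())

n_tests, test_length = 9, 5
reported_total = 450_000                      # total stated in the Figure 3 caption

# Inferred, not stated: students implied by 45 items per student.
implied_students = reported_total // (n_tests * test_length)

# Hypothetical per-student histogram over a 30-item bank:
# 10 items never seen, 5 seen once, 10 seen twice, 5 seen four times.
per_student = {0: 10, 1: 5, 2: 10, 4: 5}

# Scale the toy histogram to the implied number of students.
scaled = {times: freq * implied_students for times, freq in per_student.items()}
```

Here `total_administrations(per_student)` gives the 45 items one student answers, and `total_administrations(scaled)` reproduces the reported total under the stated assumptions.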