Multi-Method Analysis of Mathematics Placement Assessments: Classical, Machine Learning, and Clustering Approaches

Julian D. Allagan, Dasia A. Singleton, Shanae N. Perry, Gabrielle C. Morgan, Essence A. Morgan

TL;DR

The paper tackles the high-stakes problem of mathematics placement by applying a convergent multi-method framework that combines Classical Test Theory, machine learning, and clustering to a 40-item exam administered to $n=198$ students. It finds a binary competency boundary near $42.5\%$ that diverges from the institutional threshold of $55\%$. It also shows that ensemble ML methods achieve high predictive accuracy (Random Forest: $97.5\%$ cross-validated) while relying on a small core of highly discriminating items, most notably Question 6. The study proposes practical refinements: replacing the 12 poorly discriminating items, adopting a two-stage placement process, and pairing ML predictions with transparency mechanisms to support decision-making. Collectively, the work demonstrates how integrating univariate psychometrics, multivariate predictive modeling, and data-driven clustering yields robust, actionable guidance for evidence-based mathematics placement in STEM education.

Abstract

This study evaluates a 40-item mathematics placement examination administered to 198 students using a multi-method framework combining Classical Test Theory, machine learning, and unsupervised clustering. Classical Test Theory analysis reveals that 55% of items achieve excellent discrimination ($D \geq 0.40$), while 30% show poor discrimination ($D < 0.20$) and require replacement. Question 6 (Graph Interpretation) emerges as the examination's most powerful discriminator, achieving perfect discrimination ($D = 1.000$), the highest ANOVA F-statistic ($F = 4609.1$), and the largest Random Forest feature importance (0.206, i.e., 20.6% of total importance). Machine learning algorithms perform strongly, with Random Forest and Gradient Boosting achieving 97.5% and 96.0% cross-validation accuracy, respectively. K-means clustering identifies a natural binary competency structure with a boundary at 42.5%, diverging from the institutional threshold of 55% and suggesting potential overclassification into remedial categories. The two-cluster solution is highly stable (bootstrap adjusted Rand index, ARI = 0.855) with perfect lower-cluster purity. Convergent evidence across methods supports three refinements: replace poorly discriminating items, implement a two-stage assessment, and integrate Random Forest predictions with transparency mechanisms. These findings demonstrate that multi-method integration provides a robust empirical foundation for evidence-based mathematics placement optimization.
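The item statistics above can be made concrete with a short sketch. It assumes 0/1 item scoring and the common upper/lower 27% grouping for $D$; the summary does not say which grouping the authors used, and the function names are illustrative, not the authors' code. Under this definition $D = p_{\text{upper}} - p_{\text{lower}}$, so a value of $D = 1.000$ for Question 6 would mean every upper-group student answered it correctly and every lower-group student did not. With 198 students, each 27% group would hold about 53 students. The quality bands follow the thresholds given in the abstract and the Figure 2 caption.

```python
from typing import Sequence

# Rows = students, columns = items, entries are 0/1 scores (assumed format).
Responses = Sequence[Sequence[int]]


def item_difficulty(responses: Responses, item: int) -> float:
    """CTT difficulty p: proportion of students answering `item` correctly."""
    return sum(row[item] for row in responses) / len(responses)


def discrimination_index(responses: Responses, item: int, frac: float = 0.27) -> float:
    """Upper-lower discrimination index D = p_upper - p_lower.

    Assumption: students are ranked by total score and the top and bottom
    `frac` of the ranking (27% by default) form the comparison groups.
    Ties in total score keep their original order (Python's sort is stable).
    """
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(frac * len(ranked)))
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(row[item] for row in upper) / k
    p_lower = sum(row[item] for row in lower) / k
    return p_upper - p_lower


def quality(d: float) -> str:
    """Map D to the quality bands listed in the Figure 2 caption."""
    if d >= 0.40:
        return "excellent"
    if d >= 0.30:
        return "good"
    if d >= 0.20:
        return "marginal"
    return "poor"


if __name__ == "__main__":
    # Hypothetical toy data: 10 students x 3 items (not from the study).
    toy = [
        [1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0], [0, 1, 1],
        [0, 1, 0], [0, 1, 0], [0, 1, 1], [0, 1, 0], [0, 1, 0],
    ]
    for j in range(3):
        d = discrimination_index(toy, j)
        print(f"item {j}: p={item_difficulty(toy, j):.2f}, D={d:.2f} ({quality(d)})")
```

In the toy data, item 1 is answered correctly by everyone, so it gets $D = 0$ and lands in the "poor" band. This illustrates why items at the extremes of difficulty cannot discriminate, as the Figure 2 caption observes.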

Paper Structure

This paper contains 21 sections, 20 equations, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Distribution of mathematics placement test scores for 198 students.
  • Figure 2: Relationship between item difficulty ($p$) and discrimination index ($D$) across 40 examination items. Color coding indicates quality classification: excellent (green, $D \geq 0.40$), good (blue, $0.30 \leq D < 0.40$), marginal (orange, $0.20 \leq D < 0.30$), and poor (red, $D < 0.20$). Horizontal reference lines mark discrimination thresholds. The plot demonstrates that items with moderate difficulty ($0.30 < p < 0.70$) achieve higher discrimination, while items with extreme difficulty exhibit uniformly poor discrimination.
  • Figure 3: Top 15 items ranked by Random Forest feature importance, demonstrating the dominance of Question 6 (20.6% importance) and the exponential decay in importance for subsequent items. The distribution suggests that a small subset of highly discriminating items accounts for the majority of placement prediction accuracy.
  • Figure 4: Distribution of mathematics placement test scores showing the two-cluster solution. Cluster 0 (Low Performance, red) contains 84 students with mean 26.0%, while Cluster 1 (High Performance, blue) contains 114 students with mean 61.3%. The vertical dashed line at 42.5% marks the natural clustering boundary, while institutional boundaries at 55% and 70% are shown for comparison. The clean separation between clusters contrasts with the substantial overlap between institutional Precalculus and Calculus I categories within Cluster 1.
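In one dimension, two-cluster k-means assigns each score to the nearer centroid, so at convergence the boundary is the midpoint of the two cluster means. The sketch below implements plain Lloyd iterations on percentage scores, a from-scratch adjusted Rand index, and one possible bootstrap stability scheme. The initialisation, the bootstrap protocol, and all function names are hypothetical choices for illustration, not the authors' procedure.

```python
import random
from collections import Counter
from math import comb
from typing import List, Sequence, Tuple


def kmeans_1d(scores: Sequence[float], max_iter: int = 100) -> Tuple[List[int], Tuple[float, float]]:
    """Two-cluster Lloyd k-means on 1-D scores (hypothetical helper).

    Assumption: centroids start at the minimum and maximum score. Label 0 is
    the low cluster; a score exactly on the midpoint goes to label 1.
    """
    lo, hi = float(min(scores)), float(max(scores))
    for _ in range(max_iter):
        boundary = (lo + hi) / 2  # nearest-centroid rule in 1-D == midpoint split
        low = [s for s in scores if s < boundary]
        high = [s for s in scores if s >= boundary]
        if not low or not high:  # degenerate case, e.g. all scores identical
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):  # centroids stopped moving: converged
            break
        lo, hi = new_lo, new_hi
    boundary = (lo + hi) / 2
    return [0 if s < boundary else 1 for s in scores], (lo, hi)


def adjusted_rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Hubert-Arabie adjusted Rand index between two labelings of the same items."""
    n_pairs = comb(len(a), 2)
    if n_pairs == 0:
        return 1.0
    index = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / n_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate, e.g. both put everyone in one cluster
        return 1.0
    return (index - expected) / (max_index - expected)


def bootstrap_ari(scores: Sequence[float], n_boot: int = 200, seed: int = 0) -> float:
    """Mean ARI between full-data labels and labels re-fitted on bootstrap resamples.

    Assumption: this is one simple stability scheme among several in use.
    ARI ignores label permutations, so no cluster matching is needed.
    """
    rng = random.Random(seed)
    reference, _ = kmeans_1d(scores)
    total = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(len(scores)) for _ in range(len(scores))]
        refit, _ = kmeans_1d([scores[i] for i in idx])
        total += adjusted_rand_index([reference[i] for i in idx], refit)
    return total / n_boot


if __name__ == "__main__":
    # Cluster means reported in Figure 4 (percent correct).
    low_mean, high_mean = 26.0, 61.3
    midpoint = (low_mean + high_mean) / 2
    # Assumption: each of the 40 items is worth 2.5 percentage points.
    attainable = [100 * k / 40 for k in range(41)]
    below = max(s for s in attainable if s < midpoint)
    above = min(s for s in attainable if s >= midpoint)
    print(f"midpoint={midpoint:.2f}%, nearest attainable scores: {below}% / {above}%")
```

With the Figure 4 means of 26.0% and 61.3%, the midpoint is 43.65%. Assuming each of the 40 items is worth 2.5 percentage points, that midpoint falls between the attainable scores 42.5% (17/40) and 45.0% (18/40). This is consistent with, though it does not confirm, the reported 42.5% boundary being the highest score assigned to the low cluster.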