Interpret3C: Interpretable Student Clustering Through Individualized Feature Selection

Isadora Salles, Paola Mejia-Domenzain, Vinitra Swamy, Julian Blackwell, Tanja Käser

TL;DR

Interpret3C tackles the interpretability challenge of clustering high-dimensional MOOC data by learning per-student feature masks through an interpretable neural network and then applying clustering on the masked features. The approach integrates adaptive feature gating with deep feature selection and spectral clustering to yield interpretable student groups. Evaluated on a large MOOC with 5,611 active students, the method identifies six behavioral clusters and reveals that a small subset of features drives cluster distinctions, with meaningful variation in feature importance across clusters and predictive differences in outcomes. The work demonstrates that respecting individual feature importance enhances cluster relevance and supports targeted, data-informed interventions in large-scale online courses.
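The per-student masking step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a single hypothetical linear gating layer with a sigmoid, a fixed 0.5 threshold, and random (untrained) weights. In a trained model the gate parameters would be learned, which would also need a differentiable relaxation of the hard threshold; that part is omitted here.

```python
import numpy as np

def feature_gates(X, W, b):
    # Hypothetical linear gating layer: sigmoid(X W + b) gives one gate
    # probability in (0, 1) per student (row) and feature (column).
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def hard_mask(gates, threshold=0.5):
    # Keep a feature for a student only if its gate exceeds the (assumed) threshold.
    return (gates > threshold).astype(float)

def masked_features(X, mask):
    # Zero out the features that each student's gate did not select.
    return X * mask

rng = np.random.default_rng(0)
n_students, n_features = 6, 4
X = rng.normal(size=(n_students, n_features))    # toy behavioral features
W = rng.normal(size=(n_features, n_features))    # untrained, for illustration only
b = np.zeros(n_features)

mask = hard_mask(feature_gates(X, W, b))
X_masked = masked_features(X, mask)
print(mask)
print(mask.sum(axis=1))  # number of features selected per student
```

Because the mask is computed per row, two students can end up with different feature subsets, which is the property the clustering step then relies on.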

Abstract

Clustering in education, particularly in large-scale online environments like MOOCs, is essential for understanding and adapting to diverse student needs. However, the effectiveness of clustering depends on its interpretability, which becomes challenging with high-dimensional data. Existing clustering approaches often neglect individual differences in feature importance and rely on a homogenized feature set. Addressing this gap, we introduce Interpret3C (Interpretable Conditional Computation Clustering), a novel clustering pipeline that incorporates interpretable neural networks (NNs) in an unsupervised learning context. This method leverages adaptive gating in NNs to select features for each student. Then, clustering is performed using the most relevant features per student, enhancing the relevance and interpretability of the clusters. We use Interpret3C to analyze behavioral clusters in a MOOC with over 5,000 students, taking each student's individual feature importance into account. This research contributes to the field by offering a scalable, robust clustering methodology and an educational case study that respects individual student differences and improves interpretability for high-dimensional data.
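Clustering on the selected features can be illustrated with a small from-scratch spectral clustering in NumPy. The following is an illustrative sketch and makes several assumptions. Each student is represented by their feature vector with unselected features set to zero. Similarity uses an RBF affinity with a hand-picked bandwidth `sigma`. The Ng-Jordan-Weiss normalization is followed by k-means on the row-normalized eigenvectors. The TL;DR names spectral clustering, but these specific choices and all function names here are assumptions, not the paper's configuration.

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    # Gaussian (RBF) similarity between every pair of students.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    A = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A

def kmeans(Z, k, n_iter=100):
    # Deterministic farthest-point initialization, then Lloyd's updates.
    centers = [Z[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering(X, k, sigma=1.0):
    # Normalized affinity D^{-1/2} A D^{-1/2}, top-k eigenvectors, row-normalize, k-means.
    A = rbf_affinity(X, sigma)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    U = eigvecs[:, -k:]              # eigenvectors of the k largest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)

# Toy masked features: two well-separated groups of 10 students each.
rng = np.random.default_rng(1)
X_masked = np.vstack([rng.normal(0.0, 0.3, size=(10, 3)),
                      rng.normal(5.0, 0.3, size=(10, 3))])
labels = spectral_clustering(X_masked, k=2)
print(labels)
```

In practice, a library implementation such as scikit-learn's `SpectralClustering` would typically replace this hand-rolled version; the explicit steps are shown only to make each stage of the pipeline visible.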

Paper Structure (7 sections, 3 figures)

Figures (3)

  • Figure 1: Interpret3C pipeline with deep feature selection and clustering.
  • Figure 2: For each of 8 important features (x-axis), the percentage of students (y-axis) from each cluster (color) that selected the feature as important.
  • Figure 3: For each of 8 important features (x-axis), the feature value distribution (y-axis) from each cluster (color).
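The per-cluster quantities summarized in Figures 2 and 3 can be computed along these lines. This is a minimal sketch under a few assumptions: a binary student-by-feature mask from the gating step, a raw feature matrix, and integer cluster labels. The function names are hypothetical. The sketch covers all features rather than only the 8 shown in the figures. The median is just one illustrative summary of the distributions that Figure 3 plots.

```python
import numpy as np

def selection_rate_by_cluster(mask, labels):
    # Figure 2 style: percentage of students in each cluster whose gate kept each feature.
    clusters = np.unique(labels)
    return clusters, np.array([100.0 * mask[labels == c].mean(axis=0) for c in clusters])

def median_by_cluster(X, labels):
    # Figure 3 style summary: median value of each feature within each cluster.
    clusters = np.unique(labels)
    return clusters, np.array([np.median(X[labels == c], axis=0) for c in clusters])

mask = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 1]])
X = np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [10.0, 0.0, 1.0], [20.0, 2.0, 1.0]])
labels = np.array([0, 0, 1, 1])

clusters, rates = selection_rate_by_cluster(mask, labels)
_, medians = median_by_cluster(X, labels)
# rates   -> [[100, 50, 50], [0, 50, 100]]
# medians -> [[2, 3, 4], [15, 1, 1]]
```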