PrivacyCD: Hierarchical Unlearning for Protecting Student Privacy in Cognitive Diagnosis
Mingliang Hou, Yinuo Wang, Teng Guo, Zitao Liu, Wenzhou Dou, Jiaqi Zheng, Renqiang Luo, Mi Tian, Weiqi Luo
TL;DR
The paper tackles the privacy challenge of removing individual student data from cognitive diagnosis (CD) models by introducing PrivacyCD, a framework that combines a privacy-preserving neural CD architecture with Hierarchical Importance-guided Forgetting (HIF). HIF smooths parameter importance estimates by combining fine-grained (per-parameter) signals with coarse layer-wise signals via a smoothing factor $\beta$, and then selectively attenuates parameters based on an adjusted importance $I_{adj}$ controlled by $\alpha$ and $\lambda$. The authors formalize the unlearning problem for CD, present a decoupled neural CD architecture that localizes personal information in student embeddings, and provide a James-Stein-like theoretical analysis showing reduced estimation error for the hierarchical approach. Extensive experiments on Math1, Math2, and FrcSub demonstrate that PrivacyCD achieves state-of-the-art balance among unlearning efficacy (MIA AUC/ACC), model utility (AUC/ACC), and efficiency (RTRR), across multiple unlearning ratios, with qualitative analyses validating the forgetting behavior. The work enables practical, privacy-preserving deployment of CD systems in education and informs design principles for effective unlearning in heterogeneous neural architectures.
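The two HIF steps described above (hierarchical smoothing, then selective attenuation) can be illustrated with a minimal pure-Python sketch. The exact formulas are not given in this summary, so everything below is an assumption: the convex blend with the layer mean, the selection test `imp_forget > alpha * imp_retain`, the dampening factor `lam * imp_retain / imp_forget`, and the function names `smooth_importance` and `attenuate` are all hypothetical. The attenuation rule follows the general style of importance-based dampening methods and does not reproduce the paper's definition of $I_{adj}$.

```python
from statistics import fmean


def smooth_importance(per_param, beta):
    """Blend each parameter's importance with the mean of its layer.

    per_param: dict mapping a layer name to a list of per-parameter
               importance scores (e.g. squared gradients).
    beta:      smoothing factor in [0, 1]; 0 keeps the fine-grained
               signal, 1 replaces it with the layer-wise average.
    NOTE: the convex blend is an assumed form, not the paper's formula.
    """
    smoothed = {}
    for layer, values in per_param.items():
        layer_mean = fmean(values)
        smoothed[layer] = [(1 - beta) * v + beta * layer_mean for v in values]
    return smoothed


def attenuate(weights, imp_forget, imp_retain, alpha, lam, eps=1e-12):
    """Dampen weights whose forget-set importance dominates.

    Assumed rule: a weight is selected when its forget-set importance
    exceeds alpha times its retain-set importance, and it is then
    scaled by min(1, lam * imp_retain / imp_forget).
    """
    updated = {}
    for layer, layer_weights in weights.items():
        new_weights = []
        for w, f, r in zip(layer_weights, imp_forget[layer], imp_retain[layer]):
            if f > alpha * r:
                scale = min(1.0, lam * r / (f + eps))
                new_weights.append(w * scale)
            else:
                new_weights.append(w)  # not selected: left unchanged
        updated[layer] = new_weights
    return updated


# Toy example with made-up numbers: two layers, two parameters each.
raw_forget = {"student_emb": [4.0, 0.0], "interaction": [1.0, 1.0]}
raw_retain = {"student_emb": [1.0, 1.0], "interaction": [1.0, 1.0]}
weights = {"student_emb": [1.0, 1.0], "interaction": [0.5, 0.5]}

forget = smooth_importance(raw_forget, beta=0.5)
retain = smooth_importance(raw_retain, beta=0.5)
new_weights = attenuate(weights, forget, retain, alpha=2.0, lam=1.0)
```

In this toy setup only the first student-embedding weight is dampened, since it is the only parameter whose smoothed forget-set importance exceeds `alpha` times its retain-set importance. Larger `beta` pulls every score toward its layer mean, which trades per-parameter resolution for lower variance in the estimate.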
Abstract
The need to remove specific student data from cognitive diagnosis (CD) models has become a pressing requirement, driven by users' growing assertion of their "right to be forgotten". However, existing CD models are largely designed without privacy considerations and lack effective data unlearning mechanisms. Directly applying general-purpose unlearning algorithms is suboptimal, as they struggle to balance unlearning completeness, model utility, and efficiency when confronted with the unique heterogeneous structure of CD models. To address this, our paper presents the first systematic study of the data unlearning problem for CD models and proposes a novel and efficient algorithm: hierarchical importance-guided forgetting (HIF). Our key insight is that parameter importance in CD models exhibits distinct layer-wise characteristics. HIF leverages this via an innovative smoothing mechanism that combines individual and layer-level importance, enabling a more precise distinction of parameters associated with the data to be unlearned. Experiments on three real-world datasets show that HIF significantly outperforms baselines on key metrics, offering the first effective solution for CD models to respond to user data removal requests and for deploying high-performance, privacy-preserving AI systems.
