Generalizable and Efficient Automated Scoring with a Knowledge-Distilled Multi-Task Mixture-of-Experts
Luyang Fang, Tao Wang, Ping Ma, Xiaoming Zhai
TL;DR
The paper tackles the scalability challenge of automated scoring in education by introducing UniMoE-Guided, a knowledge-distilled multi-task Mixture-of-Experts (MoE) system in which a single compact backbone supports multiple rubric-based tasks. The student model combines a shared encoder, a task-aware MoE layer, and lightweight task heads, and it is trained with guidance from strong task-specific teachers so that its outputs align with expert judgments. On nine NGSS-aligned science tasks derived from the PASTA corpus, UniMoE-Guided performs on par with or better than per-task models while needing roughly 6× less storage than a set of separate per-task students and about 87× less than the 20B-parameter teacher. The approach also generalizes to held-out tasks and adapts to new rubrics by updating only a small fraction of parameters, which makes it practical for classroom deployment and large-scale assessment. Overall, UniMoE-Guided offers an efficient, scalable route to automated scoring that preserves rubric fidelity and can be extended to new tasks with minimal retraining.
Abstract
Automated scoring of written constructed responses typically relies on separate models per task, straining computational resources, storage, and maintenance in real-world education settings. We propose UniMoE-Guided, a knowledge-distilled multi-task Mixture-of-Experts (MoE) approach that transfers expertise from multiple task-specific large models (teachers) into a single compact, deployable model (student). The student combines (i) a shared encoder for cross-task representations, (ii) a gated MoE block that balances shared and task-specific processing, and (iii) lightweight task heads. Trained with both ground-truth labels and teacher guidance, the student matches strong task-specific models while being far more efficient to train, store, and deploy. Beyond efficiency, the MoE layer improves transfer and generalization: experts develop reusable skills that boost cross-task performance and enable rapid adaptation to new tasks with minimal additions and tuning. On nine NGSS-aligned science-reasoning tasks (seven for training/evaluation and two held out for adaptation), UniMoE-Guided attains performance comparable to per-task models while using $\sim 6\times$ less storage than maintaining separate per-task students, and $87\times$ less than the 20B-parameter teacher. The method offers a practical path toward scalable, reliable, and resource-efficient automated scoring for classroom and large-scale assessment systems.
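To make the training recipe described above more concrete, the following is a minimal Python sketch of two of its ingredients: a softmax-gated mixture of expert outputs, and a loss that blends ground-truth supervision with teacher guidance. The abstract does not give the exact objective or gating function, so everything here is an assumption for illustration: the function names (`softmax`, `moe_mix`, `distillation_loss`), the mixing weight `alpha`, the `temperature`, and the standard temperature-scaled KL distillation form are hypothetical choices and may differ from what UniMoE-Guided actually uses.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def moe_mix(expert_outputs, gate_logits):
    """Blend expert output vectors with softmax gate weights.

    In a task-aware MoE, gate_logits would depend on the input and the
    task identity; here they are simply passed in (an assumption).
    """
    weights = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(dim)
    ]


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student).

    This is the common temperature-scaled distillation objective, used
    here as a stand-in for the paper's (unspecified) loss.
    """
    # Hard-label term: cross-entropy against the human-assigned score.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL divergence from softened teacher to student.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0)
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl


# Usage: two hypothetical experts produce 3-class score logits for one
# response; the gate favors the first expert for this task.
student_logits = moe_mix([[2.0, 0.5, -1.0], [0.0, 1.0, 0.0]], [1.0, 0.0])
loss = distillation_loss(student_logits, [3.0, 0.0, -2.0], label=0)
```

In this sketch, `alpha` trades off fidelity to human labels against agreement with the teacher, and the gate weights are what would let experts be shared across tasks while still giving each task its own mixture.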
