Leveraging Label Semantics and Meta-Label Refinement for Multi-Label Question Classification
Shi Dong, Xiaobei Niu, Rui Zhong, Zhifeng Wang, Mingzhang Zuo
TL;DR
RR2QC tackles automatic multi-label question classification in education, where label semantics overlap and long-tail labels are common. It introduces a two-stage retrieval and reranking framework with a ranking-contrastive pre-training objective and a class-center-based retrieval cue, augmented by Math LLM-generated solutions and meta-label refinement. Across four educational datasets, RR2QC achieves state-of-the-art Precision@1 and F1 scores, notably improving long-tail label recognition. The approach demonstrates the value of label semantics and meta-label decomposition for robust multi-label text classification (MLTC) in education, with practical implications for automated annotation and personalized learning.
Abstract
Accurate annotation of educational resources is crucial for effective personalized learning and resource recommendation in online education. However, fine-grained knowledge labels often overlap or share similarities, making it difficult for existing multi-label classification methods to differentiate them. Label distribution imbalance, caused by sparse human annotations, further intensifies these challenges. To address these issues, this paper introduces RR2QC, a novel Retrieval Reranking method for multi-label Question Classification that leverages label semantics and meta-label refinement. First, RR2QC improves the pre-training strategy by utilizing semantic relationships within and across label groups. Second, it introduces a class center learning task to align questions with label semantics during downstream training. Finally, it decomposes labels into meta-labels and uses a meta-label classifier to rerank the retrieved label sequences. In doing so, RR2QC improves its understanding and prediction of long-tail labels by learning from meta-labels that also appear frequently in other labels. Additionally, a mathematical LLM is used to generate solutions for questions, extracting latent information that further informs the model's predictions. Experimental results show that RR2QC outperforms existing methods in Precision@K and F1 scores across multiple educational datasets, demonstrating its effectiveness for online education applications. The code and datasets are available at https://github.com/78Erii/RR2QC.
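The retrieve-then-rerank idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`retrieve`, `rerank`), the toy two-dimensional vectors, the example labels, the averaging of meta-label scores, and the mixing weight `alpha` are all assumptions made for illustration. Stage 1 ranks labels by cosine similarity between a question embedding and per-label class centers; stage 2 reorders the retrieved labels using scores from a hypothetical meta-label classifier.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve(question_vec, class_centers, top_k):
    """Stage 1 (assumed form): keep the top_k labels whose class
    centers are most similar to the question embedding."""
    scored = [(label, cosine(question_vec, center))
              for label, center in class_centers.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def rerank(candidates, meta_scores, label_to_meta, alpha=0.5):
    """Stage 2 (assumed form): mix each retrieval score with the mean
    score of the meta-labels that make up the label, then re-sort."""
    rescored = []
    for label, sim in candidates:
        metas = label_to_meta[label]
        meta_mean = sum(meta_scores.get(m, 0.0) for m in metas) / len(metas)
        rescored.append((label, alpha * sim + (1 - alpha) * meta_mean))
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored

# Toy data: every value below is invented for illustration.
class_centers = {
    "linear equation: solving": [1.0, 0.0],
    "linear equation: word problem": [0.8, 0.6],
    "quadratic equation: solving": [0.0, 1.0],
}
label_to_meta = {
    "linear equation: solving": ["linear equation", "solving"],
    "linear equation: word problem": ["linear equation", "word problem"],
    "quadratic equation: solving": ["quadratic equation", "solving"],
}
question_vec = [1.0, 0.2]
# Hypothetical meta-label classifier output for this question.
meta_scores = {"linear equation": 0.9, "solving": 0.2,
               "word problem": 0.8, "quadratic equation": 0.1}

candidates = retrieve(question_vec, class_centers, top_k=2)
reranked = rerank(candidates, meta_scores, label_to_meta)
print([label for label, _ in candidates])
print([label for label, _ in reranked])
```

In this toy case the rare label "linear equation: word problem" is retrieved second but moves to first after reranking, because its meta-labels score highly. This mirrors the abstract's point that a long-tail label can benefit from meta-labels shared with more frequent labels.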
