Leveraging Label Semantics and Meta-Label Refinement for Multi-Label Question Classification

Shi Dong, Xiaobei Niu, Rui Zhong, Zhifeng Wang, Mingzhang Zuo

TL;DR

RR2QC tackles automatic multi-label question classification in education, where labels overlap semantically and long-tail labels are common. It introduces a two-stage retrieval and reranking framework with a ranking-contrastive pre-training objective and a class-center-based retrieval cue, augmented by Math LLM-generated solutions and meta-label refinement. Across four educational datasets, RR2QC achieves state-of-the-art Precision@1 and F1 scores, notably improving long-tail label recognition. The approach demonstrates the value of label semantics and meta-label decomposition for robust educational multi-label text classification (MLTC), with practical implications for automated annotation and personalized learning.

Abstract

Accurate annotation of educational resources is crucial for effective personalized learning and resource recommendation in online education. However, fine-grained knowledge labels often overlap or share similarities, making it difficult for existing multi-label classification methods to differentiate them. The label distribution imbalance caused by the sparsity of human annotations further intensifies these challenges. To address these issues, this paper introduces RR2QC, a novel Retrieval Reranking method for multi-label Question Classification that leverages label semantics and meta-label refinement. First, RR2QC improves the pre-training strategy by utilizing semantic relationships within and across label groups. Second, it introduces a class center learning task to align questions with label semantics during downstream training. Finally, this method decomposes labels into meta-labels and uses a meta-label classifier to rerank the retrieved label sequences. In doing so, RR2QC improves the understanding and prediction of long-tail labels by learning from meta-labels that frequently appear in other labels. Additionally, a mathematical LLM is used to generate solutions for questions, extracting latent information to further refine the model's insights. Experimental results show that RR2QC outperforms existing methods in Precision@K and F1 scores across multiple educational datasets, demonstrating its effectiveness for online education applications. The code and datasets are available at https://github.com/78Erii/RR2QC.

Paper Structure

This paper contains 26 sections, 19 equations, 8 figures, 7 tables, 1 algorithm.

Figures (8)

  • Figure 1: A four-level knowledge hierarchy tree and its meta-labels (bottom). As shown in the figure, $Label_A$ and $Label_B$ share the meta-labels "linear equations" and "linear functions," which easily mislead the classifier's decisions.
  • Figure 2: A question paired with a solution generated by the Math LLM. The blue phrases represent meta-labels.
  • Figure 3: The general process of Ranking Contrastive Pre-training. The ranking contrastive learning between the left and the middle follows the MoCo framework, while the label encoder on the right is used to train well-distributed label vectors that initialize the class centers in the downstream task and, at the same time, to weight the closeness between $e_t$ and $e_p$.
  • Figure 4: The overall process of Retrieval Reranking, where the retrieval model generates initial label predictions, which the reranking model refines using meta-labels. Final scores are calculated by combining retrieval label scores with weighted meta-label scores to enhance ranking performance.
  • Figure 5: Results of various components of RR2QC combined with the vanilla BERT model on the Math Junior dataset.
  • ...and 3 more figures
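
The score combination described in the Figure 4 caption can be illustrated with a small sketch. This is not the paper's formula, which is not given in this section: the additive form, the averaging over a label's meta-labels, the `weight` value, the function name `rerank`, and all scores below are assumptions chosen for illustration. The meta-labels "graphs" and "word problems" are hypothetical; only "linear equations" and "linear functions" come from Figure 1.

```python
from typing import Dict, List, Tuple


def rerank(
    retrieval_scores: Dict[str, float],
    meta_scores: Dict[str, float],
    label_to_meta: Dict[str, List[str]],
    weight: float = 0.5,  # assumed trade-off between retrieval and meta-label evidence
) -> List[Tuple[str, float]]:
    """Add the weighted mean meta-label score to each retrieved label's score
    and return the labels sorted by the combined score, highest first."""
    combined = {}
    for label, score in retrieval_scores.items():
        metas = label_to_meta.get(label, [])
        # Mean score of the label's meta-labels; 0.0 if the label has none.
        meta_avg = (
            sum(meta_scores.get(m, 0.0) for m in metas) / len(metas) if metas else 0.0
        )
        combined[label] = score + weight * meta_avg
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Two labels that share two meta-labels and differ in one (illustrative data).
label_to_meta = {
    "Label_A": ["linear equations", "linear functions", "graphs"],
    "Label_B": ["linear equations", "linear functions", "word problems"],
}
retrieval_scores = {"Label_A": 0.60, "Label_B": 0.62}  # retrieval slightly prefers Label_B
meta_scores = {
    "linear equations": 0.9,
    "linear functions": 0.8,
    "graphs": 0.7,
    "word problems": 0.1,
}

ranked = rerank(retrieval_scores, meta_scores, label_to_meta)
for label, score in ranked:
    print(f"{label}: {score:.2f}")
```

In this toy case the retrieval stage ranks Label_B marginally higher, but the one meta-label that distinguishes the two labels moves Label_A to the top after reranking, which is the kind of disambiguation the meta-label classifier is meant to provide for labels with overlapping semantics.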