Exploring Graph-based Knowledge: Multi-Level Feature Distillation via Channels Relational Graph

Zhiwei Wang, Jun Huang, Longhua Ma, Chengyu Wu, Hongyu Ma

TL;DR

This work addresses the challenge of transferring rich knowledge from large teacher networks to compact student models in visual tasks. It models inter-channel relationships as a Channels Relational Graph (CRG) and enriches this graph with spectral embedding (SE) to capture its global structure. The proposed SEKD framework performs multi-level feature distillation by jointly aligning vertex, edge, and spectral-embedding information through attention-guided losses, with the objective $\mathcal{L}_{\text{Total}}=\mathcal{L}_{\text{Origin}}+\alpha\mathcal{L}_V+\beta\mathcal{L}_E+\gamma\mathcal{L}_S$, where $\alpha$, $\beta$, and $\gamma$ weight the vertex, edge, and spectral-embedding losses. Key contributions include modeling channel interactions as a graph, applying spectral embedding to distill relational knowledge, and demonstrating consistent improvements on image classification (CIFAR-100) and across several detectors on object detection (MS-COCO, Pascal VOC). The method offers a scalable, generalizable approach to knowledge distillation (KD) in computer vision, enabling efficient deployment of compact models that retain more of the teacher's performance across diverse tasks.
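The objective above can be illustrated with a small NumPy sketch of a channel relational graph and the weighted sum of losses. It rests on assumptions that are ours, not the paper's confirmed definitions: each vertex is one channel's flattened, L2-normalised spatial response; edge weights are cosine similarities between channels; $\mathcal{L}_V$ and $\mathcal{L}_E$ are mean squared errors; and teacher and student features already share the same shape (for example after an adapter layer). The attention-guided weighting is omitted, $\mathcal{L}_{\text{Origin}}$ and $\mathcal{L}_S$ are passed in as placeholder numbers, and all function names and weight values are hypothetical.

```python
import numpy as np

def channel_relational_graph(feat):
    """Build a hypothetical CRG from a (C, H, W) feature map.

    Vertices: one L2-normalised vector per channel (flattened H*W map).
    Edges: C x C cosine-similarity matrix between channels.
    """
    c = feat.shape[0]
    vertices = feat.reshape(c, -1)
    norms = np.linalg.norm(vertices, axis=1, keepdims=True)
    vertices = vertices / np.maximum(norms, 1e-12)  # guard against zero channels
    edges = vertices @ vertices.T                   # pairwise cosine similarity
    return vertices, edges

def vertex_loss(v_s, v_t):
    """Assumed vertex-level loss: MSE between channel vectors."""
    return float(np.mean((v_s - v_t) ** 2))

def edge_loss(e_s, e_t):
    """Assumed edge-level loss: MSE between relation matrices."""
    return float(np.mean((e_s - e_t) ** 2))

def total_loss(l_origin, l_v, l_e, l_s, alpha=1.0, beta=1.0, gamma=1.0):
    """L_Total = L_Origin + alpha*L_V + beta*L_E + gamma*L_S."""
    return l_origin + alpha * l_v + beta * l_e + gamma * l_s

rng = np.random.default_rng(0)
f_t = rng.standard_normal((8, 4, 4))               # teacher features (C, H, W)
f_s = f_t + 0.1 * rng.standard_normal((8, 4, 4))   # student features, same shape

v_t, e_t = channel_relational_graph(f_t)
v_s, e_s = channel_relational_graph(f_s)
l_v = vertex_loss(v_s, v_t)
l_e = edge_loss(e_s, e_t)
# l_origin and l_s are placeholders; the weights are illustrative, not the paper's.
loss = total_loss(l_origin=2.3, l_v=l_v, l_e=l_e, l_s=0.0,
                  alpha=1.0, beta=0.5, gamma=0.1)
print(f"L_V={l_v:.4f}  L_E={l_e:.4f}  L_Total={loss:.4f}")
```

Normalising each channel before taking inner products makes the edge matrix scale-invariant, so the student is pushed to match the teacher's channel relationships rather than its raw activation magnitudes.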

Abstract

In visual tasks, large teacher models capture essential features and deep information, which enhances their performance. However, distilling this knowledge into smaller student models often causes a performance drop because of structural differences and limited capacity. To address this, we propose a distillation framework based on graph knowledge, comprising a multi-level feature alignment strategy and an attention-guided mechanism that provides a targeted learning trajectory for the student model. Spectral embedding (SE) is a key technique in our distillation process: it endows the student's feature space with relational knowledge and structural complexity similar to those of the teacher network. The method represents the teacher's knowledge as a graph, enabling the student model to more accurately mimic the complex structural dependencies present in the teacher model. Unlike methods that focus only on specific distillation areas, our strategy considers not only the key features within the teacher model but also the relationships and interactions among feature sets, encoding this information into a graph structure so that these dynamic relationships can be understood and exploited from a global perspective. Experiments show that our method outperforms previous feature distillation methods on the CIFAR-100, MS-COCO, and Pascal VOC datasets, demonstrating its efficiency and applicability.

Paper Structure (20 sections, 9 equations, 7 figures, 9 tables, 1 algorithm)

Figures (7)

  • Figure 1: The overall framework of the Multi-Level Feature Distillation. The process of generating groups of embedding vectors is shown in Figure 2. We align the channel relational graphs of the teacher and student at multiple levels: vertices, edges, and spectral embeddings.
  • Figure 2: Illustration of generating a group of embedding vectors. We treat the Laplacian matrix as an algebraic representation of the graph. These embedding vectors capture the structure and topological characteristics of the graph in a low-dimensional subspace.
  • Figure 3: t-SNE visualizations of the classification results of classifiers trained with our method and with SimKD (chen2022knowledge), together with those of the teacher, student, and untrained models. Multi-level joint alignment based on inter-channel relationships strengthens the representation of channel relationships within the network, which indirectly improves the separation between sample points. We attribute this improvement to spectral embedding, which strengthens the student model's grasp of complex data structures and thereby increases intra-class compactness and inter-class separability.
  • Figure 4: RetinaNet R-50 AP scores for various box sizes. S denotes the error types of the student; D denotes the error types after applying our method.
  • Figure 5: Impact trends of each hyperparameter, observed on three different student networks.
  • ...and 2 more figures
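Figure 2 treats the Laplacian matrix as an algebraic representation of the channel graph and derives a group of low-dimensional embedding vectors from it. The NumPy sketch below shows one standard way to do this, offered as an assumption rather than the paper's exact construction: build a non-negative channel affinity W (here the absolute cosine similarity, a hypothetical choice), form the unnormalised Laplacian L = D - W, and keep the eigenvectors of the k smallest eigenvalues as a k-dimensional embedding of the channels. Because eigenvectors are only defined up to sign (and up to rotation within a repeated eigenvalue), the illustrative spectral loss compares projection matrices U U^T instead of raw eigenvectors. All function names are hypothetical.

```python
import numpy as np

def channel_affinity(feat):
    """Hypothetical non-negative C x C affinity: |cosine similarity| between channels."""
    v = feat.reshape(feat.shape[0], -1)
    v = v / np.maximum(np.linalg.norm(v, axis=1, keepdims=True), 1e-12)
    return np.abs(v @ v.T)

def laplacian(affinity):
    """Unnormalised graph Laplacian L = D - W for a symmetric affinity W."""
    w = affinity.copy()
    np.fill_diagonal(w, 0.0)           # drop self-loops
    d = np.diag(w.sum(axis=1))         # degree matrix
    return d - w

def spectral_embedding(affinity, k):
    """Return the k smallest Laplacian eigenvalues and their (C, k) eigenvectors."""
    lap = laplacian(affinity)
    eigvals, eigvecs = np.linalg.eigh(lap)   # eigh returns ascending eigenvalues
    return eigvals[:k], eigvecs[:, :k]

def spectral_loss(u_s, u_t):
    """Assumed SE loss: MSE between projection matrices U U^T.

    U U^T is unchanged when eigenvector signs flip, so the loss does not
    penalise that arbitrary choice made by the eigensolver.
    """
    p_s = u_s @ u_s.T
    p_t = u_t @ u_t.T
    return float(np.mean((p_s - p_t) ** 2))

rng = np.random.default_rng(0)
f_t = rng.standard_normal((8, 4, 4))               # teacher features (C, H, W)
f_s = f_t + 0.1 * rng.standard_normal((8, 4, 4))   # student features, same shape

vals_t, u_t = spectral_embedding(channel_affinity(f_t), k=3)
vals_s, u_s = spectral_embedding(channel_affinity(f_s), k=3)
l_s = spectral_loss(u_s, u_t)
print("smallest teacher eigenvalues:", np.round(vals_t, 4))
print(f"L_S = {l_s:.6f}")
```

The smallest eigenvalue of a connected graph's Laplacian is zero with a constant eigenvector, so in practice one may skip it; the sketch keeps it for simplicity. The projection-matrix comparison is well defined only when the k-th and (k+1)-th eigenvalues differ.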