Leveraging Topological Guidance for Improved Knowledge Distillation

Eun Som Jeon, Rahul Khurana, Aishani Pathak, Pavan Turaga

TL;DR

This work tackles the computational bottleneck of incorporating topological data analysis into deep learning by introducing Topological Guidance-based Knowledge Distillation (TGD). TGD trains two teachers—one on raw images and one on persistence images (PI)—and distills their combined knowledge into a single compact student using both logit- and feature-based transfers, merged via similarity maps and an annealing strategy to reduce the teacher–student gap. The approach is validated on CIFAR-10 and CINIC-10, showing that TGD consistently outperforms single-teacher KD and other baselines, with enhanced robustness to noise and better calibration. Practically, TGD enables leveraging topology-inspired features for robust, edge-friendly image classification without increasing test-time complexity.
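As a rough illustration of the logit-based part of this idea, the sketch below combines a cross-entropy term with one temperature-softened KL term per teacher. It is not the paper's loss: the function name `two_teacher_kd_loss` and the parameters `teacher_weight`, `kd_weight`, and `temperature` are hypothetical, and the feature-based transfer and annealing strategy are left out.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(logits, label):
    """Cross-entropy of the (untempered) prediction against a hard label."""
    return -math.log(softmax(logits)[label] + 1e-12)

def two_teacher_kd_loss(student_logits, img_teacher_logits, pi_teacher_logits,
                        label, temperature=4.0, teacher_weight=0.5, kd_weight=0.9):
    """Assumed form: (1 - kd_weight) * CE + kd_weight * T^2 * weighted teacher KL.

    teacher_weight balances the raw-image teacher against the
    persistence-image teacher; all weights here are illustrative only.
    """
    s = softmax(student_logits, temperature)
    kd_img = kl_divergence(softmax(img_teacher_logits, temperature), s)
    kd_pi = kl_divergence(softmax(pi_teacher_logits, temperature), s)
    kd = teacher_weight * kd_img + (1.0 - teacher_weight) * kd_pi
    # The T^2 factor is the usual rescaling in temperature-based distillation.
    ce = cross_entropy(student_logits, label)
    return (1.0 - kd_weight) * ce + kd_weight * temperature ** 2 * kd

# Toy three-class logits; the values are made up for illustration.
student = [2.0, 0.5, -1.0]
loss = two_teacher_kd_loss(student, [3.0, 0.0, -2.0], [1.5, 1.0, -0.5], label=0)
```

Only the student is evaluated at test time, so the two teachers add training cost but no inference cost, which matches the claim above about unchanged test-time complexity.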

Abstract

Deep learning has shown its efficacy in extracting useful features to solve various computer vision tasks. However, when the structure of the data is complex and noisy, capturing effective information to improve performance is very difficult. To this end, topological data analysis (TDA) has been utilized to derive useful representations that can contribute to improving performance and robustness against perturbations. Despite its effectiveness, the requirements for large computational resources and significant time consumption in extracting topological features through TDA are critical problems when implementing it on small devices. To address this issue, we propose a framework called Topological Guidance-based Knowledge Distillation (TGD), which uses topological features in knowledge distillation (KD) for image classification tasks. We utilize KD to train a superior lightweight model and provide topological features with multiple teachers simultaneously. We introduce a mechanism for integrating features from different teachers and reducing the knowledge gap between teachers and the student, which aids in improving performance. We demonstrate the effectiveness of our approach through diverse empirical evaluations.
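One common way to integrate features from networks with different feature shapes is to compare batch-wise similarity maps instead of raw activations. The sketch below follows that general construction (a row-normalized Gram matrix over the batch); whether TGD uses exactly this form is not stated here, and `merge_maps`, its `weight` parameter, and `map_distance` are hypothetical names chosen for illustration.

```python
import math

def similarity_map(features):
    """Batch-wise similarity map: Gram matrix of flattened features,
    with each row L2-normalized. `features` holds one vector per sample."""
    gram = [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]
    normalized = []
    for row in gram:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # guard against zero rows
        normalized.append([v / norm for v in row])
    return normalized

def merge_maps(map_a, map_b, weight=0.5):
    """Convex combination of two teachers' maps (an assumed merging rule)."""
    return [[weight * a + (1.0 - weight) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(map_a, map_b)]

def map_distance(student_map, teacher_map):
    """Mean squared difference between two batch-by-batch maps."""
    b = len(student_map)
    total = sum((s - t) ** 2
                for rs, rt in zip(student_map, teacher_map)
                for s, t in zip(rs, rt))
    return total / (b * b)

# Two samples per batch; feature widths differ between networks on purpose.
img_teacher_feats = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
pi_teacher_feats = [[1.0, 0.0], [0.0, 1.0]]
student_feats = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.0]]

merged = merge_maps(similarity_map(img_teacher_feats),
                    similarity_map(pi_teacher_feats))
feature_loss = map_distance(similarity_map(student_feats), merged)
```

Because each map has shape batch by batch, the student and both teachers can be compared even when their feature dimensions differ, which is the main reason to work with similarity maps in a setting like this.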

Paper Structure (27 sections, 8 equations, 17 figures, 7 tables)

Figures (17)

  • Figure 1: An overview of Topological Guidance-based Knowledge Distillation (TGD). Two teachers are trained with different representations from the raw image and persistence image data, respectively. A student utilizes the original image data alone.
  • Figure 2: A persistence diagram (PD) and its corresponding persistence image (PI). Points with longer lifetimes in the PD appear as brighter colors in the PI.
  • Figure 3: An illustration of similarities for two teachers, trained with the raw image and persistence image, respectively.
  • Figure 4: Accuracy ($\%$) of students (WRN16-1) distilled by TGD with various combinations of teachers on CIFAR-10. Teacher1 and Teacher2 use different (depth)-(channel) configurations of WRN. Green, red, and magenta dashed lines denote TGD (16-3, 16-3), KD (16-3 Teacher1), and Student (WRN16-1), respectively.
  • Figure 5: Accuracy ($\%$) of students (WRN16-1) for various methods with different $\alpha$ on CIFAR-10.
  • ...and 12 more figures