Adaptive Knowledge Distillation for Classification of Hand Images using Explainable Vision Transformers

Thanh Thi Nguyen, Campbell Wilson, Janis Dalins

TL;DR

This work addresses cross-domain classification of hand images for forensic use with vision transformers (ViTs) and explainability tools. It shows that ViTs outperform traditional machine learning methods on hand images and introduces two adaptive distillation strategies that let a model learn a new domain while preserving knowledge from the source domain, without access to the source data. The authors analyze internal ViT representations with Deep Feature Factorization and Grad-CAM, and show that adaptive distillation can substantially mitigate catastrophic forgetting during domain adaptation, especially when the source and target domains differ markedly. The approach is validated on the IIT Delhi and 11k Hands datasets. The authors point to practical uses in access control and identity verification, and acknowledge ethical considerations and the need for responsible deployment.

Abstract

Assessing the forensic value of hand images involves the use of unique features and patterns present in an individual's hand. The human hand has distinct characteristics, such as the pattern of veins, fingerprints, and the geometry of the hand itself. This paper investigates the use of vision transformers (ViTs) for classification of hand images. We use explainability tools to explore the internal representations of ViTs and assess their impact on the model outputs. Drawing on this understanding of ViT internals, we introduce distillation methods that allow a student model to adaptively extract knowledge from a teacher model while learning on data from a different domain, which prevents catastrophic forgetting. Two publicly available hand image datasets are used in a series of experiments to evaluate the performance of the ViTs and our proposed adaptive distillation methods. The experimental results demonstrate that ViT models significantly outperform traditional machine learning methods and that the internal states of ViTs are useful for explaining the model outputs in the classification task. By averting catastrophic forgetting, our distillation methods achieve excellent performance on data from both source and target domains, particularly when the two domains are highly dissimilar. The proposed approaches can therefore be developed and implemented effectively for real-world applications such as access control, identity verification, and authentication systems.

Paper Structure

This paper contains 26 sections, 7 equations, 2 figures, and 4 tables.

Figures (2)

  • Figure 1: Domain adaptation with knowledge distilled from a teacher. The student model at the bottom is trained on data of a new domain and tries to extract knowledge from the teacher model at the top to prevent catastrophic forgetting.
  • Figure 2: Explainability of a ViT using the PyTorch library for CAM methods.
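
The teacher-student setup in the Figure 1 caption can be illustrated with a generic knowledge distillation loss. This is a minimal sketch and not the paper's adaptive method, whose weighting scheme is not given in this summary. It assumes that teacher and student emit logits over the same label set, and it uses a fixed mixing weight `alpha` and softening `temperature`; all function and parameter names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, label):
    """Negative log-likelihood of the true class under the student."""
    return -math.log(softmax(student_logits)[label])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Fixed-weight distillation loss (assumption: not the paper's adaptive rule).

    The hard term fits the student to the new-domain label. The soft term
    keeps the student's softened predictions close to the teacher's, which
    is what discourages forgetting of the source domain.
    """
    hard = cross_entropy(student_logits, label)
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    # temperature**2 keeps the soft term's gradient scale comparable
    # across temperatures, as in standard distillation.
    return (1 - alpha) * hard + alpha * temperature ** 2 * soft


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]   # hypothetical student logits
    teacher = [1.0, 1.5, -0.5]   # hypothetical teacher logits
    print(distillation_loss(student, teacher, label=0))
```

When the student already matches the teacher, the soft term vanishes and only the weighted cross-entropy on the new-domain label remains. An adaptive scheme would presumably vary `alpha` during training rather than fix it, but that detail would have to come from the paper itself.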