
SalNAS: Efficient Saliency-prediction Neural Architecture Search with self-knowledge distillation

Chakkrit Termritthikun, Ayaz Umer, Suwichaya Suwanwimolkul, Feng Xia, Ivan Lee

TL;DR

SalNAS introduces a weight-sharing neural architecture search framework for saliency prediction that embeds dynamic convolution into a joint encoder-decoder supernet. It adds Self-KD, a self-knowledge distillation scheme that needs no separate teacher network: the teacher shares the student's architecture but holds the best-performing weights chosen by cross-validation, and the student is trained on a weighted average of the ground truth and the teacher's prediction, with no gradient computed in the teacher. Empirically, SalNAS-XL with Self-KD outperforms other state-of-the-art models on most evaluation metrics across seven benchmark datasets with about 20.98M parameters, and the authors report favorable real-time metrics. The work provides an end-to-end NAS-plus-distillation pipeline for efficient, scalable saliency prediction suitable for edge devices, with code to be released.
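
The weight-sharing idea can be illustrated with a minimal sketch, assuming that "dynamic convolution" here means an elastic layer whose active channel counts and kernel size are chosen per subnet by slicing one shared supernet weight tensor, as in once-for-all-style supernets; the paper's actual layer may differ. The helpers `make_supernet_weight` and `slice_subnet_weight` are hypothetical, and weights are plain nested lists indexed `[out][in][row][col]` so the example runs without a deep-learning framework.

```python
from typing import List

Kernel = List[List[float]]   # [row][col]
Weight = List[List[Kernel]]  # [out][in][row][col]


def make_supernet_weight(max_out: int, max_in: int, max_k: int) -> Weight:
    """Hypothetical shared weight of the largest layer.

    Toy values encode their (out, in, row, col) position so slices are easy to check.
    """
    return [[[[float(o * 1000 + i * 100 + r * 10 + c) for c in range(max_k)]
              for r in range(max_k)]
             for i in range(max_in)]
            for o in range(max_out)]


def slice_subnet_weight(weight: Weight, out_ch: int, in_ch: int, k: int) -> Weight:
    """Extract one subnet's kernel from the shared supernet weight.

    Width: keep the first `out_ch` filters and their first `in_ch` input channels.
    Kernel size: keep the centred k x k window of the largest kernel.
    """
    max_out, max_in, max_k = len(weight), len(weight[0]), len(weight[0][0])
    if not (0 < out_ch <= max_out and 0 < in_ch <= max_in):
        raise ValueError("channel counts must lie within the supernet's maxima")
    if not (0 < k <= max_k) or (max_k - k) % 2:
        raise ValueError("kernel size must not exceed the maximum and must share its parity")
    start = (max_k - k) // 2
    return [[[row[start:start + k] for row in kernel[start:start + k]]
             for kernel in filt[:in_ch]]
            for filt in weight[:out_ch]]


if __name__ == "__main__":
    shared = make_supernet_weight(max_out=8, max_in=4, max_k=7)
    # Two subnets of different sizes drawn from the same shared parameters.
    small = slice_subnet_weight(shared, out_ch=4, in_ch=2, k=3)
    large = slice_subnet_weight(shared, out_ch=8, in_ch=4, k=7)
    print(len(small), len(small[0]), len(small[0][0]), len(small[0][0][0]))  # 4 2 3 3
    print(large == shared)  # True
```

Because every subnet reads from the same tensor, training any subnet updates the shared parameters. List slicing here copies values; in a framework such as PyTorch the equivalent tensor slice is a view, so gradients flow back into the shared weight.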

Abstract

Recent advancements in deep convolutional neural networks have significantly improved the performance of saliency prediction. However, manually configuring neural network architectures requires domain expertise and can still be time-consuming and error-prone. To address this, we propose a new Neural Architecture Search (NAS) framework for saliency prediction with two contributions. First, we build a supernet for saliency prediction, termed SalNAS, as a weight-sharing network containing all candidate architectures, by integrating dynamic convolution into its encoder-decoder. Second, although SalNAS is highly efficient (20.98 million parameters), it can suffer from a lack of generalization. To address this, we propose a self-knowledge distillation approach, termed Self-KD, that trains the student SalNAS on a weighted average of the ground truth and the teacher model's prediction. The teacher model shares the same architecture but contains the best-performing weights chosen by cross-validation. Self-KD generalizes well without computing gradients in the teacher model, enabling an efficient training system. With Self-KD, SalNAS outperforms other state-of-the-art saliency prediction models on most evaluation metrics across seven benchmark datasets while remaining lightweight. The code will be available at https://github.com/chakkritte/SalNAS
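
The Self-KD training signal described in the abstract can be sketched as follows. This is a toy under stated assumptions, not the authors' implementation: the model is a hypothetical per-pixel linear map standing in for SalNAS, the blending weight `alpha` and the MSE loss are illustrative choices, and the teacher is simply a frozen snapshot of the student's weights taken whenever a held-out validation score improves (one reading of "best-performing weights chosen by cross-validation"). All names (`forward`, `blended_target`, `BestCheckpointTeacher`, `sgd_step`) are hypothetical.

```python
import copy
from typing import Dict, List, Optional

Weights = Dict[str, float]


def forward(weights: Weights, x: List[float]) -> List[float]:
    """Hypothetical toy model: per-pixel saliency p = w * x + b."""
    return [weights["w"] * xi + weights["b"] for xi in x]


def blended_target(gt: List[float], teacher: List[float], alpha: float) -> List[float]:
    """Weighted average of ground truth and teacher prediction, pixel by pixel."""
    return [alpha * g + (1.0 - alpha) * t for g, t in zip(gt, teacher)]


class BestCheckpointTeacher:
    """Teacher = frozen snapshot of the student's best-validating weights."""

    def __init__(self) -> None:
        self.best_score = float("-inf")
        self.weights: Optional[Weights] = None

    def maybe_update(self, student: Weights, val_score: float) -> bool:
        """Snapshot the student if its validation score is a new best."""
        if val_score > self.best_score:
            self.best_score = val_score
            self.weights = copy.deepcopy(student)  # a copy, not a reference
            return True
        return False

    def predict(self, x: List[float]) -> List[float]:
        """Forward pass only; no gradient is computed for the teacher."""
        if self.weights is None:
            raise RuntimeError("teacher has no checkpoint yet")
        return forward(self.weights, x)


def sgd_step(student: Weights, x: List[float], target: List[float], lr: float) -> None:
    """One gradient step on the MSE between the student output and the target."""
    pred = forward(student, x)
    n = len(x)
    grad_w = sum(2.0 * (p - y) * xi for p, y, xi in zip(pred, target, x)) / n
    grad_b = sum(2.0 * (p - y) for p, y in zip(pred, target)) / n
    student["w"] -= lr * grad_w
    student["b"] -= lr * grad_b


if __name__ == "__main__":
    x, gt = [0.0, 0.5, 1.0], [0.0, 0.5, 1.0]
    student: Weights = {"w": 0.0, "b": 0.0}
    teacher = BestCheckpointTeacher()
    for epoch in range(20):
        # Before the first checkpoint exists, fall back to the ground truth.
        teacher_pred = teacher.predict(x) if teacher.weights is not None else gt
        target = blended_target(gt, teacher_pred, alpha=0.5)
        sgd_step(student, x, target, lr=0.5)
        # Hypothetical validation score: negative MSE against the ground truth.
        val = -sum((p - g) ** 2 for p, g in zip(forward(student, x), gt)) / len(gt)
        teacher.maybe_update(student, val)
    print(student, teacher.best_score)
```

Only the student receives gradient updates; the teacher contributes a single forward pass per batch, which is why this style of distillation adds little training cost.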

Paper Structure

This paper contains 22 sections, 8 equations, 5 figures, 9 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Our proposed architecture consists of the encoder and decoder modules that employ dynamic convolutional layers.
  • Figure 2: The proposed Self-Knowledge Distillation method.
  • Figure 3: (a) Correlation coefficient for the smallest subnet (SalNAS-XS), showing superior performance with the Self-KD method. (b) Correlation coefficient for the largest subnet (SalNAS-XL), showing the Self-KD method's superiority over the inplace distillation and sandwich rule methods.
  • Figure 4: Qualitative comparison between our model (SalNAS-XL) and other models, including TranSalNet, EfficientNet-B4, and TResNet-M. Images are sourced from the SALICON validation dataset. Rows 1-3 show saliency maps produced by the SalNAS-XL subnet that closely match the ground truth. In rows 4-6, SalNAS-XL yields lower CC scores than the other models.
  • Figure 5: Exploring the search space of SalNAS: a bubble chart comparing computational complexity (GFLOPs) and correlation coefficient for a sample of 2,000 subnets. Bubble size indicates parameter count, and the chart spans model designs ranging from SalNAS-XS to SalNAS-XL.
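
Figures 3-5 report the correlation coefficient (CC). In saliency benchmarks, CC is conventionally the Pearson linear correlation between the predicted and ground-truth saliency maps treated as flat vectors; below is a minimal sketch of that standard definition. The function name is hypothetical, returning 0.0 for a constant map is an assumption, and the paper's evaluation code may differ in details such as preprocessing.

```python
import math
from typing import List

SaliencyMap = List[List[float]]  # [row][col]


def correlation_coefficient(pred: SaliencyMap, gt: SaliencyMap) -> float:
    """Pearson linear correlation (CC) between two same-shaped saliency maps."""
    p = [v for row in pred for v in row]
    g = [v for row in gt for v in row]
    if len(p) != len(g) or not p:
        raise ValueError("maps must be non-empty and have the same number of pixels")
    n = len(p)
    mean_p, mean_g = sum(p) / n, sum(g) / n
    cov = sum((a - mean_p) * (b - mean_g) for a, b in zip(p, g))
    # Root-sum-of-squares spreads; the 1/n factors cancel in the final ratio.
    spread_p = math.sqrt(sum((a - mean_p) ** 2 for a in p))
    spread_g = math.sqrt(sum((b - mean_g) ** 2 for b in g))
    if spread_p == 0.0 or spread_g == 0.0:
        return 0.0  # assumption: CC of a constant map is reported as 0
    return cov / (spread_p * spread_g)


if __name__ == "__main__":
    gt = [[0.0, 0.2], [0.6, 1.0]]
    good = [[0.1, 0.3], [0.5, 0.9]]  # similar spatial pattern
    bad = [[1.0, 0.6], [0.2, 0.0]]   # reversed pattern
    print(round(correlation_coefficient(good, gt), 3))
    print(round(correlation_coefficient(bad, gt), 3))
```

CC ranges from -1 to 1 and is invariant to affine rescaling of either map, so a prediction is rewarded for matching the spatial pattern of fixations rather than their absolute intensity.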