Hybrid of DiffStride and Spectral Pooling in Convolutional Neural Networks

Sulthan Rafif, Mochamad Arfan Ravy Wahyu Pratama, Mohammad Faris Azhar, Ahmad Mustafidul Ibad, Lailil Muflikhah, Novanto Yudistira

TL;DR

The paper tackles information loss in CNN downsampling caused by fixed stride by proposing a Hybrid method that merges DiffStride, which learns the stride via backpropagation, with Spectral Pooling, which crops the frequency-domain representation. Implemented on a ResNet-18 backbone, the approach places DiffStride after convolutions in residual blocks and Spectral Pooling before Global Average Pooling, with learning-rate tuning to ensure convergence. Empirical results on CIFAR-10 show a mean accuracy of $0.9334$ for the Hybrid method versus $0.924$ for the DiffStride baseline, and CIFAR-100 shows $0.7382$ versus $0.706$, indicating consistent gains from spectral pooling and learnable stride. The findings suggest that frequency-domain downsampling combined with learnable stride can preserve information more effectively per parameter, potentially improving efficiency and accuracy in CNNs, with future work aimed at further improvements and broader evaluation.

Abstract

Stride determines the distance between adjacent filter positions as the filter moves across the input. With a fixed stride, important information in the image may not be captured and therefore cannot contribute to classification. Previous research addressed this with DiffStride, a strided convolution method that learns its own stride value. Max pooling downsampling, in turn, suffers from severe quantization and a constraining lower bound on preserved information. Spectral Pooling relaxes this lower bound by truncating the representation in the frequency domain. This research proposes a CNN model that combines learnable-stride downsampling, trained through backpropagation, with Spectral Pooling. Together, DiffStride and Spectral Pooling are expected to retain most of the information contained in the image. We compare the Hybrid Method, a combined implementation of Spectral Pooling and DiffStride, against the Baseline Method, the DiffStride implementation on ResNet-18. The Hybrid Method improves accuracy over the DiffStride baseline by 0.0094. This indicates that the Hybrid Method can retain most of the information by truncating the representation in the frequency domain while learning the stride through backpropagation.
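To make the frequency-domain truncation concrete, the following is a minimal sketch of spectral pooling on a single 2-D feature map, written with NumPy. It is not the authors' implementation: the function name `spectral_pool`, the single-channel input, and the rescaling that preserves mean intensity are all assumptions made for illustration.

```python
import numpy as np

def spectral_pool(x, out_h, out_w):
    """Downsample a 2-D map by keeping only its lowest frequencies.

    Hypothetical helper for illustration; assumes out_h <= x.shape[0]
    and out_w <= x.shape[1].
    """
    h, w = x.shape
    # Move the zero-frequency term to the center so that the low
    # frequencies form one contiguous block.
    spec = np.fft.fftshift(np.fft.fft2(x))
    # Crop a window centered on the zero-frequency term.
    top = h // 2 - out_h // 2
    left = w // 2 - out_w // 2
    cropped = spec[top:top + out_h, left:left + out_w]
    # Undo the shift and return to the spatial domain.
    out = np.fft.ifft2(np.fft.ifftshift(cropped))
    # Rescale so the mean intensity of the input is preserved (assumption),
    # and drop the small imaginary part left by the asymmetric crop.
    return out.real * (out_h * out_w) / (h * w)

# Usage: reduce an 8x8 map to 4x4.
feature_map = np.random.default_rng(0).normal(size=(8, 8))
pooled = spectral_pool(feature_map, 4, 4)
print(pooled.shape)
```

Unlike max pooling, the output size here is not tied to an integer reduction factor: any `out_h` and `out_w` up to the input size can be requested, which is the flexibility that spectral pooling relies on.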

Paper Structure (16 sections, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Hybrid Spectral Pooling and DiffStride Architectures
  • Figure 2: Implementation DiffStride in Residual Layer
  • Figure 3: Architecture Difference Between Hybrid and Baseline Method
  • Figure 4: Accuracy and Loss for DiffStride (Baseline Method)
  • Figure 5: Accuracy and Loss for DiffStride Spectral Pooling (Hybrid Method)
  • ...and 1 more figures