Spatially Optimized Compact Deep Metric Learning Model for Similarity Search
Md. Farhadul Islam, Md. Tanzim Reza, Meem Arafat Manab, Mohammad Rakibul Hasan Mahin, Sarah Zabeen, Jannatun Noor
TL;DR
The paper addresses efficient metric learning for similarity search by exploiting spatial relationships with a lightweight model that combines a single involution layer with a compact convolutional backbone. The involution layer generates a dynamic kernel at each pixel, the network uses GELU activations, and training uses cross-entropy and Multi-Similarity losses. On MNIST, FashionMNIST, and CIFAR-10, the proposed Hybrid-1 configuration achieves competitive performance with far fewer parameters and a much smaller model size (~0.116M parameters; ~0.457 MB) than deeper CNNs. This work demonstrates that spatially adaptive kernels combined with compact architectures can improve distance-based embedding quality while enabling real-world deployment, with potential applications in retrieval and local feature matching.
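As a rough illustration of the Multi-Similarity loss named above, the following sketch computes the loss over a small batch of embeddings using cosine similarity. It is not the authors' training code: the function names, the hyperparameter values (`alpha`, `beta`, `lam`), and the omission of pair mining are all assumptions made for brevity, and embeddings are assumed to be non-zero vectors.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def multi_similarity_loss(
    embeddings: List[Sequence[float]],
    labels: Sequence[int],
    alpha: float = 2.0,   # illustrative value, not taken from the paper
    beta: float = 50.0,   # illustrative value, not taken from the paper
    lam: float = 0.5,     # illustrative similarity margin
) -> float:
    """Multi-Similarity loss without pair mining (a simplification)."""
    m = len(embeddings)
    total = 0.0
    for i in range(m):
        pos_sum = 0.0  # accumulates over same-class pairs
        neg_sum = 0.0  # accumulates over different-class pairs
        for k in range(m):
            if k == i:
                continue
            s = cosine(embeddings[i], embeddings[k])
            if labels[k] == labels[i]:
                # Positives are penalized when their similarity is low.
                pos_sum += math.exp(-alpha * (s - lam))
            else:
                # Negatives are penalized when their similarity is high.
                neg_sum += math.exp(beta * (s - lam))
        total += math.log1p(pos_sum) / alpha + math.log1p(neg_sum) / beta
    return total / m
```

With embeddings that cluster by class, the loss is small; swapping labels so that identical embeddings belong to different classes raises it, which is the behavior a distance-based embedding objective should show.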
Abstract
Spatial optimization is often overlooked in computer vision tasks. Filters should be able to recognize the features of an object regardless of where it appears in the image. Similarity search is one such task, in which spatial features strongly influence the result. The capacity of convolution to capture visual patterns across different locations is limited. In contrast to a convolution kernel, the involution kernel is created dynamically at each pixel from the pixel value and learned parameters. This study demonstrates that using a single involution layer as a feature extractor alongside a compact convolutional model significantly enhances similarity search performance. Additionally, we improve predictions by using the GELU activation function instead of ReLU. Because involution adds a negligible number of weight parameters, the resulting compact model achieves better performance while remaining well suited to real-world deployment. Our proposed model is below 1 megabyte in size. We evaluate our proposed method and other models on the CIFAR-10, FashionMNIST, and MNIST datasets. Our proposed method outperforms the other models on all three datasets.
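The per-pixel kernel generation described in the abstract can be sketched as follows. This is a minimal, assumption-laden illustration rather than the model used in the paper: it uses a single group, stride 1, zero padding, and a hypothetical linear kernel generator `weight` in place of whatever generator the authors employ, and it is written with nested lists for clarity rather than speed.

```python
from typing import List

Image = List[List[List[float]]]  # nested lists with shape H x W x C


def involution(x: Image, weight: List[List[float]], k: int = 3) -> Image:
    """Single-group involution with zero padding and stride 1.

    `weight` has shape (k*k) x C. It is a hypothetical linear generator that
    maps the C-dimensional pixel at (i, j) to the k*k kernel applied at (i, j).
    """
    h, w, c = len(x), len(x[0]), len(x[0][0])
    r = k // 2
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # The kernel is computed from the pixel itself, so it changes
            # from location to location (unlike a convolution kernel).
            kernel = [sum(wt * v for wt, v in zip(row, x[i][j])) for row in weight]
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside
                        kv = kernel[(di + r) * k + (dj + r)]
                        for ch in range(c):
                            # The same kernel value is shared by all channels.
                            out[i][j][ch] += kv * x[ii][jj][ch]
    return out
```

In this simplified form the generator holds only k*k*C weights, independent of the number of output channels, whereas a standard convolution with the same kernel size needs k*k*C_in*C_out. This gives some intuition for why adding an involution layer can leave the parameter count nearly unchanged.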
