MHAFF: Multi-Head Attention Feature Fusion of CNN and Transformer for Cattle Identification
Rabin Dulal, Lihong Zheng, Muhammad Ashad Kabir
TL;DR
This paper tackles muzzle-based cattle identification and the limited ability of CNNs to capture long-range dependencies. It introduces MHAFF, a dual-branch fusion framework that combines local CNN features and global transformer features through a multi-head attention mechanism, yielding a fused representation that preserves original information while modeling inter-feature relationships. Empirical results across CIFAR10, Flower102, and two cattle datasets show that MHAFF outperforms addition and concatenation fusion, as well as multiple baselines and state-of-the-art methods, achieving up to 99.88% accuracy on Cattle-1 and 99.52% on Cattle-2 with rapid convergence. The work demonstrates strong generalization and offers a practical, high-accuracy approach for livestock identification, with Grad-CAM visualizations supporting interpretable attention to discriminative muzzle features.
Abstract
Convolutional Neural Networks (CNNs) have attracted considerable research attention for identifying cattle from muzzle images. However, CNNs often fail to capture long-range dependencies within the complex patterns of the muzzle, a limitation that transformers address. This motivated us to fuse the strengths of CNNs and transformers for muzzle-based cattle identification. Addition and concatenation are the most commonly used feature fusion techniques. However, addition fails to preserve discriminative information, while concatenation increases dimensionality. Both are simple operations that cannot discover the relationships or interactions between the features being fused. To overcome these issues, this research introduces Multi-Head Attention Feature Fusion (MHAFF), a novel approach applied for the first time to cattle identification. MHAFF captures relations between the different types of features being fused while preserving their original information. Experiments show that MHAFF outperforms addition, concatenation, and existing cattle identification methods in accuracy on two publicly available cattle datasets. MHAFF converges quickly and achieves accuracies of 99.88% and 99.52% on the two cattle datasets, respectively.
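To make the contrast between the three fusion strategies concrete, the following is a minimal NumPy sketch, not the authors' implementation. The function names, the token shapes, the use of CNN features as queries with transformer features as keys and values, and the residual connection (used here as one plausible way to keep the original information) are all assumptions made for illustration. Random matrices stand in for learned projection weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_add(cnn_feat, vit_feat):
    # Element-wise addition: keeps the dimension but mixes the two sources.
    return cnn_feat + vit_feat

def fuse_concat(cnn_feat, vit_feat):
    # Concatenation: keeps both sources but doubles the feature dimension.
    return np.concatenate([cnn_feat, vit_feat], axis=-1)

def fuse_mha(cnn_feat, vit_feat, num_heads, rng):
    # Hypothetical attention-based fusion: CNN tokens query transformer tokens.
    n, d = cnn_feat.shape
    m = vit_feat.shape[0]
    assert d % num_heads == 0, "feature dim must be divisible by num_heads"
    head_dim = d // num_heads

    # Random projections stand in for learned weights (assumption).
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)
    )

    # Project, then split into heads: (num_heads, tokens, head_dim).
    q = (cnn_feat @ w_q).reshape(n, num_heads, head_dim).transpose(1, 0, 2)
    k = (vit_feat @ w_k).reshape(m, num_heads, head_dim).transpose(1, 0, 2)
    v = (vit_feat @ w_v).reshape(m, num_heads, head_dim).transpose(1, 0, 2)

    # Scaled dot-product attention per head: (num_heads, n, m).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values, then merge heads back to (n, d).
    attended = (weights @ v).transpose(1, 0, 2).reshape(n, d)

    # Residual connection (assumption): add the attended features to the
    # original CNN features so the original information is retained.
    return cnn_feat + attended @ w_o, weights

rng = np.random.default_rng(0)
cnn_feat = rng.standard_normal((4, 8))  # 4 tokens, 8-dim local features
vit_feat = rng.standard_normal((4, 8))  # 4 tokens, 8-dim global features

added = fuse_add(cnn_feat, vit_feat)
concatenated = fuse_concat(cnn_feat, vit_feat)
fused, weights = fuse_mha(cnn_feat, vit_feat, num_heads=2, rng=rng)

print(added.shape, concatenated.shape, fused.shape)  # (4, 8) (4, 16) (4, 8)
```

In this sketch the attention-based output has the same dimensionality as either input, unlike concatenation, and the attention weights give an explicit per-head measure of how each CNN token relates to each transformer token, which plain addition does not provide.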
