MAAM: A Lightweight Multi-Agent Aggregation Module for Efficient Image Classification Based on the MindSpore Framework
Zhenkai Qin, Feng Zhu, Huan Zeng, Xunyi Nong
TL;DR
MAAM addresses the need for lightweight yet expressive image classification in resource-constrained environments by introducing three independently parameterized AgentBlocks that extract heterogeneous multi-scale features. These features are adaptively fused using learnable scalar weights normalized by Softmax, producing a compact global representation that is then compressed with a $1\times1$ convolution and passed to a small classifier. Implemented in MindSpore, MAAM leverages dynamic graphs and operator fusion, with a reported 30% faster training and 2.3M parameters, while delivering competitive accuracy on CIFAR-10 (0.870, versus baselines of 0.583 for a CNN, 0.496 for an MLP, and 0.319 for an RNN). Ablation studies confirm that both the Agent Attention and the Reduce Layer are necessary, and MindSpore’s deployment optimizations enable efficient edge-scale inference. This work provides a practical pathway for deploying lightweight, high-performance attention mechanisms on edge devices with hardware acceleration.
Abstract
The demand for lightweight models in image classification tasks under resource-constrained environments calls for a balance between computational efficiency and robust feature representation. Traditional attention mechanisms, despite their strong feature modeling capability, often suffer from high computational complexity and structural rigidity, limiting their applicability in scenarios with limited computational resources (e.g., edge devices or real-time systems). To address this, we propose the Multi-Agent Aggregation Module (MAAM), a lightweight attention architecture integrated with the MindSpore framework. MAAM employs three parallel agent branches with independently parameterized operations to extract heterogeneous features, which are adaptively fused via learnable scalar weights and refined through a convolutional compression layer. Leveraging MindSpore's dynamic computational graph and operator fusion, MAAM achieves 87.0% accuracy on the CIFAR-10 dataset, significantly outperforming conventional CNN (58.3%) and MLP (49.6%) models, while improving training efficiency by 30%. Ablation studies confirm the critical role of agent attention (accuracy drops to 32.0% when it is removed) and of the compression module (25.5% when it is omitted), validating their necessity for maintaining discriminative feature learning. The framework's hardware acceleration capabilities and minimal memory footprint further demonstrate its practicality, offering a deployable solution for image classification in resource-constrained scenarios without compromising accuracy.
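The fusion and compression steps described above can be illustrated with a minimal sketch. It is not the MindSpore implementation: it uses only the Python standard library, treats each branch output as a channel vector at a single spatial position, and assumes the fusion is a Softmax-normalized weighted sum of the three branch outputs. The names `softmax`, `fuse`, and `pointwise_conv`, the toy feature values, and the kernel are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Normalize learnable scalar logits into fusion weights that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(branch_features, fusion_logits):
    """Weighted sum of per-branch channel vectors (assumed fusion rule)."""
    alphas = softmax(fusion_logits)
    n_channels = len(branch_features[0])
    return [
        sum(a * feats[c] for a, feats in zip(alphas, branch_features))
        for c in range(n_channels)
    ]


def pointwise_conv(channels, kernel):
    """1x1 convolution at one spatial position: a linear map over channels."""
    return [sum(k * c for k, c in zip(row, channels)) for row in kernel]


# Toy outputs of three agent branches: 4 channels at one spatial position.
branches = [
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, 2.0, 3.0],
    [2.0, 2.0, 2.0, 2.0],
]
logits = [0.0, 0.0, 0.0]  # equal logits give equal fusion weights

fused = fuse(branches, logits)

# Hypothetical 1x1 kernel compressing 4 channels down to 2.
kernel = [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
compressed = pointwise_conv(fused, kernel)
```

With equal logits each branch contributes one third of the fused vector; raising one logit during training shifts weight toward that branch while the weights continue to sum to one, which is what makes the fusion adaptive at the cost of only three extra scalar parameters.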
