A Proper Orthogonal Decomposition approach for parameters reduction of Single Shot Detector networks
Laura Meneghetti, Nicola Demo, Gianluigi Rozza
TL;DR
The paper tackles the challenge of deploying high-accuracy object detectors like SSD300 on resource-constrained devices by introducing a POD-based dimensionality reduction layer. It splits the base network at a chosen cut-off layer $l$, projects the high-dimensional pre-model activations $\mathbf{x}^{(l)}$ onto a low-dimensional POD subspace via $\mathbf{z}^i=\mathbf{\Psi}_r^T\mathbf{x}^{(l,i)}$, and connects this reduced representation to the original predictor, with priors reduced from $8732$ to $5782$. Empirical results show that the reduced network achieves substantial gains in memory and training time (e.g., memory down ~15–22% and training time halved) but incurs notable accuracy loss (e.g., mAP drops from $77.8\%$ to $39\%$ on VOC and from $70.2\%$ to $59\%$ on a cat-dog subset). The work highlights a practical trade-off between compression and detection performance and suggests avenues like hyperreduction and automatic cut-off layer selection to further improve efficiency in real-world deployments.
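The projection $\mathbf{z}^i=\mathbf{\Psi}_r^T\mathbf{x}^{(l,i)}$ can be illustrated with a minimal NumPy sketch. It assumes the POD basis is obtained from a truncated SVD of a snapshot matrix whose columns are flattened activations at the cut-off layer; the helper name `pod_basis`, the synthetic data, and the chosen sizes are hypothetical and not taken from the paper, and no mean-centering is applied.

```python
import numpy as np

def pod_basis(snapshots, r):
    """Return the first r POD modes of a snapshot matrix.

    snapshots: array of shape (n_features, n_samples); each column is one
    flattened activation vector x^(l,i). (Hypothetical helper, for illustration.)
    """
    # Thin SVD: columns of U are the POD modes, ordered by singular value.
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :r], s

# Synthetic stand-in for pre-model activations: nearly rank-r data plus noise.
rng = np.random.default_rng(0)
n_features, n_samples, r = 512, 40, 8
X = rng.standard_normal((n_features, r)) @ rng.standard_normal((r, n_samples))
X += 1e-3 * rng.standard_normal((n_features, n_samples))

Psi_r, singular_values = pod_basis(X, r)

# Reduced representation z^i = Psi_r^T x^(l,i), computed for all samples at once.
Z = Psi_r.T @ X

# Relative error of reconstructing the activations from the reduced coordinates.
rel_err = np.linalg.norm(X - Psi_r @ Z) / np.linalg.norm(X)
print(Z.shape, rel_err)
```

In this sketch the reduced vectors in `Z` would play the role of the input to the predictor, and the decay of `singular_values` gives a rough indication of how small `r` can be chosen before the reconstruction error grows.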
Abstract
As a major breakthrough in artificial intelligence and deep learning, Convolutional Neural Networks have achieved impressive success in solving many problems in several fields, including computer vision and image processing. Real-time performance, robustness of algorithms, and fast training processes remain open problems in these contexts. In addition, object recognition and detection are challenging tasks for resource-constrained embedded systems, which are commonly used in the industrial sector. To overcome these issues, we propose a dimensionality reduction framework based on Proper Orthogonal Decomposition, a classical model order reduction technique, in order to reduce the number of hyperparameters of the net. We have applied this framework to the SSD300 architecture using the PASCAL VOC dataset, demonstrating a reduction of the network dimension and a remarkable speedup in the fine-tuning of the network in a transfer learning context.
