Learning Coarse-to-Fine Pruning of Graph Convolutional Networks for Skeleton-based Recognition
Hichem Sahbi
TL;DR
The paper addresses the efficiency of graph convolutional networks for skeleton-based recognition with a coarse-to-fine (CTF) pruning framework. Each layer's weights are reparametrized as $W^\ell = \hat{W}^\ell \odot \psi(\hat{W}^\ell)$, where the mask $\psi$ combines a fine-grained mask, built from a symmetrized shifted sigmoid, with a coarse-grained mask that prunes along rows, columns, and blocks. A budgeted variational objective, $\mathcal{L}_e( \{ \psi(\hat{W}^\ell) \odot \hat{W}^\ell \}_\ell ) + \lambda (\sum_{\ell=1}^{L-1} \psi(\hat{W}^\ell) - c)^2$, steers pruning toward a target cost $c$, and annealing $\sigma$ yields crisp masks. Experiments on SBU and FPHA show that CTF achieves strong speedups at high pruning rates while maintaining or improving accuracy relative to structured, unstructured, and regularization-based baselines, which makes it a practical route to lightweight yet effective skeleton-based recognition models.
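To make the budget term in the objective above concrete, here is a minimal NumPy sketch. It rests on several assumptions that are not stated in the summary: the mask is taken to be a sigmoid of the shifted weight magnitude (one plausible reading of "symmetrized shifted sigmoid"), the threshold `tau` is a hypothetical parameter, and the sum over $\psi(\hat{W}^\ell)$ is read as the total of all mask entries across layers. The function names are illustrative only.

```python
import numpy as np

def soft_mask(w, sigma=10.0, tau=0.5):
    """Assumed mask: sigmoid of the shifted magnitude, symmetric in w.

    Larger sigma pushes the mask toward {0, 1}; tau is a hypothetical
    threshold, not a quantity taken from the paper.
    """
    z = sigma * (np.abs(w) - tau)
    # Numerically stable sigmoid: sigmoid(z) = 0.5 * (1 + tanh(z / 2)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def masked_weights(latent_weights, sigma=10.0, tau=0.5):
    """Apply W = W_hat * psi(W_hat) to every layer."""
    return [w * soft_mask(w, sigma, tau) for w in latent_weights]

def budget_penalty(latent_weights, c, lam=1.0, sigma=10.0, tau=0.5):
    """lam * (total mask mass over all layers - c)^2."""
    total = sum(soft_mask(w, sigma, tau).sum() for w in latent_weights)
    return lam * (total - c) ** 2
```

In training, this penalty would be added to the task loss evaluated on `masked_weights(...)`, with `sigma` increased over epochs so that the masks become nearly binary; the schedule itself is left out because the summary does not specify it.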
Abstract
Magnitude pruning is a staple of lightweight network design that removes the connections with the smallest magnitudes. It is carried out in either a structured or an unstructured manner. Structured pruning reaches high efficiency, whereas unstructured pruning is more flexible and leads to better accuracy, but at the expense of lower computational performance. In this paper, we devise a novel coarse-to-fine (CTF) method that combines the advantages of structured and unstructured pruning while mitigating, to some extent, their drawbacks. Our method relies on a novel CTF parametrization that models the mask of each connection as the Hadamard product of four parametrizations, which capture channel-wise, column-wise, row-wise and entry-wise pruning respectively. Hence, fine-grained pruning is enabled only when the coarse-grained one is disabled, which leads to networks that are highly efficient while remaining effective. Extensive experiments on the challenging task of skeleton-based recognition, using the standard SBU and FPHA datasets, show the clear advantage of our CTF approach over different baselines as well as related work.
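The four-way Hadamard product described in the abstract can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions: the gates are shown as already binary (the paper learns them), the channel-wise gate is modeled as a single scalar switching a whole weight matrix on or off, and `ctf_mask` and its argument names are hypothetical.

```python
import numpy as np

def ctf_mask(channel_gate, row_gate, col_gate, entry_gate):
    """Combine coarse and fine gates into one mask via broadcasting.

    channel_gate: scalar gate for the whole matrix (assumed granularity)
    row_gate:     shape (n,)   gate per row
    col_gate:     shape (m,)   gate per column
    entry_gate:   shape (n, m) gate per individual connection
    """
    row = np.asarray(row_gate, dtype=float)[:, None]   # (n, 1)
    col = np.asarray(col_gate, dtype=float)[None, :]   # (1, m)
    entry = np.asarray(entry_gate, dtype=float)        # (n, m)
    # Element-wise product: a connection survives only if all four gates are on.
    return float(channel_gate) * row * col * entry

mask = ctf_mask(
    channel_gate=1,
    row_gate=[1, 0, 1],          # second row pruned as a whole
    col_gate=[1, 1, 0, 1],       # third column pruned as a whole
    entry_gate=[[1, 0, 1, 1],
                [1, 1, 1, 1],
                [0, 1, 1, 1]],
)
print(mask)
```

Because the gates multiply, an entry-wise gate only matters where the enclosing row, column and channel gates are all active, which matches the statement that fine-grained pruning takes effect only when coarse-grained pruning is disabled.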
