Learning Coarse-to-Fine Pruning of Graph Convolutional Networks for Skeleton-based Recognition

Hichem Sahbi

TL;DR

The paper tackles the efficiency of graph convolutional networks (GCNs) for skeleton-based recognition by proposing a coarse-to-fine (CTF) pruning framework. Each layer's weights are reparametrized as $W^\ell = \hat{W}^\ell \odot \psi(\hat{W}^\ell)$, where the mask $\psi$ combines a fine-grained (entry-wise) mask, built from a symmetrized shifted sigmoid, with a coarse-grained mask that prunes along rows, columns, and blocks. A budgeted variational objective $\mathcal{L}_e( \{ \psi(\hat{W}^\ell) \odot \hat{W}^\ell \}_\ell ) + \lambda (\sum_{\ell=1}^{L-1} \psi(\hat{W}^\ell) - c)^2$ steers pruning toward a target cost $c$, and annealing $\sigma$ yields crisp (near-binary) masks. Experiments on SBU and FPHA show that CTF achieves strong speedups at high pruning rates while maintaining or improving accuracy relative to structured, unstructured, and regularization-based baselines, which points to a practical path toward lightweight yet effective skeleton-based recognition models.
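The budget term of the objective can be illustrated with a small sketch. It is only an illustration under stated assumptions: the summary does not give the exact form of the symmetrized shifted sigmoid, so the gate `2 * sigmoid(sigma * w^2) - 1` used here is an assumed stand-in, and the names `soft_mask`, `mask_total` and `budget_penalty` are hypothetical. The task loss $\mathcal{L}_e$ is omitted.

```python
import math

def soft_mask(w, sigma):
    """Assumed symmetric gate in [0, 1): 2 * sigmoid(sigma * w^2) - 1.

    This is a stand-in for the paper's symmetrized shifted sigmoid,
    whose exact form is not given in this summary.
    """
    return 2.0 / (1.0 + math.exp(-sigma * w * w)) - 1.0

def mask_total(layers, sigma):
    """Sum of all soft mask entries over all layers (the 'cost' being budgeted)."""
    return sum(soft_mask(w, sigma) for layer in layers for row in layer for w in row)

def budget_penalty(layers, sigma, c, lam):
    """Quadratic penalty lam * (total mask mass - c)^2 from the objective."""
    return lam * (mask_total(layers, sigma) - c) ** 2

# Toy latent weights for two layers (nested lists of rows); values are illustrative.
layers = [
    [[0.9, -0.001], [0.0, 1.2]],
    [[-0.8, 0.002]],
]

# Annealing: as sigma grows, each mask entry moves toward 0 or 1.
for sigma in (1.0, 10.0, 100.0, 1000.0):
    masks = [[[round(soft_mask(w, sigma), 3) for w in row] for row in layer]
             for layer in layers]
    print(sigma, masks, budget_penalty(layers, sigma, c=3.0, lam=1.0))
```

With a target cost `c` of 3 kept connections, the penalty shrinks as the three large-magnitude weights saturate toward a mask value of 1 and the near-zero weights stay close to 0.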

Abstract

Magnitude pruning is a staple lightweight network design method that removes the connections with the smallest magnitude. This process is achieved in either a structured or an unstructured manner. While structured pruning reaches high efficiency, unstructured pruning is more flexible and leads to better accuracy, but at the expense of low computational performance. In this paper, we devise a novel coarse-to-fine (CTF) method that gathers the advantages of structured and unstructured pruning while discarding their inconveniences to some extent. Our method relies on a novel CTF parametrization that models the mask of each connection as the Hadamard product of four parametrizations, which capture channel-wise, column-wise, row-wise and entry-wise pruning respectively. Hence, fine-grained pruning is enabled only when the coarse-grained one is disabled, which leads to networks that are both highly efficient and effective. Extensive experiments conducted on the challenging task of skeleton-based recognition, using the standard SBU and FPHA datasets, show the clear advantage of our CTF approach over different baselines as well as the related work.
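The Hadamard-product parametrization described in the abstract can be sketched for a single channel as follows. This is a minimal illustration, not the paper's implementation: the gates are taken to be already binary, the per-channel layout (one scalar channel gate, one gate per row, one per column, one per entry) is an assumption, and `ctf_mask` and `apply_mask` are hypothetical helper names.

```python
def ctf_mask(channel_gate, row_gates, col_gates, entry_gates):
    """Combine coarse and fine gates into one mask for a single channel.

    mask[i][j] = channel_gate * row_gates[i] * col_gates[j] * entry_gates[i][j]
    """
    return [
        [channel_gate * row_gates[i] * col_gates[j] * entry_gates[i][j]
         for j in range(len(col_gates))]
        for i in range(len(row_gates))
    ]

def apply_mask(weights, mask):
    """Entry-wise (Hadamard) product of a weight matrix with its mask."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]

# Illustrative 2x3 weight matrix of one channel.
weights = [[0.5, -1.0, 0.3],
           [0.7, 0.2, -0.4]]

row_gates = [1, 0]              # second row pruned as a whole (coarse)
col_gates = [1, 1, 0]           # third column pruned as a whole (coarse)
entry_gates = [[1, 0, 1],       # one individual entry pruned (fine)
               [1, 1, 1]]

mask = ctf_mask(1, row_gates, col_gates, entry_gates)
pruned = apply_mask(weights, mask)
print(mask)
print(pruned)
```

Because the gates are multiplied, an entry-wise gate only has an effect where the channel, row and column gates all keep the connection, which matches the statement that fine-grained pruning is enabled only when the coarse-grained one is disabled.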

Paper Structure

This paper contains 8 sections, 4 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: This figure shows the CTF pruning process in Eq. \ref{eq55556}; here each diagonal block corresponds to a channel.
  • Figure 2: This figure shows a crop of the mask tensor of the second (multi-head-attention) layer of our GCNs when trained on the FPHA dataset. The top row corresponds to the original mask (without pruning), while the second and third rows correspond to masks obtained with structured and unstructured pruning respectively, at pruning rates that increase from left to right (90%, 95% and 98%). The final row corresponds to masks obtained with semi-structured pruning at the same increasing pruning rates (90%, 95% and 98% from left to right). In all these masks, each diagonal block corresponds to a channel. Best viewed by zooming in on the PDF.