Coordinate Descent for Network Linearization
Vlad Rakhlin, Amir Jevnisek, Shai Avidan
TL;DR
This work tackles the discrete optimization problem of reducing ReLU activations to enable private inference (PI) with neural networks. It introduces a Block Coordinate Descent algorithm that operates directly on a binary ReLU mask, removing ReLUs iteratively and finetuning as needed to maintain accuracy. The method yields sparse networks with provable runtime-performance behavior and demonstrates state-of-the-art accuracy with ResNet18 and Wide-ResNet-22-8 on CIFAR-10, CIFAR-100, and TinyImageNet, often outperforming existing selective approaches and even enabling AutoReP-style performance at reduced budgets. Because it is a drop-in discrete optimization step that complements existing PI pipelines, it has practical implications for latency and bandwidth efficiency in privacy-preserving inference.
Abstract
ReLU activations are the main bottleneck in Private Inference based on ResNet networks because they incur significant inference latency. Reducing the ReLU count is a discrete optimization problem, and there are two common ways to approach it. Most current state-of-the-art methods rely on a smooth approximation that jointly optimizes network accuracy and ReLU budget. However, the final hard-thresholding step of the optimization usually introduces a large performance loss. We take an alternative approach that works directly in the discrete domain, using Coordinate Descent as our optimization framework. In contrast to previous methods, this yields a sparse solution by design. We demonstrate through extensive experiments that our method is state of the art on common benchmarks.
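To make the idea of working directly in the discrete domain concrete, the following is a minimal toy sketch of block coordinate descent over a binary mask. It is not the authors' implementation: the function name `block_coordinate_descent`, the parameters `block_size` and `num_candidates`, the random candidate sampling, and the `toy_loss` stand-in for network loss are all assumptions made for illustration, and the finetuning step is omitted.

```python
import random


def block_coordinate_descent(mask, loss_fn, budget, block_size=2,
                             num_candidates=8, seed=0):
    """Greedily zero out blocks of mask entries until at most `budget` stay active.

    Hypothetical sketch: `loss_fn` maps a 0/1 mask to a scalar loss.
    """
    rng = random.Random(seed)
    mask = list(mask)
    while sum(mask) > budget:
        active = [i for i, m in enumerate(mask) if m == 1]
        # Never remove more entries than needed to reach the budget.
        k = min(block_size, len(active) - budget)
        best_loss, best_block = None, None
        for _ in range(num_candidates):
            # Propose a random block of active coordinates to switch off.
            block = rng.sample(active, k)
            trial = list(mask)
            for i in block:
                trial[i] = 0
            loss = loss_fn(trial)
            if best_loss is None or loss < best_loss:
                best_loss, best_block = loss, block
        # Commit the best candidate block; the mask stays binary throughout.
        for i in best_block:
            mask[i] = 0
        # A real pipeline would finetune the network weights here (omitted).
    return mask


# Toy stand-in for network loss (assumption): each ReLU has an importance
# score, and removing it adds that score to the loss.
importance = [0.9, 0.1, 0.8, 0.05, 0.7, 0.02, 0.6, 0.01]


def toy_loss(mask):
    return sum(w for w, m in zip(importance, mask) if m == 0)


final_mask = block_coordinate_descent([1] * 8, toy_loss, budget=4)
print(final_mask, sum(final_mask))
```

Because every step flips whole mask entries from 1 to 0, the result meets the budget exactly and needs no final thresholding, which is the property the abstract describes as sparse by design.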
