Smoothing the Edges: Smooth Optimization for Sparse Regularization using Hadamard Overparametrization
Chris Kolb, Christian L. Müller, Bernd Bischl, David Rügamer
TL;DR
This work introduces a general, differentiable optimization transfer for explicit sparse regularization by overparametrizing targeted parameter subsets through Hadamard-based mappings. By constructing smooth surrogate penalties via a smooth variational form (SVF) and suitable parametrizations (including Hadamard products, differences, and powers), the authors prove equivalence of global and local minima between the original non-smooth problem and the smooth surrogate, thereby enabling standard gradient-based optimization without bespoke solvers. They systematically develop depth-$k$ and group-structured parametrizations, extend to non-integer depths with Hadamard powers, and address practical considerations like parameter sharing and initialization. Numerical experiments across high-dimensional regression, DNN pruning, and structured CNN sparsity demonstrate that the smooth surrogates reproduce or outperform traditional non-smooth regularizers while remaining compatible with SGD. The framework offers a versatile toolkit for integrating sparse regularization into differentiable models with broad applicability and theoretical guarantees on the preservation of minimizers.
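To make the mechanism concrete, the following is a sketch of the standard identity behind Hadamard product parametrizations, written for a single scalar parameter $\beta$. The symbols $u_j$ and the $1/k$ scaling are illustrative assumptions and need not match the paper's notation. By the AM-GM inequality, $\frac{1}{k}\sum_{j} u_j^2 \ge \left(\prod_{j} u_j^2\right)^{1/k}$, with equality when all $|u_j|$ are equal, so

$$
\min_{u_1 u_2 \cdots u_k = \beta} \; \frac{1}{k} \sum_{j=1}^{k} u_j^2 \;=\; |\beta|^{2/k}.
$$

For $k = 2$ this reads $\min_{uv = \beta} \tfrac{1}{2}(u^2 + v^2) = |\beta|$: a smooth squared $\ell_2$ penalty on the factors acts as a non-smooth $\ell_1$ penalty on $\beta$. For $k > 2$ the induced penalty $|\beta|^{2/k}$ is non-convex.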
Abstract
We present a framework for smooth optimization of explicitly regularized objectives for (structured) sparsity. These non-smooth and possibly non-convex problems typically require solvers tailored to specific models and regularizers. In contrast, our method enables fully differentiable and approximation-free optimization and is thus compatible with the ubiquitous gradient descent paradigm in deep learning. The proposed optimization transfer comprises an overparametrization of selected parameters and a change of penalties. In the overparametrized problem, smooth surrogate regularization induces non-smooth, sparse regularization in the base parametrization. We prove that the surrogate objective is equivalent in the sense that it not only has identical global minima but also matching local minima, thereby avoiding the introduction of spurious solutions. Additionally, our theory establishes results of independent interest regarding matching local minima for arbitrary, potentially unregularized, objectives. We comprehensively review sparsity-inducing parametrizations across different fields that are covered by our general theory, extend their scope, and propose improvements in several aspects. Numerical experiments further demonstrate the correctness and effectiveness of our approach on several sparse learning problems ranging from high-dimensional regression to sparse neural network training.
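As a minimal sketch of the overparametrize-and-change-penalty idea, and not the authors' implementation, the NumPy snippet below replaces each lasso coefficient by a product `u * v` and swaps the $\ell_1$ penalty for a squared $\ell_2$ penalty on the factors, then runs plain gradient descent. The function names `hadamard_lasso` and `soft_threshold`, the step size, the iteration count, and the initialization scale are all hypothetical choices made for illustration. An orthonormal design is assumed in the demo only because the lasso solution then has a closed form (soft-thresholding) to compare against.

```python
import numpy as np

def soft_threshold(z, lam):
    """Closed-form lasso solution when X.T @ X / n is the identity."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def hadamard_lasso(X, y, lam, lr=0.05, steps=5000, seed=0):
    """Gradient descent on the smooth surrogate objective
        (1 / 2n) * ||y - X (u * v)||^2 + (lam / 2) * (||u||^2 + ||v||^2),
    whose minimizers correspond to those of the lasso in beta = u * v.
    lr and steps are illustrative defaults tuned for the demo below.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Small random initialization (an assumption; exact zeros would be
    # a stationary point and gradient descent would never leave it).
    u = rng.normal(scale=0.1, size=p)
    v = rng.normal(scale=0.1, size=p)
    for _ in range(steps):
        beta = u * v
        g = X.T @ (X @ beta - y) / n      # gradient of the loss w.r.t. beta
        grad_u = g * v + lam * u          # chain rule through beta = u * v
        grad_v = g * u + lam * v
        u = u - lr * grad_u
        v = v - lr * grad_v
    return u * v

# Demo on a synthetic problem with an orthonormal design (X.T @ X / n = I).
rng = np.random.default_rng(1)
n, p, lam = 50, 5, 0.5
Q, _ = np.linalg.qr(rng.normal(size=(n, p)))
X = np.sqrt(n) * Q
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=n)

print("smooth surrogate:", hadamard_lasso(X, y, lam))
print("soft-threshold  :", soft_threshold(X.T @ y / n, lam))
```

In this sketch the coefficients that the lasso sets to zero shrink toward zero geometrically rather than hitting it exactly in floating point, so a small magnitude threshold may be needed if exact zeros are required. For non-orthonormal designs the step size would likely need retuning.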
