Approximation of the Proximal Operator of the $\ell_\infty$ Norm Using a Neural Network

Kathryn Linehan; Radu Balan

TL;DR

This work tackles the computation of prox_{α||·||∞}(x), a proximal operator traditionally requiring sorting, by deriving an exact O(m log m) algorithm linked to the Moreau decomposition and introducing an O(m) neural-network approximation. The NN relies on a moments-based feature preprocessing that makes it agnostic to vector length, enabling a single model to handle inputs of varying sizes and to outperform a naïve vanilla network. Empirical results show the NN achieves accurate τ predictions and fast proximal computations, with performance advantages especially for uniform data and longer vectors. The approach offers a practical, scalable alternative for integrating proximal steps into large-scale optimization problems while preserving theoretical ties to exact methods.
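The exact, sort-based route described above can be sketched in NumPy. This is a minimal sketch, not necessarily the authors' algorithm. It uses the classical sort-and-scan projection onto the $\ell_1$ ball, a well-known technique swapped in here for illustration, together with the Moreau decomposition $\mathbf{x} = \textbf{prox}_{α||\cdot||_\infty}(\mathbf{x}) + P_{αB_1}(\mathbf{x})$, where $P_{αB_1}$ is the Euclidean projection onto the $\ell_1$ ball of radius α. The function names are hypothetical, and the sketch assumes α > 0.

```python
import numpy as np


def linf_prox_threshold(x, alpha):
    """Threshold t with prox_{alpha*||.||_inf}(x) = sign(x) * min(|x|, t).

    Hypothetical helper; assumes alpha > 0 and ||x||_1 > alpha.
    """
    s = np.sort(np.abs(x))[::-1]          # s_1 >= ... >= s_m: the O(m log m) step
    c = np.cumsum(s)                      # partial sums s_1 + ... + s_j
    j = np.arange(1, s.size + 1)
    cand = (c - alpha) / j                # candidate thresholds t_j = (c_j - alpha) / j
    rho = np.nonzero(s > cand)[0][-1]     # last j with s_j > t_j (j = 1 always qualifies)
    return cand[rho]


def prox_linf_exact(x, alpha):
    """Exact prox of alpha*||.||_inf via sorting (sketch, assumes alpha > 0)."""
    x = np.asarray(x, dtype=float)
    if alpha <= 0:
        raise ValueError("this sketch assumes alpha > 0")
    if np.abs(x).sum() <= alpha:
        return np.zeros_like(x)           # x lies in the l1 ball of radius alpha, so prox is 0
    t = linf_prox_threshold(x, alpha)
    return np.sign(x) * np.minimum(np.abs(x), t)   # O(m) clipping once t is known


def project_l1_ball(x, alpha):
    """Moreau decomposition: x = prox_{alpha*||.||_inf}(x) + P_{alpha*B_1}(x)."""
    return np.asarray(x, dtype=float) - prox_linf_exact(x, alpha)


x = np.array([3.0, -1.0, 2.0])
print(prox_linf_exact(x, 2.0))
print(np.abs(project_l1_ball(x, 2.0)).sum())
```

For x = (3, −1, 2) and α = 2 the sketch finds the threshold t = 1.5 and returns (1.5, −1, 1.5); the residual x − prox = (1.5, 0, 0.5) has $\ell_1$ norm exactly α, as the Moreau decomposition requires. The sort is the only super-linear step: once the threshold is known, the clipping costs O(m).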

Abstract

Computing the proximal operator of the $\ell_\infty$ norm, $\textbf{prox}_{α||\cdot||_\infty}(\mathbf{x})$, generally requires a sort of the input data, or at least a partial sort similar to quicksort. In order to avoid using a sort, we present an $O(m)$ approximation of $\textbf{prox}_{α||\cdot||_\infty}(\mathbf{x})$ using a neural network. A novel aspect of the network is that it is able to accept vectors of varying lengths due to a feature selection process that uses moments of the input data. We present results on the accuracy of the approximation, feature importance, and computational efficiency of the approach. We show that the network outperforms a "vanilla neural network" that does not use feature selection. We also present an algorithm with corresponding theory to calculate $\textbf{prox}_{α||\cdot||_\infty}(\mathbf{x})$ exactly, relate it to the Moreau decomposition, and compare its computational efficiency to that of the approximation.
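The key idea of the abstract, a feature map whose output size does not depend on the vector length m, can be sketched as follows. This is a minimal sketch under loudly stated assumptions: the particular features (α, m, and the first few raw moments of |x|), the function names, and the `predict_tau` callable standing in for a trained network are all hypothetical, and the network output is assumed to be the clipping threshold of the proximal operator.

```python
import numpy as np


def moment_features(x, alpha, num_moments=4):
    """Fixed-length features for a vector of ANY length m (hypothetical feature set).

    Returns [alpha, m, mean|x|, mean|x|^2, ..., mean|x|^P]; each moment costs O(m).
    """
    a = np.abs(np.asarray(x, dtype=float))
    moments = [np.mean(a ** p) for p in range(1, num_moments + 1)]
    return np.array([alpha, a.size, *moments], dtype=float)


def approx_prox_linf(x, alpha, predict_tau):
    """O(m) approximate prox: predict a threshold tau from the features, then clip.

    `predict_tau` is a hypothetical stand-in for the trained network (no sort needed).
    """
    x = np.asarray(x, dtype=float)
    tau = max(float(predict_tau(moment_features(x, alpha))), 0.0)  # keep tau nonnegative
    return np.sign(x) * np.minimum(np.abs(x), tau)


# The feature dimension does not depend on the input length:
short_f = moment_features(np.ones(5), alpha=1.0)
long_f = moment_features(np.ones(500), alpha=1.0)
print(short_f.shape, long_f.shape)
```

Because every feature is an average or a count, a single model with a fixed input layer can serve vectors of any length, and the whole pipeline (features, one forward pass, clipping) stays linear in m.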

Paper Structure (18 sections, 12 theorems, 40 equations, 2 figures, 6 tables, 5 algorithms)

Introduction
Related Work
Computing $\textbf{prox}_{\alpha ||\cdot||_\infty}$
Computation by Neural Network
Data Preprocessing and Feature Selection
Numerical Experiments
Learning Curves and Comparison with "Vanilla Neural Network"
Proximal Operator Error
Feature Importance
Computational Efficiency
Conclusion
Proof of Theorem [thm:psi]
Divide and Conquer Algorithm
Proof of Theorem [thm:same_t]
Proof of Theorem [thm:tau_linop]
...and 3 more sections

Key Result

Theorem 3.1

For any $\alpha \in \mathbb{R}$ and $\mathbf{x} \in \mathbb{R}^m$, let $I_t = \{k \:| \: |\mathbf{x}_k| \geq t\}$, $\psi: \mathbb{R}^+ \cup \{0\} \rightarrow \mathbb{R}$ be defined as and $\mathbf{s}$ be a permutation of $|\mathbf{x}|$ such that $\mathbf{s}_1 \geq \mathbf{s}_2 \geq \cdots \geq \mathbf{s}_m \geq 0$ and $\mathbf{s}_{m+1} = 0.$ Then, $\psi$ is $\mathcal{C}^1$ over $t > 0$ and strict
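The role of $I_t$ and of the sorted vector $\mathbf{s}$ (with $\mathbf{s}_{m+1} = 0$) can be illustrated with the standard characterization of the prox threshold, a well-known consequence of the Moreau decomposition. It is stated here for orientation under the assumption α > 0; it is not a reconstruction of the missing definition of ψ.

```latex
\begin{aligned}
\textbf{prox}_{\alpha||\cdot||_\infty}(\mathbf{x})_k
  &= \operatorname{sign}(\mathbf{x}_k)\,\min\bigl(|\mathbf{x}_k|,\, t^\ast\bigr), \qquad k = 1,\dots,m,\\
t^\ast &= 0 \ \text{ if } ||\mathbf{x}||_1 \le \alpha, \qquad
  \sum_{k \in I_{t^\ast}} \bigl(|\mathbf{x}_k| - t^\ast\bigr) = \alpha \ \text{ if } ||\mathbf{x}||_1 > \alpha,\\
t^\ast &= \frac{1}{j}\Bigl(\sum_{i=1}^{j} \mathbf{s}_i - \alpha\Bigr)
  \quad \text{for the } j \in \{1,\dots,m\} \text{ with } \mathbf{s}_{j+1} \le t^\ast < \mathbf{s}_j .
\end{aligned}
```

When $||\mathbf{x}||_1 > \alpha$, the map $t \mapsto \sum_{k \in I_t}(|\mathbf{x}_k| - t)$ is continuous and strictly decreasing on $[0, \mathbf{s}_1]$, falling from $||\mathbf{x}||_1$ to 0, so $t^\ast$ is unique; entries equal to $t^\ast$ contribute zero, which is why only the $j$ largest entries appear in the last line. Locating $j$ requires the sorted $\mathbf{s}$, which is the source of the $O(m \log m)$ cost of the exact computation.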

Figures (2)

Figure 1: Feature importances for the best models from experiments 1-6
Figure 4: Feature importances for the best models from experiments 3-4, D1-D4, and 1-2

Theorems & Definitions (24)

Theorem 3.1
proof
Theorem 3.2
proof
Theorem 4.1
proof
Theorem 4.2
proof
Theorem 4.3
proof
...and 14 more
