Exploiting Subgradient Sparsity in Max-Plus Neural Networks

Ikhlas Enaieh, Olivier Fercoq

TL;DR

This work proposes a sparse subgradient algorithm that explicitly exploits the algebraic sparsity of the Max-Plus neural architecture. By tailoring the optimization procedure to the non-smooth nature of Max-Plus models, the method achieves more efficient updates while retaining theoretical guarantees.

Abstract

Deep Neural Networks are powerful tools for solving machine learning problems, but their training often involves dense and costly parameter updates. In this work, we use a novel Max-Plus neural architecture in which classical addition and multiplication are replaced with maximum and summation operations, respectively. This architecture is promising in terms of interpretability, but its training is challenging. A particular feature is that its algebraic structure naturally induces sparsity in the subgradients, as only neurons that contribute to the maximum affect the loss. However, standard backpropagation fails to exploit this sparsity, leading to unnecessary computations. We focus on minimizing the worst-sample loss, which transfers this sparsity to the optimization loss. To avoid the unnecessary computations, we propose a sparse subgradient algorithm that explicitly exploits the algebraic sparsity. By tailoring the optimization procedure to the non-smooth nature of Max-Plus models, our method achieves more efficient updates while retaining theoretical guarantees. This highlights a principled path toward bridging algebraic structure and scalable learning.
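To make the sparsity claim concrete, here is a minimal NumPy sketch of a single max-plus layer and one valid subgradient with respect to its weights. It is an illustration under stated assumptions, not the authors' algorithm: the function names (`maxplus_forward`, `maxplus_weight_subgradient`, `worst_sample`) and the toy numbers are hypothetical, and the layer is assumed to compute $y_j = \max_i (W_{ji} + x_i)$.

```python
import numpy as np

def maxplus_forward(W, x):
    """Max-plus layer (assumed form): y[j] = max_i (W[j, i] + x[i]).

    Returns the outputs and, for each output, the index i attaining the max
    (ties are broken by taking the first maximizer).
    """
    scores = W + x[None, :]              # "multiplication" becomes +
    winners = scores.argmax(axis=1)      # "addition" becomes max
    y = scores[np.arange(W.shape[0]), winners]
    return y, winners

def maxplus_weight_subgradient(shape, winners, upstream):
    """One valid subgradient of sum_j upstream[j] * y[j] with respect to W,
    for nonnegative upstream weights.

    Only the winning entry of each row is nonzero, so at most n_out of the
    n_out * n_in entries are touched.
    """
    G = np.zeros(shape)
    G[np.arange(shape[0]), winners] = upstream
    return G

def worst_sample(losses):
    """Index of a sample attaining the maximum loss. A subgradient of
    max_i loss_i can be taken from this single sample alone."""
    return int(np.argmax(losses))

# Toy example (hypothetical numbers).
W = np.array([[0.0, -1.0, 2.0],
              [1.0,  0.0, -3.0]])
x = np.array([1.0, 4.0, 0.0])
y, winners = maxplus_forward(W, x)
G = maxplus_weight_subgradient(W.shape, winners, upstream=np.array([1.0, 0.5]))
nonzero_entries = int(np.count_nonzero(G))
worst = worst_sample(np.array([0.2, 1.3, 0.7]))
```

In this toy case only 2 of the 6 weight entries receive a nonzero subgradient, and only one of the three samples attains the worst loss. A dense backward pass would still allocate and traverse the full arrays, which is the overhead a sparse update aims to avoid.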

Paper Structure (33 sections, 2 theorems, 58 equations, 3 figures, 7 tables)

This paper contains 33 sections, 2 theorems, 58 equations, 3 figures, 7 tables.

Key Result

Proposition 2.2

If the maximum Sparse Categorical Cross-Entropy loss is strictly less than $\log 2$, then the model achieves $100\%$ classification accuracy on the training set.
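The threshold $\log 2$ can be motivated by a short argument. The sketch below assumes that the per-sample loss is $\ell_i = -\log p_i(y_i)$, where $p_i$ is the predicted probability distribution over classes for sample $i$ and $y_i$ is its true label; this notation is introduced here for illustration and need not match the paper's proof.

$$
\max_i \ell_i < \log 2
\;\Longrightarrow\;
-\log p_i(y_i) < \log 2 \ \text{ for all } i
\;\Longrightarrow\;
p_i(y_i) > \tfrac{1}{2} \ \text{ for all } i .
$$

Since the probabilities sum to one, every other class $c \neq y_i$ satisfies

$$
p_i(c) \;\le\; \sum_{c' \neq y_i} p_i(c') \;=\; 1 - p_i(y_i) \;<\; \tfrac{1}{2} \;<\; p_i(y_i),
$$

so the true label is the unique most probable class for every training sample, which gives $100\%$ training accuracy.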

Figures (3)

  • Figure 1: Final Max-SCCE values on the IRIS dataset obtained with three different weight initialization strategies, evaluated over 10 independent runs.
  • Figure 2: Convergence of the LMM model on MNIST
  • Figure 3: Distribution of predicted probabilities for the true labels on MNIST using the LMM model.

Theorems & Definitions (8)

  • Definition 2.1: Morphological Perceptron [morphological_activation]
  • Proposition 2.2: Perfect classification under a max-SCCE threshold
  • Definition 2.3: Short Computational Tree (SCT) [nesterov_subgradient]
  • Theorem 3.2: Sparsity of the Subgradient
  • Proofs (4 untitled entries)