Positive concave deep equilibrium models

Mateusz Gabor, Tomasz Piotrowski, Renato L. G. Cavalcante

TL;DR

This work addresses the instability and lack of formal guarantees in deep equilibrium (DEQ) models by introducing positive concave deep equilibrium (pcDEQ) layers. pcDEQ builds on nonlinear Perron-Frobenius theory and standard interference (SI) mappings: it enforces nonnegative weights and activation functions that are concave on the positive orthant, which guarantees a unique fixed point and geometric convergence of the fixed-point iteration $z_{k+1}=g_x(z_k)$ while preserving standard backpropagation training. The theoretical contributions are the existence and uniqueness guarantees and the geometric convergence result. Empirically, pcDEQ achieves competitive accuracy with fewer parameters than NODE, ANODE, and monDEQ on MNIST, SVHN, and CIFAR-10. The approach offers a simpler, principled route to stable implicit models and highlights the potential of SI mappings as a versatile tool for deep learning models that rely on fixed-point computations.

Abstract

Deep equilibrium (DEQ) models are widely recognized as a memory-efficient alternative to standard neural networks, achieving state-of-the-art performance in language modeling and computer vision tasks. These models solve a fixed-point equation instead of explicitly computing the output, which sets them apart from standard neural networks. However, existing DEQ models often lack formal guarantees of the existence and uniqueness of the fixed point, and the convergence of the numerical scheme used for computing the fixed point is not formally established. As a result, DEQ models are potentially unstable in practice. To address these drawbacks, we introduce a novel class of DEQ models called positive concave deep equilibrium (pcDEQ) models. Our approach, which is based on nonlinear Perron-Frobenius theory, enforces nonnegative weights and activation functions that are concave on the positive orthant. By imposing these constraints, we can easily ensure the existence and uniqueness of the fixed point without relying on additional complex assumptions commonly found in the DEQ literature, such as those based on monotone operator theory in convex analysis. Furthermore, the fixed point can be computed with the standard fixed-point algorithm, and we provide theoretical guarantees of its geometric convergence, which, in particular, simplifies the training process. Experiments demonstrate the competitiveness of our pcDEQ models against other implicit models.
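
The fixed-point computation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the layer form $g(z)=\tanh(Wz+c)$, the choice of `tanh` as the concave activation, the random initialization, and the names `make_layer`, `g`, and `fixed_point` are all assumptions made for illustration. The sketch only shows the structural constraints named in the abstract (nonnegative weights, an activation that is concave on the nonnegative orthant, a strictly positive offset standing in for the processed input) and runs the standard iteration $z_{k+1}=g(z_k)$ from two different starting points.

```python
import math
import random


def make_layer(n, seed=0):
    """Build a toy layer: nonnegative weights W and a strictly positive offset c.

    Both the sizes and the sampling ranges are illustrative assumptions.
    """
    rng = random.Random(seed)
    W = [[rng.random() for _ in range(n)] for _ in range(n)]  # entries in [0, 1)
    c = [0.1 + rng.random() for _ in range(n)]                # entries >= 0.1 > 0
    return W, c


def g(W, c, z):
    """Apply z -> tanh(W z + c) componentwise.

    tanh is concave and nondecreasing on [0, inf), so with W >= 0 and c > 0
    this map is positive and concave on the nonnegative orthant.
    """
    return [
        math.tanh(sum(w * zj for w, zj in zip(row, z)) + ci)
        for row, ci in zip(W, c)
    ]


def fixed_point(W, c, z0, tol=1e-10, max_iter=1000):
    """Run the standard fixed-point iteration until successive iterates agree."""
    z = list(z0)
    for k in range(1, max_iter + 1):
        z_next = g(W, c, z)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next, k
        z = z_next
    return z, max_iter


W, c = make_layer(4)
z_a, iters_a = fixed_point(W, c, [0.0] * 4)  # start at the origin
z_b, iters_b = fixed_point(W, c, [5.0] * 4)  # start far from the origin
print("iterations:", iters_a, iters_b)
print("max gap between the two limits:", max(abs(a - b) for a, b in zip(z_a, z_b)))
```

Both runs should stop at the same strictly positive vector, which is the behavior the uniqueness and convergence guarantees describe. In a trained model the nonnegativity of the weights would have to be maintained during optimization (for example by projection after each update), which this sketch does not cover.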

Paper Structure

This paper contains 23 sections, 8 theorems, 17 equations, 10 figures, and 7 tables.

Key Result

Lemma 4.2

Consider a DEQ layer $g_x\colon\mathbb{R}^{n}_+\to\operatorname{int}(\mathbb{R}^{n}_+)$ of the form in (deq_eq) in Definition deq_layer for an input $x$. Then:

Figures (10)

  • Figure 1: Visualization of possible constructions of pcDEQ layers. Symbols: $W_+\in\mathbb{R}^{n \times n}_+$ denotes nonnegative weights, $z_+\in\mathbb{R}^{n}_+$ the nonnegative vector of the fixed-point iteration, $x_{+}\in\mathbb{R}^{n}_+$ a nonnegative input to the layer, $x_{++}\in\textup{int}(\mathbb{R}^{n}_+)$ a positive input to the layer, $\sigma_{NC}$ a nonnegative concave activation function (List 1 in the remark on activation functions), and $\sigma_{PC}$ a positive concave activation function (List 2 in the same remark).
  • Figure 2: Test accuracies during training for the pcDEQ model with a single convolutional layer over five experiment runs.
  • Figure 3: Number of fixed point iterations for computing the fixed point in forward and backward passes for the pcDEQ model with a single convolutional layer over five experiment runs.
  • Figure 4: Largest singular value of pcDEQ linear layer over five experiment runs.
  • Figure 5: Architectures of pcDEQ models with single linear pcDEQ layer. Subfigure (a) presents architecture with pcDEQ-1 layer and (b) with pcDEQ-2 layer.
  • ...and 5 more figures

Theorems & Definitions (17)

  • Definition 3.1
  • Definition 3.2
  • Remark 3.3
  • Definition 3.4
  • Remark 4.1
  • Lemma 4.2
  • Definition 4.3
  • Proposition 4.4
  • Corollary 4.5
  • Proposition 4.6
  • ...and 7 more