Positive concave deep equilibrium models
Mateusz Gabor, Tomasz Piotrowski, Renato L. G. Cavalcante
TL;DR
This work addresses instability and the lack of formal guarantees in deep equilibrium (DEQ) models by introducing positive concave deep equilibrium (pcDEQ) layers. pcDEQ builds on nonlinear Perron-Frobenius theory and standard interference (SI) mappings: it enforces nonnegative weights and activation functions that are concave on the positive orthant, which guarantees a unique fixed point and geometric convergence of the fixed-point iteration $z_{k+1}=g_x(z_k)$ while preserving standard backpropagation training. Empirically, pcDEQ achieves competitive accuracy with fewer parameters than NODE, ANODE, and monDEQ on MNIST, SVHN, and CIFAR-10. The approach offers a simpler, principled route to stable implicit models and highlights the potential of SI mappings as a versatile tool for deep learning models that rely on fixed-point computations.
Abstract
Deep equilibrium (DEQ) models are widely recognized as a memory-efficient alternative to standard neural networks, achieving state-of-the-art performance in language modeling and computer vision tasks. These models solve a fixed-point equation instead of explicitly computing the output, which sets them apart from standard neural networks. However, existing DEQ models often lack formal guarantees of the existence and uniqueness of the fixed point, and the convergence of the numerical scheme used for computing the fixed point is not formally established. As a result, DEQ models are potentially unstable in practice. To address these drawbacks, we introduce a novel class of DEQ models called positive concave deep equilibrium (pcDEQ) models. Our approach, which is based on nonlinear Perron-Frobenius theory, enforces nonnegative weights and activation functions that are concave on the positive orthant. By imposing these constraints, we can easily ensure the existence and uniqueness of the fixed point without relying on additional complex assumptions commonly found in the DEQ literature, such as those based on monotone operator theory in convex analysis. Furthermore, the fixed point can be computed with the standard fixed-point algorithm, and we provide theoretical guarantees of its geometric convergence, which, in particular, simplifies the training process. Experiments demonstrate the competitiveness of our pcDEQ models against other implicit models.
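To make the fixed-point computation concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a layer of the form $g_x(z) = \tanh(Wz + Ux + b)$ with elementwise nonnegative $W$, $U$, $x$ and a positive bias $b$, so that $g_x$ is positive and concave on the nonnegative orthant. The names `fixed_point_iteration`, `g_x`, `W`, `U`, `b`, the dimensions, and the scaling of the weights are all hypothetical choices made for illustration.

```python
import numpy as np


def fixed_point_iteration(g, z0, tol=1e-12, max_iter=500):
    """Iterate z_{k+1} = g(z_k) until successive iterates differ by less than tol.

    Returns the final iterate and the number of iterations used.
    """
    z = z0
    for k in range(1, max_iter + 1):
        z_next = g(z)
        if np.max(np.abs(z_next - z)) < tol:
            return z_next, k
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical toy layer (assumption, not taken from the paper's experiments).
rng = np.random.default_rng(0)
n, d = 8, 4                      # hidden size and input size (illustrative)
W = 0.5 * rng.random((n, n))     # nonnegative hidden-to-hidden weights
U = rng.random((n, d))           # nonnegative input-injection weights
b = np.full(n, 0.1)              # strictly positive bias
x = rng.random(d)                # nonnegative input


def g_x(z):
    # tanh is concave on the nonnegative reals; with W, U, x >= 0 and b > 0
    # the pre-activation is positive whenever z >= 0, so the output is positive.
    return np.tanh(W @ z + U @ x + b)


# Run the iteration from two different nonnegative starting points.
z_a, iters_a = fixed_point_iteration(g_x, np.zeros(n))
z_b, iters_b = fixed_point_iteration(g_x, 10.0 * np.ones(n))
```

In this toy setting both starting points should reach the same positive vector, which is consistent with the uniqueness property described above. Comparing `z_a` with `z_b` and inspecting the residual `g_x(z_a) - z_a` gives a quick sanity check.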
