Multiplicative update rules for accelerating deep learning training and increasing robustness

Manos Kirtas; Nikolaos Passalis; Anastasios Tefas

TL;DR

The paper addresses slow training and limited robustness in DL, both of which arise from sensitivity to initialization and hyperparameters. It introduces GOFAU, a generic online optimization framework for alternative updates, and proposes a multiplicative update rule $\xi(m_t, l_t) = |\theta_{t-1}| \tanh(\eta_{in} m_t l_t) \eta_{out}$ along with a hybrid variant that allows sign changes. The authors demonstrate acceleration and robustness on convex and non-convex toy problems and on image-classification benchmarks (CIFAR10/100, Tiny ImageNet) using standard optimizers (SGD, Adagrad, RMSProp) and architectures (ResNet, VGG). The framework provides a practical path to integrating multiplicative updates into existing training pipelines, reducing sensitivity to initialization and learning-rate choices and potentially benefiting interpretability and neuromorphic deployments.
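To make the update rule concrete, the following is a minimal NumPy sketch, not the authors' implementation. Several things here are assumptions, since this summary does not define them: $m_t$ is taken to be the optimizer's gradient (or moment) estimate, $l_t$ its learning rate, and the update is applied by subtraction, $\theta_t = \theta_{t-1} - \xi(m_t, l_t)$. The function names and the default values of `eta_in` and `eta_out` are hypothetical.

```python
import numpy as np


def multiplicative_update(theta_prev, m_t, l_t, eta_in=1.0, eta_out=0.1):
    """xi(m_t, l_t) = |theta_prev| * tanh(eta_in * m_t * l_t) * eta_out.

    Assumption: m_t is a gradient/moment estimate and l_t a learning rate.
    """
    return np.abs(theta_prev) * np.tanh(eta_in * m_t * l_t) * eta_out


def step(theta_prev, m_t, l_t, eta_in=1.0, eta_out=0.1):
    # Assumption: the update is subtracted, as in plain gradient descent.
    xi = multiplicative_update(theta_prev, m_t, l_t, eta_in, eta_out)
    return theta_prev - xi


# Three weights: one with an exploding gradient, one with a moderate
# gradient, and one that is exactly zero.
theta = np.array([2.0, -0.5, 0.0])
grad = np.array([1e6, -3.0, 4.0])
new_theta = step(theta, grad, l_t=0.01)
print(new_theta)
```

Because $|\tanh(\cdot)| \le 1$, each weight changes by at most a fraction $\eta_{out}$ of its own magnitude, however large the gradient is. The same bound means that, for $\eta_{out} < 1$, a weight cannot cross zero and a weight that is exactly zero stays there, which is consistent with the summary's mention of a hybrid variant that allows sign changes.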

Abstract

Even now that Deep Learning (DL) has achieved state-of-the-art performance in a wide range of research domains, accelerating training and building robust DL models remain challenging tasks. To this end, generations of researchers have sought to develop robust methods for training DL architectures that are less sensitive to weight distributions, model architectures, and loss landscapes. However, such methods are limited to adaptive learning rate optimizers, initialization schemes, and gradient clipping, without investigating the fundamental rule of parameter updates. Although multiplicative updates contributed significantly to the early development of machine learning and hold strong theoretical claims, to the best of our knowledge, this is the first work that investigates them in the context of DL training acceleration and robustness. In this work, we propose an optimization framework that fits a wide range of optimization algorithms and enables one to apply alternative update rules. To this end, we propose a novel multiplicative update rule and extend its capabilities by combining it with a traditional additive update term under a novel hybrid update method. We claim that the proposed framework accelerates training while leading to more robust models than the traditionally used additive update rule, and we experimentally demonstrate its effectiveness across a wide range of tasks and optimization methods. These tasks range from convex and non-convex optimization to difficult image classification benchmarks, applying a wide range of traditionally used optimization methods and Deep Neural Network (DNN) architectures.

Paper Structure (8 sections, 9 equations, 4 figures, 6 tables, 1 algorithm)

Introduction
Proposed method
Generic Optimization Framework for Alternative Updates
Multiplicative Updates
Experimental Evaluation
Convex and non-convex optimization
Image classification
Conclusions

Figures (4)

Figure 1: The top figures show the optimization process on the Convex 2D and Rosenbrock tasks when alternative updates are applied with the SGD and Adam optimizers, respectively. The bottom figure reports the Euclidean distance between the parameters and the actual global minimum.
Figure 2: The score of the evaluated update rules when applying the SGD and Adam optimization methods in two optimization tasks. The z-axis denotes the score with respect to the initial point, given by $x_{0}$ and $x_{1}$ on the x- and y-axes, respectively.
Figure 3: Training and validation accuracy during training with the ResNet18 architecture on the CIFAR100 dataset, using default configurations for both the task and the optimizers.
Figure 4: Training and validation accuracy during training on the Tiny ImageNet dataset, using default configurations for both the task and the optimizers.
