Dynamic Memory Based Adaptive Optimization

Balázs Szegedy, Domonkos Czifra, Péter Kőrösi-Szabó

TL;DR

This work addresses the limitation of fixed-memory optimizers by introducing Retrospective Learning Law Correction (RLLC), a framework that uses a learnable learning-law vector L ∈ R^k updated as L ← L + c2 M^+ g to produce adaptive parameter updates θ ← θ − c1 M L. By combining linear memory updates with RLLC, the authors derive LM-RLLC optimizers whose memory units can be organized into Jordan-block propagators, including real, complex, and multi-block structures, enabling a flexible interpolation between SGD, momentum SGD, and NAG. The paper provides theoretical results such as basis invariance and a real Jordan normal form for the memory updates, and demonstrates in experiments on MNIST, Fashion-MNIST, and CIFAR-10 that RLLC-based optimizers often outperform standard baselines across several architectures. The findings suggest that adding memory and adapting it via retrospective corrections can yield significant performance gains, with potential implications for broader optimization strategies and future work on adaptive memory rules and larger-scale tasks.
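
The two update rules in the TL;DR can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `update_memory` and `lm_rllc_step`, the per-unit decay rule m_i ← β_i m_i + g used for the memory, the reading of M^+ as the Moore-Penrose pseudo-inverse, and the ordering of the three sub-steps are all choices made for this example.

```python
import numpy as np

def update_memory(M, g, betas):
    """Assumed linear memory rule: column i becomes betas[i] * m_i + g."""
    # M has shape (n, k): k memory units, each a vector in parameter space.
    return M * betas + g[:, None]

def lm_rllc_step(theta, M, L, grad_fn, betas, c1=0.1, c2=0.01):
    """One step of a linear-memory RLLC-style optimizer (illustrative sketch)."""
    g = grad_fn(theta)
    # Retrospective correction of the learning law: L <- L + c2 * M^+ g.
    # M^+ is taken to be the pseudo-inverse of the memory used so far (assumption).
    L = L + c2 * (np.linalg.pinv(M) @ g)
    # Refresh the memory units with the new gradient.
    M = update_memory(M, g, betas)
    # Parameter update: theta <- theta - c1 * M L.
    theta = theta - c1 * (M @ L)
    return theta, M, L
```

With `betas = [0.9, 0.0]`, `c2 = 0` and a fixed law `L = [0, 1]`, this sketch reduces to plain SGD, and with `L = [1, 0]` it reduces to momentum SGD. A nonzero `c2` lets the learning law move between the two.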

Abstract

Define an optimizer as having memory $k$ if it stores $k$ dynamically changing vectors in the parameter space. Classical SGD has memory $0$, momentum SGD has memory $1$, and Adam has memory $2$. We address the following questions: How can optimizers make use of more memory units? What information should be stored in them? How should they be used for the learning steps? As an approach to the last question, we introduce a general method called "Retrospective Learning Law Correction", or RLLC for short. This method is designed to calculate a dynamically varying linear combination (called the learning law) of memory units, which themselves may evolve arbitrarily. We demonstrate RLLC on optimizers whose memory units have linear update rules and small memory ($\leq 4$ memory units). Our experiments show that on a variety of standard problems, these optimizers outperform the three classical optimizers mentioned above. We conclude that RLLC is a promising framework for boosting the performance of known optimizers by adding more memory units and by making them more adaptive.
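
The memory counts quoted in the abstract can be made concrete with a small sketch. The code below writes textbook versions of the three classical optimizers so that the stored parameter-space vectors are explicit. The helper `memory_size` and the dictionary-based state layout are hypothetical names introduced for this illustration only.

```python
import numpy as np

def sgd_step(theta, g, state, lr=0.1):
    # Memory 0: nothing is stored between steps.
    return theta - lr * g, state

def momentum_step(theta, g, state, lr=0.1, beta=0.9):
    # Memory 1: a single velocity vector.
    v = beta * state["v"] + g
    return theta - lr * v, {"v": v}

def adam_step(theta, g, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Memory 2: first- and second-moment vectors (t is a scalar step counter).
    t = state["t"] + 1
    m = b1 * state["m"] + (1 - b1) * g
    v = b2 * state["v"] + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), {"m": m, "v": v, "t": t}

def memory_size(state):
    """Count the parameter-space vectors held in an optimizer state."""
    return sum(1 for x in state.values() if isinstance(x, np.ndarray))
```

Initializing the states as `{}`, `{"v": np.zeros(n)}` and `{"m": np.zeros(n), "v": np.zeros(n), "t": 0}` gives memory sizes 0, 1 and 2 respectively.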

Paper Structure

This paper contains 17 sections, 4 theorems, 29 equations, 5 figures, and 1 table.

Key Result

Lemma 3.8

Let $U$ be a memory update rule as above and $Q\in\mathbb{R}^{k\times k}$ be an arbitrary matrix. Then the RLLC optimizer corresponding to $U$ is essentially equivalent to the RLLC optimizer corresponding to $U^Q$.
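
A short calculation suggests why such an invariance is plausible. The sketch below is not the paper's proof. It assumes that $U^Q$ denotes the same memory expressed in a changed basis, so that the memory matrix becomes $M' = MQ$, that $Q$ is invertible, and that $M$ has full column rank. None of these assumptions is stated in the lemma as quoted here. The update rules are those from the TL;DR, and the transformed law is written $L' = Q^{-1}L$.

```latex
\begin{align*}
(MQ)^{+} &= \bigl(Q^{\top} M^{\top} M Q\bigr)^{-1} Q^{\top} M^{\top}
          = Q^{-1}\bigl(M^{\top} M\bigr)^{-1} M^{\top}
          = Q^{-1} M^{+}, \\
L' + c_2\, (MQ)^{+} g &= Q^{-1}\bigl(L + c_2\, M^{+} g\bigr), \\
\theta - c_1\, (MQ)\, L' &= \theta - c_1\, M Q\, Q^{-1} L
          = \theta - c_1\, M L .
\end{align*}
```

Under these assumptions the corrected law stays related to the original by $Q^{-1}$, and the two optimizers produce the same sequence of parameters $\theta$.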

Figures (5)

  • Figure 1: Test accuracy of RLLC and benchmark optimizers on the CIFAR-10 dataset with the ResNet-20 network. RLLC optimizers show faster convergence and better generalization. See related plots and error bars in the supplementary plots appendix.
  • Figure 2: Memory-unit coefficients of the $M(0.9) \oplus M(0.0)$ optimizer over time. The figure illustrates the optimizer's transition between momentum SGD and SGD, briefly aligning with the NAG optimizer around step 2k.
  • Figure 3: Coefficients of the $M(0.9) \oplus M(0.8) \oplus M(0.7)$ optimizer over time. The figure shows a negative coupling between $M(0.8)$ and $M(0.7)$. See further details in the appendix on additional mathematical observations.
  • Figure 4: Test accuracy of selected RLLC optimizers compared with benchmark optimizers. On most tasks, the RLLC optimizers perform better than the benchmark optimizers.
  • Figure 5: Test accuracy of selected RLLC optimizers with min-max intervals over $3$ random seed initializations. The accuracy varies little across seeds, suggesting that RLLC is robust.

Theorems & Definitions (14)

  • Definition 3.1
  • Definition 3.2: RLLC functional form
  • Remark 3.3
  • Remark 3.4
  • Remark 3.5
  • Definition 3.6
  • Definition 3.7
  • Lemma 3.8: Linear invariance of RLLC
  • proof
  • Lemma 4.1
  • ...and 4 more