
Temporal Memory for Resource-Constrained Agents: Continual Learning via Stochastic Compress-Add-Smooth

Michael Chertkov

Abstract

An agent that operates sequentially must incorporate new experience without forgetting old experience, under a fixed memory budget. We propose a framework in which memory is not a parameter vector but a stochastic process: a Bridge Diffusion on a replay interval $[0,1]$, whose terminal marginal encodes the present and whose intermediate marginals encode the past. New experience is incorporated via a three-step \emph{Compress--Add--Smooth} (CAS) recursion. We test the framework on the class of models whose marginal probability densities are modeled as Gaussian mixtures with a fixed number~$K$ of components in $d$ dimensions; temporal complexity is controlled by a fixed number~$L$ of piecewise-linear protocol segments whose nodes store Gaussian-mixture states. The entire recursion costs $O(LKd^2)$ flops per day -- no backpropagation, no stored data, no neural networks -- making it viable for controller-light hardware. Forgetting in this framework arises not from parameter interference but from lossy temporal compression: the re-approximation of a finer protocol by a coarser one under a fixed segment budget. We find that the retention half-life scales linearly as $a_{1/2}\approx c\,L$ with a constant $c>1$ that depends on the dynamics but not on the mixture complexity~$K$, the dimension~$d$, or the geometry of the target family. The constant~$c$ admits an information-theoretic interpretation analogous to the Shannon channel capacity. The stochastic process underlying the bridge provides temporally coherent ``movie'' replay -- compressed narratives of the agent's history, demonstrated visually on an MNIST latent-space illustration. The framework provides a fully analytical ``Ising model'' of continual learning in which the mechanism, rate, and form of forgetting can be studied with mathematical precision.
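As a concrete illustration of the recursion's mechanics, the following Python sketch implements one CAS day on a piecewise-linear protocol of Gaussian-mixture node states. It is a minimal reading of the abstract and of Figure 1, not the paper's implementation: the names (GMState, blend, cas_step, day_state) are ours, and the smoothing step is realized as componentwise linear moment interpolation -- one simple choice among several consistent with "rebinning averages the segments back onto the $L$-segment grid".

import numpy as np
from dataclasses import dataclass

@dataclass
class GMState:
    """One protocol node: a K-component Gaussian mixture in d dimensions."""
    weights: np.ndarray  # shape (K,)
    means: np.ndarray    # shape (K, d)
    covs: np.ndarray     # shape (K, d, d)

def blend(a: GMState, b: GMState, lam: float) -> GMState:
    """Componentwise convex blend of two node states (assumes the K
    components are aligned). This linear moment interpolation is our
    stand-in for the paper's smoothing step -- the one lossy operation,
    where temporal detail is averaged away."""
    return GMState(
        weights=(1 - lam) * a.weights + lam * b.weights,
        means=(1 - lam) * a.means + lam * b.means,
        covs=(1 - lam) * a.covs + lam * b.covs,
    )

def cas_step(nodes: list[GMState], new_day: GMState) -> list[GMState]:
    """One CAS day on L+1 stored nodes (L segments), per Figure 1:
    Compress: rescale the protocol from [0, 1] to [0, L/(L+1)]
              (implicit here, since both grids are kept normalized);
    Add:      append the new day's state on [L/(L+1), 1];
    Smooth:   re-interpolate the L+1 uniform segments back onto the
              L-segment grid.
    Cost: O(L) blends, each touching K covariances of size d x d,
    i.e. O(L K d^2) per day -- no backprop, no stored data."""
    L = len(nodes) - 1
    extended = nodes + [new_day]            # L+2 nodes, L+1 segments
    fine = np.linspace(0.0, 1.0, L + 2)     # grid after compress + add
    coarse = np.linspace(0.0, 1.0, L + 1)   # target L-segment grid
    out = []
    for t in coarse:
        j = min(int(np.searchsorted(fine, t, side="right")) - 1, L)
        lam = (t - fine[j]) / (fine[j + 1] - fine[j])
        out.append(blend(extended[j], extended[j + 1], lam))
    return out

# Toy usage echoing the paper's circular-drift experiment (K=1, d=2, L=4):
# each day's "experience" is a small Gaussian drifting along a circle.
def day_state(n: int, d: int = 2, K: int = 1) -> GMState:
    theta = 2 * np.pi * n / 20
    return GMState(np.ones(K) / K,
                   np.array([[np.cos(theta), np.sin(theta)]] * K),
                   np.stack([np.eye(d) * 0.05] * K))

nodes = [day_state(0)] * 5                  # L = 4 segments -> 5 nodes
for n in range(1, 100):
    nodes = cas_step(nodes, day_state(n))

Because only the Smooth step mixes information across segments, this sketch makes the paper's central claim tangible: forgetting enters solely through the repeated re-interpolation of a finer protocol onto the fixed $L$-segment grid, not through any interference between parameters.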



Figures (22)

  • Figure 1: One iteration of the compress--add--smooth recursion, illustrated for $L\!=\!4$ segments. Top: the protocol at day $n$ consists of $L$ uniform segments on $[0,1]$. Middle: compression rescales the protocol to $[0,L/(L+1)]$; the new day is appended on $[L/(L+1),\,1]$, producing $L\!+\!1$ uniform segments. Bottom: rebinning averages the $L\!+\!1$ segments back onto the $L$-segment grid. The right-hand labels indicate the information-theoretic role of each step: only smoothing is lossy. Dashed lines track a past-day readout time $t_{m|n}$, which contracts by a factor $L/(L+1)$ every day (see the derivation sketched after this list).
  • Figure 2: Single-Gaussian experiment ($K=1$, $d=2$, $L=10$) under the default circular-drift setting. (a) Age-averaged normalized forgetting $\bar{F}(a)$, showing retention half-life $a_{1/2}=30$. The curve exhibits a low-error plateau for ages $0$--$15$, followed by a steep sigmoid transition crossing $\bar{F} = 0.5$ at age 30, a slight overshoot to $\bar{F} \approx 1.08$ around age 50 (the confusion regime), and eventual saturation near $\bar{F} = 1.0$. The curve is weakly non-monotone due to the periodic geometry of the circular drift, which causes geometric recurrence at multiples of the half-period. (b) Full forgetting matrix $\bar{F}(m,n)$ as a function of recalled day $m$ and current day $n$. The dominant trend is age-controlled forgetting (iso-forgetting contours run parallel to the diagonal), modulated by the periodic geometry of the underlying drift.
  • Figure 3: Default single-Gaussian circular-drift experiment ($L=10$). (a) Original daily means (dots, coloured by day) and replayed means at the final day (crosses). The black star marks the prior mean (the origin), which serves as the protocol's long-time attractor. Confusion is visible as the systematic inward displacement of crosses from the circle toward the star: recent memories (warm colours) are replayed near their true locations on the circle, while older memories (cool colours) are pulled progressively toward the prior mean. This convergence of replay means toward the star --- rather than remaining on the circle --- is the geometric signature of confusion. (b) Readout times $t_{m|n}$ versus current day $n$, showing the geometric decay of \eqref{eq:readout_geometric} (reconstructed after this list).
  • Figure 4: Selected replay ellipses in the default single-Gaussian circular-drift experiment ($L=10$). For each displayed day, the original target mean is shown by a dot, while the replayed mean is shown by a cross with a dashed ellipse representing the replay covariance. Recent memories (e.g. day 95) are replayed with small displacement and compact ellipses. Intermediate-age memories (e.g. day 35) show large displacement toward the origin and inflated covariances. Very old memories (e.g. day 25) collapse nearly to the origin with very large ellipses. Geometric aliasing is visible for day 5, whose true location on the circle lies close to day 95 (they differ by two full periods), producing an apparently accurate replay that is coincidental rather than genuine recall.
  • Figure 5: Segment-budget sweep for the single-Gaussian circular-drift experiment. (a) Age--forgetting curves for $L \in \{5, 8, 10, 15, 20, 30\}$. Increasing $L$ shifts the sigmoid transition to higher ages without changing the curve shape qualitatively. (b) Retention half-life $a_{1/2}$ versus $L$, confirming approximate linear scaling.
  • ...and 17 more figures
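The captions of Figures 1 and 3 pin down the readout-time bookkeeping; the short derivation below is our reconstruction under those stated contractions, not a quotation of the paper's \eqref{eq:readout_geometric}. A memory written on day $m$ sits at the terminal time $t_{m|m}=1$, and each subsequent CAS day rescales the protocol by $L/(L+1)$, so after $a = n-m$ days the readout time is $t_{m|n} = \big(\tfrac{L}{L+1}\big)^{\,n-m}$. Requiring $t_{m|n}\ge\tau$ for some resolvability threshold $\tau$ then gives a retention age $a = \ln(1/\tau)\big/\ln\!\big(1+\tfrac{1}{L}\big) \approx L\,\ln(1/\tau)$ for $L\gg 1$ --- linear in $L$ and independent of $K$ and $d$, consistent with the half-life scaling $a_{1/2}\approx c\,L$ reported in the abstract and confirmed in Figure 5(b).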

Theorems & Definitions (2)

  • Remark 1: Temporal blurring of node states
  • Remark 2: Confusion: $\bar{F} > 1$