DNCs Require More Planning Steps
Yara Shamshoum, Nitzan Hodos, Yuval Sieradzki, Assaf Schuster
TL;DR
This work investigates how computational time and external memory constraints influence the generalization of differentiable algorithmic solvers, using the Differentiable Neural Computer (DNC) as a testbed. It introduces a planning-budget framework in which the number of planning steps $p(n)$ and the memory size are treated as functions of the input size $n$, revealing that a fixed small budget hinders generalization, while adaptive planning and memory strategies improve performance across tasks such as Shortest Path, MinCut, Convex Hull, and Associative Recall. Key contributions include memory-extension techniques coupled with temperature-based reweighting, adaptive memory $m(n)$, and a stochastic planning budget during training, all supported by empirical results showing phase transitions in $A_n(p)$ and improved stability. The findings provide general guidelines for designing resource-aware algorithmic solvers, with potential implications for more advanced models and LLMs that must balance time and memory in real-world settings.
Abstract
Many recent works use machine learning models to solve various complex algorithmic problems. However, these models attempt to reach a solution without considering the problem's required computational complexity, which can be detrimental to their ability to solve it correctly. In this work we investigate the effect of computational time and memory on the generalization of implicit algorithmic solvers. To do so, we focus on the Differentiable Neural Computer (DNC), a general problem solver that also lets us reason directly about its usage of time and memory. We argue that the number of planning steps the model is allowed to take, which we call the "planning budget", is a constraint that can cause the model to generalize poorly and hurt its ability to fully utilize its external memory. We evaluate our method on Graph Shortest Path, Convex Hull, Graph MinCut and Associative Recall, and show how the planning budget can drastically change the behavior of the learned algorithm in terms of learned time complexity, training time, stability, and generalization to inputs larger than those seen during training.
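To make the notion of a planning budget concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes an episode consists of $n$ input steps, then $p(n)$ planning steps during which the model receives no new input, then an answer phase. The helper names (`planning_budget`, `sample_training_budget`, `build_phase_schedule`), the default constant of 10, and the uniform sampling rule are all hypothetical choices made for illustration.

```python
import random


def planning_budget(n, kind="linear", constant=10):
    """Return p(n), the number of planning steps for an input of size n.

    Hypothetical helper: "constant" ignores n, "linear" grows with it.
    The default constant is an arbitrary illustrative value.
    """
    if kind == "constant":
        return constant
    if kind == "linear":
        return n
    raise ValueError(f"unknown budget kind: {kind}")


def sample_training_budget(n, kind="linear", rng=random):
    """Stochastic budget for training: draw uniformly from 0..p(n).

    The uniform rule is an assumption, not taken from the paper.
    """
    return rng.randint(0, planning_budget(n, kind))


def build_phase_schedule(n, p):
    """Label each time step of one episode: input, planning, then answer.

    The answer phase is collapsed to a single step for simplicity.
    """
    return ["input"] * n + ["plan"] * p + ["answer"]


if __name__ == "__main__":
    n = 6
    print(build_phase_schedule(n, planning_budget(n, "constant", constant=2)))
    print(build_phase_schedule(n, planning_budget(n, "linear")))
```

Under these assumptions, a constant budget leaves the number of planning steps unchanged as inputs grow, whereas an adaptive budget scales the time available to the model with the input size, which is the contrast the abstract draws.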
