Futureproof Static Memory Planning
Christos Lamprakos, Panagiotis Xanthopoulos, Manolis Katsaragakis, Sotirios Xydis, Dimitrios Soudris, Francky Catthoor
TL;DR
This work treats dynamic storage allocation (DSA) as a static, NP-complete offset-assignment problem and introduces idealloc, a scalable allocator built around a corrected and extended Boxing Algorithm that handles million-buffer inputs. It combines theoretical refinements (latent invariants, critical-point handling) with a practical, parallelizable design. On a new suite of hard benchmarks, idealloc proves robust and competitive with production solvers, producing low-fragmentation placements at low per-iteration latency. idealloc is open source, and the paper closes with a detailed discussion of future directions in large-scale memory planning. Together, these contributions offer a principled, scalable path toward futureproof static memory planning in deep learning and high-performance systems.
Abstract
The NP-complete combinatorial optimization task of assigning offsets to a set of buffers with known sizes and lifetimes so as to minimize total memory usage is called dynamic storage allocation (DSA). Existing DSA implementations bypass the theoretical state-of-the-art algorithms in favor of either fast but wasteful heuristics, or memory-efficient approaches that do not scale beyond one thousand buffers. The "AI memory wall", combined with deep neural networks' static architecture, has reignited interest in DSA. We present idealloc, a low-fragmentation, high-performance DSA implementation designed for million-buffer instances. Evaluated on a novel suite of particularly hard benchmarks from several domains, idealloc ranks first against four production implementations in terms of a joint effectiveness/robustness criterion.
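To make the problem statement concrete, the following is a minimal Python sketch of a DSA instance together with a greedy first-fit baseline of the "fast but wasteful" kind. It is not idealloc's algorithm. The `Buffer` type, the half-open lifetime convention, the size-descending ordering, and all function names are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Buffer:
    name: str
    size: int   # bytes
    start: int  # first time step at which the buffer is live (inclusive)
    end: int    # time step at which the buffer is freed (exclusive)


def lifetimes_overlap(a: Buffer, b: Buffer) -> bool:
    """True if the half-open lifetimes [start, end) intersect."""
    return a.start < b.end and b.start < a.end


def first_fit(buffers: List[Buffer]) -> Dict[str, int]:
    """Greedy baseline (assumed heuristic): largest buffers first,
    each at the lowest offset that avoids all conflicting buffers."""
    placed = []   # (buffer, offset) pairs decided so far
    offsets = {}
    for buf in sorted(buffers, key=lambda b: (-b.size, b.start)):
        # Address ranges held by already-placed buffers live at the same time.
        taken = sorted(
            (off, off + other.size)
            for other, off in placed
            if lifetimes_overlap(buf, other)
        )
        candidate = 0
        for lo, hi in taken:
            if candidate + buf.size <= lo:
                break  # the gap below this range is large enough
            candidate = max(candidate, hi)
        placed.append((buf, candidate))
        offsets[buf.name] = candidate
    return offsets


def is_valid(buffers: List[Buffer], offsets: Dict[str, int]) -> bool:
    """No two buffers that are live together may share an address."""
    for i, a in enumerate(buffers):
        for b in buffers[i + 1:]:
            if lifetimes_overlap(a, b):
                a_lo, b_lo = offsets[a.name], offsets[b.name]
                if a_lo < b_lo + b.size and b_lo < a_lo + a.size:
                    return False
    return True


def peak_memory(buffers: List[Buffer], offsets: Dict[str, int]) -> int:
    """Objective to minimize: the highest address any buffer reaches."""
    return max(offsets[b.name] + b.size for b in buffers)


def max_load(buffers: List[Buffer]) -> int:
    """Lower bound on peak memory: largest total live size at any time.
    The load can only increase at a start time, so those suffice."""
    return max(
        sum(b.size for b in buffers if b.start <= t < b.end)
        for t in {b.start for b in buffers}
    )


# Hypothetical four-buffer instance.
buffers = [
    Buffer("A", 4, 0, 3),
    Buffer("B", 2, 1, 5),
    Buffer("C", 4, 3, 6),
    Buffer("D", 2, 5, 8),
]
offsets = first_fit(buffers)
print(offsets, peak_memory(buffers, offsets), max_load(buffers))
```

On this toy instance the greedy placement happens to match the max-load lower bound. In general the ratio of `peak_memory` to `max_load` gives one simple way to quantify fragmentation, and greedy heuristics can leave a gap on harder instances.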
