From GPUs to RRAMs: Distributed In-Memory Primal-Dual Hybrid Gradient Method for Solving Large-Scale Linear Optimization Problem
Huynh Q. N. Vo, Md Tawsif Rahman Chowdhury, Paritosh Ramanan, Gozde Tutuncuoglu, Junchi Yang, Feng Qiu, Murat Yildirim
TL;DR
This work co-designs a distributed in-memory primal-dual hybrid gradient (PDHG) method for RRAM crossbar arrays, targeting energy- and time-efficient solution of large-scale constrained linear programs (LPs). By encoding the constraint matrix into a symmetric block matrix and using MELISO+ for physics-based simulation, the approach minimizes write operations, remains robust to device non-idealities, and converges under analog noise. The results show orders-of-magnitude improvements in energy and latency over GPU baselines at comparable accuracy, highlighting the practical impact of algorithm-hardware co-design for scalable in-memory optimization. Supported by theoretical guarantees for inexact hardware updates, the framework paves the way for scalable in-memory computing (IMC) solvers across distributed crossbars and for broader optimization tasks.
Abstract
The exponential growth of computational workloads is surpassing the capabilities of conventional architectures, which are constrained by fundamental limits. In-memory computing (IMC) with RRAM offers a promising alternative: it performs computations in the analog domain, with significant gains in latency and energy use. However, existing algorithms developed for conventional architectures do not translate to IMC, particularly for constrained optimization problems, where frequent matrix reprogramming remains cost-prohibitive. Here we present a distributed in-memory primal-dual hybrid gradient (PDHG) method, specifically co-designed for arrays of RRAM devices. Our approach minimizes costly write cycles, incorporates robustness against device non-idealities, and leverages a symmetric block-matrix formulation to unify operations across distributed crossbars. We integrate a physics-based simulation framework called MELISO+ to evaluate performance under realistic device conditions. Benchmarking against GPU-accelerated solvers on large-scale linear programs (LPs) shows that our RRAM-based solver achieves comparable accuracy while reducing energy consumption and latency by up to three orders of magnitude. This work presents the first PDHG-based LP solver implemented on RRAMs, showcasing the transformative potential of algorithm-hardware co-design for solving large-scale optimization problems through distributed in-memory computing.
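To make the symmetric block-matrix idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a standard-form LP (minimize c^T x subject to A x = b, x >= 0) and the textbook PDHG update, and it assumes the block matrix takes the form M = [[0, A^T], [A, 0]], so that one stored matrix serves both A x and A^T y and only the input vectors change between iterations. The names `build_symmetric_block`, `noisy_mvm`, and `pdhg_lp` are hypothetical, and the multiplicative Gaussian noise is a toy stand-in for analog non-idealities, not the MELISO+ device model.

```python
import numpy as np

def build_symmetric_block(A):
    """Assumed encoding: M = [[0, A^T], [A, 0]], a symmetric (n+m) x (n+m) matrix."""
    m, n = A.shape
    M = np.zeros((n + m, n + m))
    M[:n, n:] = A.T
    M[n:, :n] = A
    return M

def noisy_mvm(M, v, rng, rel_noise):
    """Matrix-vector product with optional multiplicative Gaussian noise (toy model)."""
    out = M @ v
    if rel_noise > 0.0:
        out = out * (1.0 + rel_noise * rng.standard_normal(out.shape))
    return out

def pdhg_lp(A, b, c, iters=5000, rel_noise=0.0, seed=0):
    """Textbook PDHG for min c^T x s.t. A x = b, x >= 0, using only products with M."""
    m, n = A.shape
    M = build_symmetric_block(A)           # built ("written") once, never modified
    rng = np.random.default_rng(seed)
    step = 0.9 / np.linalg.norm(A, 2)      # tau = sigma, so tau * sigma * ||A||^2 < 1
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        # Top block of M @ [0; y] equals A^T y.
        ATy = noisy_mvm(M, np.concatenate([np.zeros(n), y]), rng, rel_noise)[:n]
        x_new = np.maximum(x - step * (c - ATy), 0.0)   # projected primal step
        # Bottom block of M @ [2 x_new - x; 0] equals A (2 x_new - x).
        x_bar = 2.0 * x_new - x
        Ax_bar = noisy_mvm(M, np.concatenate([x_bar, np.zeros(m)]), rng, rel_noise)[n:]
        y = y + step * (b - Ax_bar)                     # dual step
        x = x_new
    return x, y

# Toy LP: min -x1 - 2*x2 s.t. x1 + x2 + s1 = 4, x1 + 3*x2 + s2 = 6, all variables >= 0.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([-1.0, -2.0, 0.0, 0.0])
x, y = pdhg_lp(A, b, c)
print("objective:", c @ x, "x:", x)
```

For this toy problem the optimum is at x1 = 3, x2 = 1 with objective -5, and the noise-free run should approach it. Setting `rel_noise` to a small positive value gives a rough feel for how perturbed matrix-vector products affect the iterates, under the stated toy noise assumption.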
