PyPIM: Integrating Digital Processing-in-Memory from Microarchitectural Design to Python Tensors
Orian Leitersdorf, Ronny Ronen, Shahar Kvatinsky
TL;DR
PyPIM delivers an end-to-end, programmable stack for digital memristive PIM by connecting a Python tensor API to a microarchitectural front end through a compact ISA and a flexible host driver. The approach converts tensor-centric Python code into PIM-ready operations, exploiting partitioned crossbars, range-based masks, and inter-array communication to maximize parallelism. A GPU-accelerated, bit-accurate simulator validates correctness and shows throughput close to the theoretical PIM bound with modest driver overhead, while the development library and tensor-view abstractions simplify data alignment and inter-warp transfers. By providing familiar interfaces, portable abstractions, and an extensible software stack that can adapt to future digital PIM architectures, this work lowers the barrier to adopting PIM. The practical result is a more accessible, scalable path to high-throughput in-memory computing for data-intensive workloads.
Abstract
Digital processing-in-memory (PIM) architectures mitigate the memory wall problem by enabling parallel bitwise operations directly within the memory. Recent works have demonstrated their algorithmic potential for accelerating data-intensive applications; however, a significant gap remains in the programming model and microarchitectural design. This gap is further exacerbated by aspects unique to memristive PIM, such as partitions and operations across both directions of the memory array. To address this gap, this paper provides an end-to-end architectural integration of digital memristive PIM, from a high-level Python library for tensor operations (similar to NumPy and PyTorch) down to the low-level microarchitectural design. We begin by proposing an efficient microarchitecture and instruction set architecture (ISA) that bridge the gap between the low-level control periphery and an abstraction of PIM parallelism. We subsequently propose a PIM development library that converts high-level Python to ISA instructions and a PIM driver that translates ISA instructions into PIM micro-operations. We evaluate PyPIM via a cycle-accurate simulator on a wide variety of benchmarks that demonstrate both the versatility of the Python library and its performance relative to theoretical PIM bounds. Overall, PyPIM drastically simplifies the development of PIM applications and enables existing tensor-oriented Python programs to be converted to PIM with ease.
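The abstract describes two translation steps: the library turns high-level Python into ISA instructions, and the driver turns each ISA instruction into micro-operations. The toy model below sketches that layering for element-wise addition under stated assumptions: the ISA instruction (`AddInstr`), the micro-operation (`NorMicroOp`), the `Crossbar` class, the column layout, and the 9-gate NOR full adder are all hypothetical illustrations, not PyPIM's actual ISA, driver, or API. Each row holds one element, and every NOR micro-op acts on all rows at once, so addition is bit-serial but element-parallel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NorMicroOp:
    """Hypothetical micro-op: column out = NOR(column a, column b) in every row."""
    out: int
    a: int
    b: int


@dataclass(frozen=True)
class AddInstr:
    """Hypothetical ISA instruction: dst = a + b (mod 2**nbits), per row."""
    dst: int
    a: int
    b: int
    nbits: int
    tmp: int  # first of 8 scratch columns (carry + 7 temporaries)


class Crossbar:
    """Toy crossbar: column j is an int whose bit i is the cell at row i."""

    def __init__(self, rows: int, cols: int):
        self.rows = rows
        self.full = (1 << rows) - 1
        self.cols = [0] * cols

    def execute(self, op: NorMicroOp) -> None:
        # One micro-op updates the output column in all rows simultaneously.
        self.cols[op.out] = ~(self.cols[op.a] | self.cols[op.b]) & self.full

    def write_elements(self, base: int, nbits: int, values: list[int]) -> None:
        # Row i stores values[i]; bit k goes to column base + k.
        for k in range(nbits):
            col = 0
            for i, v in enumerate(values):
                col |= ((v >> k) & 1) << i
            self.cols[base + k] = col

    def read_elements(self, base: int, nbits: int) -> list[int]:
        return [sum(((self.cols[base + k] >> i) & 1) << k for k in range(nbits))
                for i in range(self.rows)]


def lower_add(ins: AddInstr) -> list[NorMicroOp]:
    """Driver step (sketch): expand one AddInstr into a NOR-only sequence."""
    c = ins.tmp                               # carry column
    n = [ins.tmp + 1 + i for i in range(7)]   # temporaries n1..n7
    ops = [NorMicroOp(n[0], ins.a, ins.a),    # NOT a0
           NorMicroOp(c, ins.a, n[0])]        # NOR(a0, NOT a0) = 0 clears the carry
    for k in range(ins.nbits):
        a, b, s = ins.a + k, ins.b + k, ins.dst + k
        ops += [
            NorMicroOp(n[0], a, b),           # n1 = NOR(a, b)
            NorMicroOp(n[1], a, n[0]),        # n2
            NorMicroOp(n[2], b, n[0]),        # n3
            NorMicroOp(n[3], n[1], n[2]),     # n4 = XNOR(a, b)
            NorMicroOp(n[4], n[3], c),        # n5
            NorMicroOp(n[5], n[3], n[4]),     # n6
            NorMicroOp(n[6], c, n[4]),        # n7
            NorMicroOp(s, n[5], n[6]),        # sum = a XOR b XOR carry
            NorMicroOp(c, n[0], n[4]),        # carry-out = MAJ(a, b, carry)
        ]
    return ops


def pim_add(xs: list[int], ys: list[int], nbits: int = 8) -> list[int]:
    """Library step (sketch): a tensor-level xs + ys becomes one AddInstr."""
    xb = Crossbar(len(xs), 3 * nbits + 8)
    xb.write_elements(0, nbits, xs)
    xb.write_elements(nbits, nbits, ys)
    ins = AddInstr(dst=2 * nbits, a=0, b=nbits, nbits=nbits, tmp=3 * nbits)
    for op in lower_add(ins):
        xb.execute(op)
    return xb.read_elements(2 * nbits, nbits)
```

Here `pim_add([1, 200, 255], [2, 100, 1])` returns `[3, 44, 0]`, since sums wrap modulo 2**8. The driver emits `2 + 9 * nbits` micro-ops (74 for 8-bit operands) no matter how many rows the crossbar has, which is the row parallelism a PIM array is meant to exploit.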
