Power-Area Efficient Serial IMPLY-based 4:2 Compressor Applied in Data-Intensive Applications
Bahareh Bagheralmoosavi, Seyed Erfan Fatemieh, Mohammad Reza Reshadinezhad, Antonio Rubio
TL;DR
The paper tackles the memory-bandwidth bottleneck in data-intensive computing by proposing a serial memristive IMPLY-based 4:2 compressor tailored for crossbar-compatible in-memory computing. It introduces a NAND-based IMPLY implementation that uses 7 memristors and 44 computational steps, achieving area and energy savings. By integrating this compressor into 4×4 and 8×8 multipliers, the authors merge partial-product generation with carry-propagation steps, reducing memristor count, latency, and energy compared with XOR/MUX-based designs. Validation via SPICE with a VTEAM memristor model demonstrates concrete improvements (e.g., 36% fewer memristors for the compressor relative to the existing serial design, and up to 12% lower energy for the multipliers), underscoring the practical impact for PIM-based arithmetic in data-intensive applications. The work advances crossbar-friendly IMPLY-based arithmetic and provides a scalable path toward energy-efficient, memory-centric computing systems.
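A NAND-based IMPLY design works because material implication (q ← ¬p ∨ q) combined with an unconditional FALSE (reset to 0) is functionally complete. The sketch below is a purely behavioral Python model of the standard three-step serial IMPLY NAND: FALSE on a work memristor, then two IMPLY operations into it. The `ImplyRow` class and its method names are hypothetical. The model ignores VTEAM device physics, voltages, and thresholds, and it does not reproduce the paper's 7-memristor, 44-step compressor schedule.

```python
class ImplyRow:
    """Hypothetical behavioral model of memristors sharing one row.

    True = low resistance (logic 1). No device physics is modeled.
    """

    def __init__(self, n):
        self.state = [False] * n
        self.steps = 0  # counts FALSE/IMPLY operations only

    def write(self, i, value):
        # Input initialization; not counted as a computational step here.
        self.state[i] = bool(value)

    def false(self, i):
        # FALSE: unconditionally reset memristor i to logic 0.
        self.state[i] = False
        self.steps += 1

    def imply(self, p, q):
        # IMPLY: q <- (NOT p) OR q; only the target memristor q changes.
        self.state[q] = (not self.state[p]) or self.state[q]
        self.steps += 1

    def nand(self, p, q, out):
        # Standard serial IMPLY NAND: 3 steps, result stored in `out`.
        self.false(out)      # out = 0
        self.imply(p, out)   # out = NOT p
        self.imply(q, out)   # out = NOT q OR NOT p = NAND(p, q)


# Print the NAND truth table and the step count for each input pair.
for a in (False, True):
    for b in (False, True):
        row = ImplyRow(3)
        row.write(0, a)
        row.write(1, b)
        row.nand(0, 1, 2)
        print(int(a), int(b), int(row.state[2]), "steps:", row.steps)
```

Each NAND takes three steps and leaves its inputs untouched. A serial design therefore trades latency (step count) for area (memristor count), which is the trade-off the compressor's 7-memristor, 44-step budget reflects.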
Abstract
Data transfer between the processor and memory has become a design bottleneck in data-intensive applications. Processing-In-Memory (PIM) is a practical approach to overcoming this memory-wall bottleneck. The 4:2 compressor is well suited to implementing the processor's crucial arithmetic circuits, including multipliers. Some area-efficient memristive logic styles, such as Material Implication (IMPLY) in a serial architecture, are compatible with crossbar arrays. This paper proposes a serial memristive IMPLY-based 4:2 compressor and applies it to build new 4-bit and 8-bit multipliers. The proposed circuits are evaluated in terms of latency, area, and energy consumption. Compared with the existing serial compressor, the proposed 4:2 compressor algorithm improves area, energy consumption, and speed by 36%, 17%, and 15%, respectively. Compared with the serial multiplier based on a 4:2 compressor with an XOR/MUX design, the proposed 4-bit and 8-bit multipliers reduce latency by 7.3% and 10%, respectively, and reduce energy consumption by up to 12%.
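The abstract relies on two properties of a 4:2 compressor. First, it reduces four same-weight bits plus a carry-in to one sum bit and two carry bits, so that x1 + x2 + x3 + x4 + cin = sum + 2(carry + cout). Second, cout does not depend on cin. The Python sketch below models the compressor as two cascaded full adders, which is a common textbook construction and not necessarily the paper's NAND/IMPLY decomposition. It then uses one row of compressors to reduce the four partial-product rows of a 4×4 unsigned multiplication to two rows before a final addition. The function names `compress_4_2` and `multiply_4x4` are hypothetical. The paper's merging of partial-product generation with carry propagation inside the memristive schedule is not modeled here.

```python
from itertools import product


def compress_4_2(x1, x2, x3, x4, cin):
    """Behavioral 4:2 compressor built from two cascaded full adders.

    Guarantees x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout),
    and cout depends only on x1..x3, never on cin.
    """
    # First full adder on x1, x2, x3.
    s1 = x1 ^ x2 ^ x3
    cout = (x1 & x2) | (x1 & x3) | (x2 & x3)
    # Second full adder on the partial sum, x4, and the incoming cin.
    s = s1 ^ x4 ^ cin
    carry = (s1 & x4) | (s1 & cin) | (x4 & cin)
    return s, carry, cout


def multiply_4x4(a, b):
    """Hypothetical 4x4 unsigned multiplier: AND-array partial products,
    one 4:2 compressor per column, then a final two-row addition."""
    width, columns = 4, 8
    s_row = c_row = cin = 0
    for j in range(columns):
        # Column j collects at most four partial-product bits a[j-i] AND b[i].
        bits = [((a >> (j - i)) & 1) & ((b >> i) & 1)
                for i in range(width) if 0 <= j - i < width]
        bits += [0] * (4 - len(bits))  # tie unused inputs to 0
        s, carry, cout = compress_4_2(*bits, cin)
        s_row |= s << j            # sum stays in column j
        c_row |= carry << (j + 1)  # carry moves to column j + 1
        cin = cout                 # cout becomes the next column's cin
    # Final carry-propagate addition of the two remaining rows.
    return s_row + c_row + (cin << columns)


# Exhaustive checks: all 32 compressor input patterns and all 256 operand pairs.
for bits in product((0, 1), repeat=5):
    s, carry, cout = compress_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)
assert all(multiply_4x4(a, b) == a * b for a in range(16) for b in range(16))
print("4:2 compressor and 4x4 multiplier checks passed")
```

Because cout does not depend on cin, a row of compressors has no rippling carry chain during reduction. All carry propagation is deferred to the single final addition, and that step is where a serial IMPLY design can save both steps and memristors.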
