
Power-Area Efficient Serial IMPLY-based 4:2 Compressor Applied in Data-Intensive Applications

Bahareh Bagheralmoosavi, Seyed Erfan Fatemieh, Mohammad Reza Reshadinezhad, Antonio Rubio

TL;DR

The paper tackles the memory bandwidth bottleneck in data-intensive computing by proposing a serial memristive IMPLY-based 4:2 compressor tailored for crossbar-compatible in-memory computing. It introduces a NAND-based IMPLY implementation that uses 7 memristors and 44 computational steps, saving both area and energy. By integrating this compressor into 4×4 and 8×8 multipliers, the authors merge partial-product generation with carry-propagation steps, reducing memristor count, latency, and energy compared with XOR/MUX-based designs. Validation via SPICE with a VTEAM memristor model demonstrates concrete improvements (e.g., 36% fewer memristors for the compressor and up to 12% lower energy for the multipliers), underscoring the practical impact for PIM-based arithmetic in data-intensive applications. The work advances crossbar-friendly IMPLY-based arithmetic, providing a scalable path for energy-efficient, memory-centric computing systems.
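To make the "NAND-based IMPLY" idea concrete, here is a minimal behavioral sketch in Python of how a NAND gate, and from it an XOR gate, can be composed from IMPLY operations. It models logic values only, not memristor physics. The function names (`imply`, `nand_via_imply`, `xor_via_nand`) are hypothetical, and the step ordering is the textbook reset-plus-two-IMPLY construction, which is not necessarily the schedule used in the paper's 44-step algorithm.

```python
def imply(p: int, q: int) -> int:
    """Material implication: returns (NOT p) OR q.

    In a memristive IMPLY gate the result overwrites the state of q,
    so callers assign the return value back to the work variable.
    """
    return (1 - p) | q


def nand_via_imply(a: int, b: int) -> int:
    """NAND built from one reset and two IMPLY steps (illustrative ordering)."""
    s = 0            # step 1: reset the work memristor to logic 0
    s = imply(a, s)  # step 2: s = NOT a
    s = imply(b, s)  # step 3: s = (NOT b) OR (NOT a) = NAND(a, b)
    return s


def xor_via_nand(a: int, b: int) -> int:
    """XOR from four NAND gates (standard construction)."""
    n1 = nand_via_imply(a, b)
    n2 = nand_via_imply(a, n1)
    n3 = nand_via_imply(b, n1)
    return nand_via_imply(n2, n3)


if __name__ == "__main__":
    # Print the truth tables for inspection.
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, nand_via_imply(a, b), xor_via_nand(a, b))
```

In a serial architecture each `imply` call corresponds to one computational step on the shared work memristors, which is why step count and memristor reuse dominate the latency and area figures reported for such designs.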

Abstract

Data transfer between a processor and memory has become a design bottleneck in data-intensive applications. Processing-In-Memory (PIM) is a practical approach to overcoming this memory wall bottleneck. The 4:2 compressor is suitable for implementing a processor's crucial arithmetic circuits, including multipliers. Some area-efficient memristive structures, like Material Implication (IMPLY) in a serial architecture, are compatible with the crossbar array. This paper proposes a serial memristive IMPLY-based 4:2 compressor, which is then used to build new 4-bit and 8-bit multipliers. The proposed circuits are evaluated in terms of latency, area, and energy consumption. Compared to the existing serial compressor, the proposed 4:2 compressor's algorithm improves area, energy consumption, and speed by 36%, 17%, and 15%, respectively. Compared to the serial multiplier based on a 4:2 compressor with the XOR/MUX design, the proposed 4-bit and 8-bit multipliers improve latency by 7.3% and 10%, respectively, and reduce energy consumption by up to 12%.
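For readers unfamiliar with the 4:2 compressor, the following Python sketch models its arithmetic behavior using the conventional two-full-adder structure. This is a functional model only, under the assumption of the standard textbook definition; it does not reproduce the paper's IMPLY step sequence, and the names `full_adder` and `compressor_4_2` are hypothetical.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) for three input bits."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)  # majority function
    return s, carry


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Conventional 4:2 compressor built from two cascaded full adders.

    Returns (sum, carry, cout). The outputs satisfy
    x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout).
    """
    s1, cout = full_adder(x1, x2, x3)     # first stage; cout does not depend on cin
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


if __name__ == "__main__":
    # Exhaustively check the arithmetic invariant over all 32 input combinations.
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        assert sum(bits) == s + 2 * (carry + cout)
    print("4:2 compressor invariant holds for all inputs")
```

Because `cout` is independent of `cin`, compressors in adjacent columns of a multiplier can be chained without a ripple through the whole row, which is what makes the 4:2 compressor attractive for partial-product reduction.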

Paper Structure

This paper contains 16 sections, 18 equations, 11 figures, and 9 tables.

Figures (11)

  • Figure 1: Memristor-based IMPLY gate [ref28].
  • Figure 2: The IMPLY-based adder architectures: (a) serial, (b) parallel, (c) semi-parallel, and (d) semi-serial [ref28].
  • Figure 3: (a) 4:2 compressor module, and (b) conventional full-adder-based 4:2 compressor [ref20].
  • Figure 4: (a) XOR/MUX design of the 4:2 compressor [ref23], and (b) 2:1 MUX-based 4:2 compressor [ref33].
  • Figure 5: (a) Design of a NAND-based XOR gate, (b) design of an XOR-based full adder built from NAND gates, and (c) NAND-based 4:2 compressor.
  • ...and 6 more figures