Sectored DRAM: A Practical Energy-Efficient and High-Performance Fine-Grained DRAM Architecture

Ataberk Olgun, F. Nisa Bostanci, Geraldo F. Oliveira, Yahya Can Tugrul, Rahul Bera, A. Giray Yaglikci, Hasan Hassan, Oguz Ergin, Onur Mutlu

TL;DR

Sectored DRAM tackles energy waste in DRAM by enabling fine-grained data transfers and row activation at low hardware cost. It introduces Variable Burst Length (VBL) and Sectored Activation (SA), together with two system-level mechanisms, LSQ Lookahead and Sector Predictor, that integrate fine-grained DRAM into real hardware interfaces. In cycle-accurate evaluation on diverse workloads, Sectored DRAM reduces the DRAM energy consumption of highly memory-intensive workloads by up to 33% (20% on average) and system energy by up to 23%, while improving the performance of those workloads by up to 36% (17% on average). The DRAM chip area overhead is 1.7% of the area of a modern DDR4 chip. The design is practical, scales to multi-channel configurations, and is open-sourced to enable further research and adoption.

Abstract

We propose Sectored DRAM, a new, low-overhead DRAM substrate that reduces wasted energy by enabling fine-grained DRAM data transfers and DRAM row activation. Sectored DRAM leverages two key ideas to enable fine-grained data transfers and row activation at low chip area cost. First, a cache block transfer between main memory and the memory controller happens in a fixed number of clock cycles, where only a small portion of the cache block (a word) is transferred in each cycle. Sectored DRAM augments the memory controller and the DRAM chip to execute cache block transfers in a variable number of clock cycles based on the workload access pattern, with minor modifications to the memory controller's and the DRAM chip's circuitry. Second, a large DRAM row, by design, is already partitioned into smaller, independent, physically isolated regions. Sectored DRAM provides the memory controller with the ability to activate each such region based on the workload access pattern via small modifications to the DRAM chip's array access circuitry. Activating smaller regions of a large row relaxes DRAM power delivery constraints and allows the memory controller to schedule DRAM accesses faster. Compared to a system with coarse-grained DRAM, Sectored DRAM reduces the DRAM energy consumption of highly memory-intensive workloads by up to (on average) 33% (20%) while improving their performance by up to (on average) 36% (17%). Sectored DRAM's DRAM energy savings, combined with its system performance improvement, allow system-wide energy savings of up to 23%. Sectored DRAM's DRAM chip area overhead is 1.7% of the area of a modern DDR4 chip. We hope and believe that Sectored DRAM's ideas and results will help to enable more efficient and high-performance memory systems. To this end, we open source Sectored DRAM at https://github.com/CMU-SAFARI/Sectored-DRAM.
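
The variable-length transfer described in the abstract can be illustrated with a small sketch. It assumes a cache block split into 8 words, a conventional burst of 8 cycles (one word per cycle), and a hypothetical per-request bitmask marking which words the workload needs. The names `WORDS_PER_BLOCK`, `FIXED_BURST_LENGTH`, `burst_length`, and `cycles_saved` are illustrative and are not taken from the Sectored DRAM repository.

```python
# Illustrative sketch only: the constants and function names below are
# assumptions made for this example, not the authors' implementation.
WORDS_PER_BLOCK = 8      # assumed: one cache block = 8 words
FIXED_BURST_LENGTH = 8   # conventional burst: one word per cycle, 8 cycles


def burst_length(sector_mask: int) -> int:
    """Cycles needed when only the words flagged in sector_mask are sent."""
    if not 0 <= sector_mask < (1 << WORDS_PER_BLOCK):
        raise ValueError("sector_mask must fit in WORDS_PER_BLOCK bits")
    # One transfer cycle per requested word (population count of the mask).
    return bin(sector_mask).count("1")


def cycles_saved(sector_mask: int) -> int:
    """Transfer cycles avoided relative to the fixed-length burst."""
    return FIXED_BURST_LENGTH - burst_length(sector_mask)
```

Under these assumptions, a request that needs only words 0 and 2 (mask `0b00000101`) occupies the data bus for 2 cycles instead of 8, while a request for the full block behaves like a conventional burst.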

Paper Structure

This paper contains 33 sections, 15 figures, and 2 tables.

Figures (15)

  • Figure 1: DRAM module, chip, and bank organization, as depicted in MIMDRAM (Oliveira et al., 2024).
  • Figure 2: Example cache block placement in DRAM mats (left) and diagram depicting an 8-cycle data transfer burst to Chip 0 (right). "B" means "byte".
  • Figure 3: Normalized DRAM access (left) and DRAM activation (right) energy consumption.
  • Figure 4: Wordline organization in a conventional DRAM subarray (left) and in a Sectored DRAM subarray (right).
  • Figure 5: I/O circuitry of a DRAM chip and Variable Burst Length
  • ...and 10 more figures