Low-Energy Line Codes for On-Chip Networks

Beyza Dabak, Major Glenn, Jingyang Liu, Alexander Buck, Siyi Yang, Robert Calderbank, Natalie Enright Jerger, Daniel J. Sorin

TL;DR

This work tackles the high energy cost of on-chip communication by introducing Low-Energy Line Codes (LELCs), which exploit nonuniform dataword statistics in conjunction with NRZI signaling to reduce voltage transitions on the links of the on-chip network (OCN). It systematically develops several practical coding families (Flip-N-Write, Tree Codes, Mapping Codes, and Compound Codes) and analyzes their energy-rate trade-offs, hardware complexity, and crosstalk impact. The authors validate their approach with full-system simulations on CPU and GPU workloads, demonstrating energy reductions of up to about 36% with modest runtime penalties and significant crosstalk reductions, and they propose dynamic throttling to preserve performance under high utilization. The results offer a practical path to energy-aware on-chip networks, with broad implications for CPUs, GPUs, chiplets, and ML accelerators, where nonuniform data patterns are prevalent and link energy is a dominant factor.

Abstract

Energy is a primary constraint in processor design, and much of that energy is consumed in on-chip communication. Communication can be intra-core (e.g., from a register file to an ALU) or inter-core (e.g., over the on-chip network). In this paper, we use the on-chip network (OCN) as a case study for saving on-chip communication energy. We have identified a new way to reduce the OCN's link energy consumption by using line coding, a longstanding technique in information theory. Our line codes, called Low-Energy Line Codes (LELCs), reduce energy by reducing the frequency of voltage transitions of the links, and they achieve a range of energy/performance trade-offs.
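
To make the link between codewords and voltage transitions concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the common NRZI convention in which a 1 toggles the wire's voltage level and a 0 holds it, and that a single wire starts at level 0. The function names `nrzi_levels` and `count_transitions` are hypothetical and chosen only for illustration.

```python
from typing import List


def nrzi_levels(bits: str, start: int = 0) -> List[int]:
    """Return the wire level after each bit under NRZI.

    Assumed convention: a '1' toggles the level, a '0' holds it.
    """
    level = start
    levels = []
    for b in bits:
        if b == "1":
            level ^= 1  # toggle on a 1
        levels.append(level)
    return levels


def count_transitions(levels: List[int], start: int = 0) -> int:
    """Count how many times the wire level changes, beginning from `start`."""
    transitions = 0
    prev = start
    for level in levels:
        if level != prev:
            transitions += 1
        prev = level
    return transitions


if __name__ == "__main__":
    for word in ["0000", "0110", "1111"]:
        levels = nrzi_levels(word)
        # Under this convention the transition count equals the number of 1s.
        print(word, levels, count_transitions(levels), word.count("1"))
```

Under this assumed convention, the number of transitions on a wire equals the number of 1s transmitted, so a code that maps frequent datawords to codewords with few 1s lowers the transition frequency.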

Paper Structure (26 sections, 1 equation, 12 figures, 5 tables)

Figures (12)

  • Figure 1: Dataword distributions for benchmarks (2 CPU and 1 GPU)
  • Figure 2: Optimal trade-off between rate and energy reduction for equiprobable input data.
  • Figure 3: Illustration of 2-level Flip-N-Write (k=4).
  • Figure 4: Huffman coding tree with codewords at leaves. Assumes the frequency of input datawords is, in decreasing order: 00, 11, 01, and 10. These datawords are mapped to codewords 0, 11, 100, and 101.
  • Figure 5: Tree Code 1 (TC1): Variable rate bounded between $3/4$ and $5/4$, with 1 bit of added redundancy for all 0s and a compression opportunity for strings of 1s.
  • ...and 7 more figures