Combining Power and Arithmetic Optimization via Datapath Rewriting

Samuel Coward, Theo Drane, Emiliano Morini, George Constantinides

TL;DR

The paper addresses dynamic power in RTL arithmetic datapaths, and its tradeoff with area, by proposing a workload-aware RTL optimization flow. It introduces ROVER, an e-graph-based framework that encodes power optimizations (clock gating and operand isolation) as local rewrites and combines them with arithmetic and area rewrites to explore a broad design space. Using workload-driven simulation and a compact power model, ROVER extracts the most power-efficient implementation via ILP and outputs synthesizable RTL. Experiments on open-source and production-inspired benchmarks show up to 33.9% power reduction with about 5% area overhead, supporting the case for data-dependent RTL customization.

Abstract

Industrial datapath designers consider dynamic power consumption to be a key metric. Arithmetic circuits contribute a major component of total chip power consumption and are therefore a common target for power optimization. While arithmetic circuit area and dynamic power consumption are often correlated, there is also a tradeoff to consider, as additional gates can be added to explicitly reduce arithmetic circuit activity and hence reduce power consumption. In this work, we consider two forms of power optimization and their interaction: circuit area reduction via arithmetic optimization, and the elimination of redundant computations using both data and clock gating. By encoding both these classes of optimization as local rewrites of expressions, our tool flow can simultaneously explore them, uncovering new opportunities for power saving through arithmetic rewrites using the e-graph data structure. Since power consumption is highly dependent upon the workload performed by the circuit, our tool flow facilitates a data-dependent design paradigm, where an implementation is automatically tailored to particular contexts of data activity. We develop an automated RTL-to-RTL optimization framework, ROVER, that takes circuit input stimuli and generates power-efficient architectures. We evaluate its effectiveness on both open-source arithmetic benchmarks and benchmarks derived from Intel production examples. The tool is able to reduce the total power consumption by up to 33.9%.

Paper Structure

This paper contains 13 sections, 11 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: An operand isolation opportunity. In the original circuit (black), the input to the multiplier can be data gated when the select signal is one, as shown by the red gate. The negated select signal, $\overline{S}$, is a common input to an array of AND gates, one for each bit of $C$.
  • Figure 2: Circuit diagrams of the TREG and REG operators.
  • Figure 3: E-graph rewriting of a masking operation. Dashed boxes represent e-classes of equivalent expressions. A new equivalent expression, represented by the second $\&$ operator in the root e-class, is added to the e-graph.
  • Figure 4: ROVER's power optimization tool flow. Users provide an input Verilog design and input stimuli via simulation data or switching activity statistics.
  • Figure 5: The number of designs vs. the number of e-classes after each iteration of rewriting the design in Figure 1. Simulation cost scales with the number of e-classes, yet a single simulation evaluates all designs in the e-graph.
  • ...and 1 more figures
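
The operand isolation idea in the Figure 1 caption can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not ROVER's implementation: the figure itself is not reproduced here, so the circuit shape `out = a if s else b * c`, the 4-bit width, the function names, and the workload are all hypothetical choices made for illustration. The sketch checks that ANDing every bit of `c` with the negated select leaves the output unchanged, and uses bit toggles on the multiplier input as a crude proxy for switching activity.

```python
WIDTH = 4                      # illustrative bitwidth (assumption)
MASK = (1 << WIDTH) - 1

def original(s, a, b, c):
    """Assumed circuit: out = a if s else b*c; the multiplier always sees c."""
    mult_in = c
    product = (b * mult_in) & MASK
    return (a if s else product), mult_in

def isolated(s, a, b, c):
    """Same function, with c ANDed bitwise with NOT s before the multiplier."""
    not_s = 0 if s else MASK   # NOT s replicated across WIDTH AND gates
    mult_in = c & not_s
    product = (b * mult_in) & MASK
    return (a if s else product), mult_in

def toggles(values):
    """Count bit flips between consecutive values (a crude activity proxy)."""
    return sum(bin(x ^ y).count("1") for x, y in zip(values, values[1:]))

# Exhaustively check that isolation does not change the circuit output.
equivalent = all(
    original(s, a, b, c)[0] == isolated(s, a, b, c)[0]
    for s in (0, 1)
    for a in range(MASK + 1)
    for b in range(MASK + 1)
    for c in range(MASK + 1)
)

# Hypothetical workload: the select is high on every cycle except the last,
# so the multiplier result is almost never used.
workload = [(1, 3, 5, c) for c in (0b0000, 0b1111, 0b0101, 0b1010)]
workload.append((0, 3, 5, 0b0110))

base_activity = toggles([original(*v)[1] for v in workload])
gated_activity = toggles([isolated(*v)[1] for v in workload])
print(equivalent, base_activity, gated_activity)
```

On this workload the gated multiplier input toggles far less than the ungated one, which reflects the workload dependence the abstract describes: if the select were mostly low instead, the extra AND gates would add area with little activity saved.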