Combining Power and Arithmetic Optimization via Datapath Rewriting
Samuel Coward, Theo Drane, Emiliano Morini, George Constantinides
TL;DR
The paper addresses dynamic power in RTL arithmetic datapaths and its tradeoff with area by proposing a workload-aware RTL optimization flow. It introduces ROVER, an e-graph-based framework that encodes power optimizations (clock gating and operand isolation) as local rewrites and combines them with arithmetic and area rewrites to explore a broad design space. Using workload-driven simulation and a compact power model, ROVER extracts the most power-efficient implementation via ILP and outputs synthesizable RTL. Experiments on open-source and production-inspired benchmarks show up to a 33.9% reduction in total power with about 5% area overhead, supporting the effectiveness of data-dependent RTL customization.
Abstract
Industrial datapath designers consider dynamic power consumption to be a key metric. Arithmetic circuits contribute a major component of total chip power consumption and are therefore a common target for power optimization. While arithmetic circuit area and dynamic power consumption are often correlated, there is also a tradeoff to consider, as additional gates can be added to explicitly reduce arithmetic circuit activity and hence reduce power consumption. In this work, we consider two forms of power optimization and their interaction: circuit area reduction via arithmetic optimization, and the elimination of redundant computations using both data and clock gating. By encoding both classes of optimization as local rewrites of expressions, our tool flow can explore them simultaneously using the e-graph data structure, uncovering new opportunities for power saving through arithmetic rewrites. Since power consumption is highly dependent upon the workload performed by the circuit, our tool flow facilitates a data-dependent design paradigm, where an implementation is automatically tailored to particular contexts of data activity. We develop an automated RTL-to-RTL optimization framework, ROVER, that takes circuit input stimuli and generates power-efficient architectures. We evaluate its effectiveness on both open-source arithmetic benchmarks and benchmarks derived from Intel production examples. The tool is able to reduce the total power consumption by up to 33.9%.
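To make the area/activity tradeoff concrete, the following is a minimal Python sketch of data gating applied to a multiplier that feeds a multiplexer, out = sel ? a * b : c. It is not ROVER's implementation or power model. It assumes a hold-style isolation scheme (the multiplier operands keep their previous values whenever sel is 0) and uses the number of bit toggles at the multiplier operand ports as a crude proxy for dynamic power. The function names and the stimulus list are hypothetical and chosen only for illustration.

```python
def hamming(x, y):
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")


def simulate(stimuli, isolate):
    """Run the toy datapath out = sel ? a * b : c over a stimulus list.

    Returns (outputs, toggles), where toggles counts bit flips seen at the
    multiplier operand ports. With isolate=True the operands hold their
    previous values whenever sel == 0 (assumed hold-style data gating).
    """
    prev_a, prev_b = 0, 0
    toggles = 0
    outputs = []
    for sel, a, b, c in stimuli:
        if isolate and not sel:
            # The product is unused this cycle, so freeze the operands.
            a, b = prev_a, prev_b
        toggles += hamming(prev_a, a) + hamming(prev_b, b)
        prev_a, prev_b = a, b
        outputs.append(a * b if sel else c)
    return outputs, toggles


# Hypothetical workload: the multiplier result is only selected in the
# first and last cycles.
STIMULI = [
    (1, 0b1010, 0b0011, 7),
    (0, 0b0101, 0b1100, 7),
    (0, 0b1111, 0b0000, 9),
    (1, 0b1010, 0b0011, 9),
]

base_out, base_toggles = simulate(STIMULI, isolate=False)
gated_out, gated_toggles = simulate(STIMULI, isolate=True)

print("outputs equal:", base_out == gated_out)
print("operand toggles without gating:", base_toggles)
print("operand toggles with gating:", gated_toggles)
```

In this sketch the gated and ungated designs produce identical outputs, while the gated design sees fewer operand toggles on this particular workload. The saving depends entirely on how often sel is 0, which is why the choice of whether to pay for the extra gating logic is workload dependent.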
