DOMAC: Differentiable Optimization for High-Speed Multipliers and Multiply-Accumulators
Chenhao Xue, Yi Ren, Jinwei Zhou, Kezhi Li, Chen Zhang, Yibo Lin, Lining Zhang, Qiang Xu, Guangyu Sun
TL;DR
DOMAC addresses the design-space explosion in high-speed multipliers by formulating compressor-tree (CT) design as a differentiable optimization problem. This enables gradient-based search over both CT interconnections and compressor implementations. By mapping CT design to a DNN-training paradigm, DOMAC introduces differentiable timing and area objectives, together with a legalization step that recovers discrete solutions. Key contributions include a differentiable area objective, a differentiable timing pipeline (pin load, cell delay, net delay propagation, and timing slack estimation), and regularization through softmax substitutions and log-sum-exp (LSE) smoothing. Empirical results show that DOMAC achieves up to $6.5\%$ delay reduction and $25\%$ area reduction relative to commercial IPs, validating its practicality for technology-node-aware multiplier and MAC synthesis.
Abstract
Multipliers and multiply-accumulators (MACs) are fundamental building blocks for compute-intensive applications such as artificial intelligence. With the diminishing returns of Moore's Law, improving multiplier performance now requires process-aware architectural innovation rather than technology scaling alone. In this paper, we introduce DOMAC, a novel approach that uses differentiable optimization to design multipliers and MACs for specific technology nodes. DOMAC draws an analogy between optimizing multi-stage parallel compressor trees and training deep neural networks. Building on this insight, DOMAC reformulates the discrete optimization problem as a continuous one by incorporating differentiable timing and area objectives. This formulation lets us use existing deep learning toolkits to implement the differentiable solver efficiently. Experimental results demonstrate that DOMAC significantly improves both performance and area efficiency compared with state-of-the-art baselines and commercial IPs in multiplier and MAC designs.
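The softmax substitutions and LSE smoothing named above are what turn discrete, non-differentiable choices into continuous objectives. The following is a minimal, hypothetical sketch in plain Python of how these two relaxations commonly work; it is not DOMAC's implementation. It assumes that (a) a discrete choice, such as picking one of several compressor implementations, is parameterized by hypothetical logits with a known area per candidate, and (b) the hard max over fan-in arrival times is replaced by an LSE with temperature `gamma`. The names `lse_max`, `softmax`, `expected_area`, `arrival_time`, and `gamma` are illustrative only.

```python
import math


def lse_max(values, gamma):
    """Smooth max via log-sum-exp with temperature gamma > 0.

    The result lies in [max(values), max(values) + gamma * ln(len(values))],
    so it approaches the hard max as gamma -> 0.
    """
    m = max(values)  # shift by the max for numerical stability
    return m + gamma * math.log(sum(math.exp((v - m) / gamma) for v in values))


def softmax(logits):
    """Convert hypothetical choice logits into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_area(logits, areas):
    """Relaxed area: softmax-weighted area over candidate implementations."""
    return sum(p * a for p, a in zip(softmax(logits), areas))


def arrival_time(input_arrivals, delays, gamma):
    """Smoothed output arrival time: LSE over (input arrival + pin-to-output delay)."""
    return lse_max([t + d for t, d in zip(input_arrivals, delays)], gamma)


if __name__ == "__main__":
    print(lse_max([1.0, 2.0, 3.0], 1.0))         # about 3.41, slightly above the hard max 3.0
    print(expected_area([0.0, 0.0], [2.0, 4.0]))  # equal logits give the mean area
    print(arrival_time([0.0, 1.0], [2.0, 0.5], 1e-3))
```

Both relaxations are smooth in their inputs, so a deep learning toolkit's automatic differentiation can propagate gradients through them. A useful property is that the gradient of `lse_max` with respect to each input equals `softmax([v / gamma for v in values])`, which is one reason the two substitutions pair naturally. Smaller `gamma` tracks the true critical path more closely but yields sharper, less informative gradients.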
