Enhancing ASIC Technology Mapping via Parallel Supergate Computing
Ye Cai, Zonglin Yang, Liwei Ni, Biwei Xie, Xingquan Li
TL;DR
The paper tackles the time-intensive generation of large numbers of supergates during ASIC technology mapping by introducing a parallel supergate computing framework. It uses input-constrained patterns and combines recursive backtracking with multi-threaded processing to generate and filter candidate supergates in parallel, accelerating the mapping flow while preserving or improving delay. Key contributions include a formalization of $k$-feasible cuts and Boolean matching in a parallel setting, a concrete pre-processing and post-processing pipeline, and a complexity-aware design that achieves up to a 4x runtime speedup with 32 threads and about a 10.1% delay reduction on the EPFL benchmarks. For large-scale ASIC design, the practical impact is a shorter mapping time without loss of mapping quality, and in some cases better quality when an expanded supergate library is used.
Abstract
With the development of large-scale integrated circuits, electronic design automation (EDA) tools increasingly emphasize efficiency, and parallel algorithms are becoming a trend. Delay reduction is a crucial objective in ASIC technology mapping, and supergates are an effective means of achieving it in the EDA tool flow. However, we observe that although generating more supergates can reduce delay, it comes at the cost of an exponential increase in computation time. In this paper, we propose a parallel supergate computing method that addresses this tradeoff between computation time and delay optimization. The proposed method uses input-constrained supergate patterns to generate supergate candidates in parallel and then filters them to retain the valid supergates. Experimental results demonstrate the efficiency of the proposed method: for example, it attains a 4x speedup in computation time and a 10.1% delay reduction with 32 threads.
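To make the generate-then-filter idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical three-gate toy library (`INV`, `NAND2`, `NOR2`) with made-up delays, a two-input limit on every supergate, and a simple level-by-level enumeration in place of the recursive backtracking described in the paper. Each root gate is expanded in its own thread, and the filter keeps only the fastest candidate per non-constant Boolean function. All names (`LIBRARY`, `expand_root`, `generate_supergates`) are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

N_INPUTS = 2                        # assumed input constraint on every supergate
MASK = (1 << (1 << N_INPUTS)) - 1   # truth tables over 2 inputs are 4-bit masks

# Hypothetical toy library: name -> (arity, delay, function on truth-table masks)
LIBRARY = {
    "INV":   (1, 1.0, lambda a: ~a & MASK),
    "NAND2": (2, 1.5, lambda a, b: ~(a & b) & MASK),
    "NOR2":  (2, 1.8, lambda a, b: ~(a | b) & MASK),
}

# Primary inputs as (expression, truth table, arrival delay)
LEAVES = [("x0", 0b1010, 0.0), ("x1", 0b1100, 0.0)]


def expand_root(root_name, pool):
    """Generate every candidate whose root gate is root_name and whose
    fanins are drawn from pool (a list of (expr, truth_table, delay))."""
    arity, gate_delay, fn = LIBRARY[root_name]
    candidates = []
    for fanins in product(pool, repeat=arity):
        tt = fn(*(f[1] for f in fanins))
        delay = gate_delay + max(f[2] for f in fanins)
        expr = f"{root_name}({', '.join(f[0] for f in fanins)})"
        candidates.append((expr, tt, delay))
    return candidates


def generate_supergates(levels=2, workers=4):
    """Return {truth_table: (expr, delay)} with the fastest structure found
    for each non-constant function after the given number of levels."""
    best = {tt: (expr, d) for expr, tt, d in LEAVES}
    for _ in range(levels):
        pool = [(expr, tt, d) for tt, (expr, d) in best.items()]
        # Generation: one task per root gate, run concurrently.
        with ThreadPoolExecutor(max_workers=workers) as ex:
            batches = list(ex.map(lambda r: expand_root(r, pool), LIBRARY))
        # Filtering: drop constants, keep the minimum-delay candidate per function.
        for batch in batches:
            for expr, tt, d in batch:
                if tt in (0, MASK):
                    continue
                if tt not in best or d < best[tt][1]:
                    best[tt] = (expr, d)
    return best


if __name__ == "__main__":
    for tt, (expr, d) in sorted(generate_supergates().items()):
        print(f"{tt:04b}  delay={d:.1f}  {expr}")
```

Because CPython threads share the global interpreter lock, this sketch illustrates only the task decomposition, not the speedup reported above; a process pool or a native implementation would be needed to observe real parallel scaling.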
