Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs

Yao Lai, Jinxin Liu, David Z. Pan, Ping Luo

TL;DR

This paper reframes the design of arithmetic modules as tree-generation problems (AddGame for adders and MultGame for multipliers) optimized by reinforcement learning. It introduces two specialized agents—MCTS for prefix-tree optimization and PPO for compressor-tree optimization—coupled in a co-design loop with a fast yet accurate synthesis flow. The approach yields Pareto-optimal 128-bit adders (e.g., $L=10$, size $244$) and substantial improvements over state-of-the-art RL methods in delay and area, with demonstrated transfer to 7nm technology. Practically, these results offer scalable, hardware-accurate designs that can be integrated into modern synthesis flows, potentially accelerating performance and reducing silicon area across a range of devices.

Abstract

Across a wide range of hardware scenarios, the computational efficiency and physical size of the arithmetic units significantly influence the speed and footprint of the overall hardware system. Nevertheless, prior arithmetic design techniques remain inadequate: they do not sufficiently optimize speed and area, resulting in a reduced processing rate and larger module size. To boost arithmetic performance, in this work we focus on the two most common and fundamental arithmetic modules: adders and multipliers. We cast the design tasks as single-player tree generation games, leveraging reinforcement learning techniques to optimize their arithmetic tree structures. Such a tree generation formulation allows us to efficiently navigate the vast search space and discover superior arithmetic designs that improve computational efficiency and hardware size within just a few hours. For adders, our approach discovers designs of 128-bit adders that achieve Pareto optimality in theoretical metrics. Compared with the state-of-the-art PrefixRL, our method decreases computational delay and hardware size by up to 26% and 30%, respectively. For multipliers, when compared to RL-MUL, our approach increases speed and reduces size by as much as 49% and 45%, respectively. Moreover, the inherent flexibility and scalability of our method enable us to deploy our designs into cutting-edge technologies, as we show that they can be seamlessly integrated into 7nm technology. We believe our work will offer valuable insights into hardware design, further accelerating speed and reducing size through the refined search space and our tree generation methodologies. See our introduction video at https://bit.ly/ArithmeticTree. Code is released at https://github.com/laiyao1/ArithmeticTree.

Paper Structure (29 sections, 7 equations, 17 figures, 7 tables)

This paper contains 29 sections, 7 equations, 17 figures, and 7 tables.

Figures (17)

  • Figure 1: Our approach framework. Two agents respectively optimize prefix and compressor trees, modeling the tasks as AddGame for adders and MultGame for multipliers.
  • Figure 2: Comparison of design processes. (a) Default design process. The synthesis tool automatically generates a default multiplier when directly using multiplication commands (x*y) in Verilog HDL code. (b) Our enhanced design process. Our approach discovers an optimized multiplier structure and generates specialized Verilog HDL code for this improved structure, reducing delay and area after synthesis.
  • Figure 3: Arithmetic trees. (a) Example of a prefix tree. (b) Example of a compressor tree. Different tree structures lead to different qualities of adder and multiplier designs.
  • Figure 4: Method for designing prefix trees with MCTS. Four phases in the search process are executed iteratively, gradually building a search tree.
  • Figure 5: Method for designing compressor trees with PPO. Three representations are illustrated. (a) Dot notation. Each dot represents an output bit, with the number inside indicating the estimated delay for selecting adder input bits. The agent's actions involve adding full or half adders to compress the bits until each binary digit contains no more than two bits. The final reward, $r_T$, is defined as the inverse of the delay, encouraging designs with lower delays. (b) Binary bit notation. 0/1 are values of bits for the example multiplication. (c) Logic gate notation. The actual logic gate circuit design for each state.
  • ...and 12 more figures
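
To make the compressor-tree game in the Figure 5 caption concrete, here is a minimal Python sketch of its state and actions: each binary digit (column) holds a set of bits, and each action places a full or half adder until no column holds more than two bits. Several parts are assumptions made for illustration and are not taken from the paper: the unit delays `FA_DELAY` and `HA_DELAY`, the toy arrival-time model, the function names, and above all the greedy policy, which stands in for the PPO agent. In the paper the delay comes from a synthesis flow, not from a model like this one.

```python
import heapq

# Hypothetical unit delays for illustration only (not from the paper).
FA_DELAY = 2.0  # full adder: 3 input bits -> sum + carry
HA_DELAY = 1.0  # half adder: 2 input bits -> sum + carry


def initial_columns(n_bits):
    """Partial-product bits of an n x n AND-array multiplier.

    Each column is a min-heap of estimated bit arrival times.
    """
    cols = [[] for _ in range(2 * n_bits)]
    for i in range(n_bits):
        for j in range(n_bits):
            cols[i + j].append(0.0)  # all partial products arrive at t = 0
    return cols


def is_terminal(cols):
    """The game ends once every column holds at most two bits."""
    return all(len(c) <= 2 for c in cols)


def apply_adder(cols, col, kind):
    """Place a 'full' or 'half' adder on the earliest-arriving bits of `col`."""
    n_in, delay = (3, FA_DELAY) if kind == "full" else (2, HA_DELAY)
    if len(cols[col]) < n_in:
        raise ValueError(f"column {col} has fewer than {n_in} bits")
    inputs = [heapq.heappop(cols[col]) for _ in range(n_in)]
    out = max(inputs) + delay
    heapq.heappush(cols[col], out)       # sum bit stays in the same column
    if col + 1 == len(cols):
        cols.append([])
    heapq.heappush(cols[col + 1], out)   # carry bit moves one column up


def greedy_episode(n_bits):
    """Stand-in policy (assumption): always put a full adder on the lowest
    column that still holds more than two bits."""
    cols = initial_columns(n_bits)
    n_full = 0
    while not is_terminal(cols):
        col = next(i for i, c in enumerate(cols) if len(c) > 2)
        apply_adder(cols, col, "full")
        n_full += 1
    delay = max((t for c in cols for t in c), default=0.0)
    return cols, n_full, delay


def final_reward(delay):
    """Final reward r_T as the inverse of the delay (requires delay > 0)."""
    return 1.0 / delay


if __name__ == "__main__":
    cols, n_full, delay = greedy_episode(8)
    print("full adders:", n_full)
    print("bits per column:", [len(c) for c in cols])
    print("toy delay:", delay, "reward:", final_reward(delay))
```

A learned agent would replace `greedy_episode` by choosing the column and the adder type at every step. This greedy stand-in never chooses a half adder, so `apply_adder(cols, col, "half")` is included only to show the second action type.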