That Chip Has Sailed: A Critique of Unfounded Skepticism Around AI for Chip Design

Anna Goldie, Azalia Mirhoseini, Jeff Dean

TL;DR

Although AlphaChip has already achieved widespread adoption and impact, a non-peer-reviewed invited paper at ISPD 2023 questioned its performance claims, despite failing to run the deep reinforcement learning method as described in Nature.

Abstract

In 2020, we introduced a deep reinforcement learning method capable of generating superhuman chip layouts, which we then published in Nature and open-sourced on GitHub. AlphaChip has inspired an explosion of work on AI for chip design, and has been deployed in state-of-the-art chips across Alphabet and extended by external chipmakers. Even so, a non-peer-reviewed invited paper at ISPD 2023 questioned its performance claims, despite failing to run our method as described in Nature. For example, it did not pre-train the RL method (removing its ability to learn from prior experience), used substantially fewer compute resources (20x fewer RL experience collectors and half as many GPUs), did not train to convergence (standard practice in machine learning), and evaluated on test cases that are not representative of modern chips. Recently, Igor Markov published a meta-analysis of three papers: our peer-reviewed Nature paper, the non-peer-reviewed ISPD paper, and Markov's own unpublished paper (though he does not disclose that he co-authored it). Although AlphaChip has already achieved widespread adoption and impact, we publish this response to ensure that no one is wrongly discouraged from innovating in this impactful area.

Paper Structure

This paper contains 15 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: AlphaChip has been deployed in three additional generations of TPU. In each generation, it has been adopted in a greater proportion of blocks and has outperformed human experts by a wider margin.
  • Figure 2: Figure 5 of the Nature paper (reproduced above) shows performance gains from pre-training on a larger number of blocks. As we scale up the pre-training dataset size, the RL agent's performance improves.
  • Figure 3: Figure 4 of the Nature paper (reproduced above) showed that pre-training improves convergence speed compared to starting from a randomly initialized policy. On the open-source Ariane RISC-V CPU, the randomly initialized policy took 48 hours to approach what the pre-trained policy could produce in 6 hours.
  • Figure 4: Figure 6 from the follow-up paper of Yue et al., 2022 (reproduced above) demonstrated that speed and quality improve with additional compute resources. Left: Placement return (higher is better) vs. training time as a function of the number of GPUs. An infeasible placement receives a placement return of $-2$. Increasing the number of GPUs results in better final placements. Right: Time to reach a given placement return as a function of the number of GPUs. The grey bars indicate that the experiment did not reach a specific return value. The best placement return of $-1.07$ can only be achieved with GPU=8, the largest setting in this experiment.
  • Figure 5: Convergence plots from Cheng et al.'s project site. On Ariane-NG45 (top left) and MemPool-NG45 (top right), there is an odd divergence at around 100k steps, but loss appears to be trending downwards and would likely have improved with further training. On BlackParrot-GF12 (bottom left) and MemPool-GF12 (bottom right), the model has not yet converged and would likely benefit from additional training time as well.
  • ...and 1 more figure