Accelerating Detailed Routing Convergence through Offline Reinforcement Learning

Afsara Khan, Austin Rovinski

TL;DR

Problem: detailed routing is slow because converging to zero design rule violations under complex design rules can take many iterations. Approach: an offline conservative Q-learning (CQL) model selects per-iteration routing cost weights to minimize the number of routing iterations. Key results: on unseen ISPD19 designs, a 5% average and up to 31% reduction in iterations, with runtime speedups of up to 3.01x; the learned weight selection shows signs of generalizing across technologies. Significance: shows that learned weight scheduling can accelerate detailed routing across designs; the authors intend to release the code as open source.

Abstract

Detailed routing remains one of the most complex and time-consuming steps in modern physical design due to the challenges posed by shrinking feature sizes and stricter design rules. Prior detailed routers achieve state-of-the-art results by leveraging iterative pathfinding algorithms to route each net. However, runtime is a major issue in detailed routers, as converging to a solution with zero design rule violations (DRVs) can be prohibitively expensive. In this paper, we propose leveraging reinforcement learning (RL) to enable rapid convergence in detailed routing by learning from previous designs. We make the key observation that prior detailed routers statically schedule the cost weights used in their routing algorithms, meaning the weights do not change in response to the design or technology. By training a conservative Q-learning (CQL) model to dynamically select the routing cost weights that minimize the number of algorithm iterations, we find that our approach completes the ISPD19 benchmarks with 1.56x average and up to 3.01x faster runtime than the baseline router while maintaining or improving the DRV count in all cases. We also find that this learning shows signs of generalization across technologies, meaning that learning from designs in one technology can translate to improved outcomes in other technologies.
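
To make the idea of conservative Q-learning over a discrete set of cost-weight choices concrete, the following is a minimal sketch of the generic discrete-action CQL objective for a single logged transition. It is not the authors' implementation. The function names, the treatment of each candidate weight vector as one action index, and the defaults for `gamma` and `alpha` are assumptions made only for illustration; the state, action, and reward definitions used in the paper are not given in this section.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def cql_loss(q_values, action, reward, next_q_values, done,
             gamma=0.99, alpha=1.0):
    """Generic discrete-action CQL loss for one logged transition.

    q_values      : Q(s, a) for every candidate action (hypothetically,
                    one entry per candidate cost-weight vector)
    action        : index of the action actually taken in the offline data
    next_q_values : target-network Q(s', a') for every candidate action
    """
    # Standard one-step TD target.
    target = reward if done else reward + gamma * max(next_q_values)
    bellman = 0.5 * (q_values[action] - target) ** 2
    # Conservative term: pushes down Q over all actions and pushes up
    # Q for the action seen in the dataset. It is always >= 0.
    conservative = logsumexp(q_values) - q_values[action]
    return bellman + alpha * conservative
```

The conservative term is what distinguishes this from ordinary Q-learning: it penalizes high Q-values on actions that never appear in the logged data, which matters when training only from previously recorded routing runs.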

Paper Structure

This paper contains 15 sections, 1 equation, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Top: the baseline implementation uses a single sequence of weight vectors ($\textbf{w}$). Bottom: we use perturbation sampling to create many new sequences per design. Sequences start near the baseline values and then gradually diverge with more samples (increasing $\varepsilon$).
  • Figure 2: Optimal vs Baseline Weight Trend for Last Few Iterations Across all Designs in Dataset. Optimal weight trends differ substantially vs. baseline.
  • Figure 3: Training and inference flow for conservative Q-Learning model
  • Figure 4: CQL Training Progress
  • Figure 5: DRV convergence comparison for the OpenROAD Design Suite
  • ...and 2 more figures
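
The perturbation sampling described in the Figure 1 caption can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the paper's procedure: the multiplicative uniform noise, the linear growth of $\varepsilon$ with the sample index, the clamping at zero, and the names `perturb_sequences`, `num_samples`, and `eps_max` are all hypothetical choices.

```python
import random

def perturb_sequences(baseline, num_samples, eps_max, seed=0):
    """Generate perturbed copies of a baseline sequence of weight vectors.

    baseline is a list of weight vectors, one per routing iteration.
    Sample k (1-based) uses eps_k = eps_max * k / num_samples, so early
    samples stay near the baseline and later samples diverge further.
    """
    rng = random.Random(seed)  # seeded for reproducible datasets
    sequences = []
    for k in range(1, num_samples + 1):
        eps = eps_max * k / num_samples
        seq = [
            # Scale each weight by a random factor in [1 - eps, 1 + eps];
            # clamp at zero so cost weights never become negative.
            [max(0.0, w * (1.0 + rng.uniform(-eps, eps))) for w in vec]
            for vec in baseline
        ]
        sequences.append(seq)
    return sequences
```

For example, `perturb_sequences([[1.0, 2.0, 4.0], [2.0, 4.0, 8.0]], num_samples=5, eps_max=0.5)` returns five two-iteration sequences, where the first deviates from the baseline by at most 10% per weight and the last by at most 50%.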