
FPGA Divide-and-Conquer Placement using Deep Reinforcement Learning

Shang Wang, Deepak Ranganatha Sastry Mamillapalli, Tianpei Yang, Matthew E. Taylor

TL;DR

This work frames FPGA placement as a Markov decision process and applies deep reinforcement learning to minimize wirelength. It introduces a two-branch architecture with board and netlist encoders and trains a placement policy with Proximal Policy Optimization (PPO) and invalid-action masking, handling the large search space through a divide-and-conquer decomposition into subtasks. The key contributions are a state representation that fuses board observations with netlist context, a systematic decomposition paradigm with multiple weight-sharing configurations, and empirical evidence of improved learning efficiency over a non-decomposed baseline, along with findings on weight reuse and exploration. Although the approach does not yet surpass the VTR baseline in all cases, it demonstrates feasibility and lays groundwork for multi-objective optimization and broader netlist testing in FPGA placement, with potential impact on faster, more scalable EDA workflows.
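The invalid-action masking mentioned above can be illustrated with a minimal sketch. It assumes the policy emits one raw score (logit) per board location and that a boolean list marks which locations can still accept the current block; the function name `masked_softmax` and the example values are hypothetical and are not taken from the paper.

```python
import math

def masked_softmax(logits, valid):
    """Turn raw policy scores into probabilities over valid locations only.

    logits: one score per board location (hypothetical policy output).
    valid:  True where the current block may be placed, False otherwise.
    """
    if not any(valid):
        raise ValueError("no valid placement location left")
    # Invalid locations get a score of -inf, so exp() maps them to exactly 0.
    masked = [x if ok else -math.inf for x, ok in zip(logits, valid)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative values: four locations, the second one already occupied.
logits = [2.0, 0.5, 1.0, -1.0]
valid = [True, False, True, True]
probs = masked_softmax(logits, valid)
```

Because masked locations receive zero probability, the agent can never sample an illegal placement, and no probability mass or gradient signal is spent on actions the environment would reject.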

Abstract

This paper introduces the problem of learning to place logic blocks in Field-Programmable Gate Arrays (FPGAs) and proposes a learning-based method to solve it. In contrast to previous search-based placement algorithms, we employ Reinforcement Learning (RL) with the goal of minimizing wirelength. In addition to our preliminary learning results, we evaluate a novel decomposition that addresses the large search space arising when many blocks are placed on a chipboard. Empirical experiments evaluate the effectiveness of the learning and decomposition paradigms on FPGA placement tasks.

Paper Structure (15 sections, 5 figures, 2 tables)

This paper contains 15 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The FPGA board is $11 \times 11$ units in size and incorporates DSP (Digital Signal Processor), CLB (Configurable Logic Block), I/O, and RAM (Random Access Memory) blocks, as well as I/O locations of capacity 2.
  • Figure 2: Overview of our model structure, which contains two main parts: representation layers and decision layers, the latter comprising a policy network and a value network. The representation layers take board observations, the netlist graph, and the current block index as input, while the decision layers output a probability distribution over available placement locations (the policy $\pi(a_t|s_t)$) and an estimate of the expected reward for the current placement (the state value $\hat{V}_t$).
  • Figure 3: The 30-block decomposition training paradigm.
  • Figure 4: The wirelength chart shows the policy performance curves during training: (a) wirelength curves for the 30-block decomposition; (b) wirelength curves for the 56-block decomposition with granularity 4; (c) wirelength curves for the 30-block decomposition with granularity 2.
  • Figure 5: The entropy chart shows the policy entropy curves during training, highlighting how policy exploration evolves with 30 blocks (a), 56 blocks with granularity 4 (b), and 56 blocks with granularity 2 (c).
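
As a reading aid for the entropy curves in Figure 5, the following sketch computes the Shannon entropy of a categorical policy. It assumes the policy is a plain list of probabilities over placement locations, as described for Figure 2; the helper name `policy_entropy` and the example distributions are hypothetical.

```python
import math

def policy_entropy(probs):
    """Shannon entropy (in nats) of a categorical policy.

    Zero-probability entries (for example, masked locations) contribute
    nothing, following the convention 0 * log(0) = 0.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Illustrative distributions over four placement locations.
uniform = [0.25, 0.25, 0.25, 0.25]  # maximal exploration
peaked = [0.97, 0.01, 0.01, 0.01]   # nearly deterministic policy

h_uniform = policy_entropy(uniform)
h_peaked = policy_entropy(peaked)
```

A uniform policy over $n$ locations has entropy $\ln n$, the maximum possible, so a curve that falls over training indicates the policy is concentrating on fewer locations and exploring less.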