
FloorSet -- a VLSI Floorplanning Dataset with Design Constraints of Real-World SoCs

Uday Mallappa, Hesham Mostafa, Mikhail Galkin, Mariano Phielipp, Somdeb Majumdar

TL;DR

FloorSet addresses the lack of large-scale, realistic ML benchmarks for VLSI floorplanning by introducing two synthetic datasets, FloorSet-Prime and FloorSet-Lite, each with 1M training samples and 100 test samples. The data-generation pipeline derives target distributions from real layouts and yields either fully-abutted rectilinear partitions (FloorSet-Prime) or rectangular partitions with limited whitespace (FloorSet-Lite), both enforcing hard constraints such as shape, edge affinity, and pre-placement. Analyses show that FloorSet distributions closely resemble real designs in partition shapes and connectivity, as validated by comparing probability density functions (PDFs), and baseline simulated annealing (SA) experiments illustrate how hard it is to minimize wire-length under these constraints. The open-source FloorSet datasets thus provide reproducible, scalable benchmarks for ML-based floorplanning across early and final design stages, enabling fair cross-method comparisons in constrained optimization for EDA.

Abstract

Floorplanning for systems-on-a-chip (SoCs) and their sub-systems is a crucial and non-trivial step of the physical design flow. It represents a difficult combinatorial optimization problem. A typical large-scale SoC with 120 partitions generates a search space of nearly 10E250. As novel machine learning (ML) approaches emerge to tackle such problems, there is a growing need for a modern benchmark that comprises a large training dataset and performance metrics that better reflect real-world constraints and objectives compared to existing benchmarks. To address this need, we present FloorSet -- two comprehensive datasets of synthetic fixed-outline floorplan layouts that reflect the distribution of real SoCs. Each dataset has 1M training samples and 100 test samples, where each sample is a synthetic floorplan. FloorSet-Prime comprises fully-abutted rectilinear partitions and near-optimal wire-length. A simplified dataset that reflects early design phases, FloorSet-Lite comprises rectangular partitions, with under 5 percent whitespace and near-optimal wire-length. Both datasets define hard constraints seen in modern design flows such as shape constraints, edge-affinity, grouping constraints, and pre-placement constraints. FloorSet is intended to spur fundamental research on large-scale constrained optimization problems. Crucially, FloorSet alleviates the core issue of reproducibility in modern ML-driven solutions to such problems. FloorSet is available as an open-source repository for the research community.
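
To make the FloorSet-Lite description concrete, the following is a minimal sketch of how one might check a candidate layout of rectangular partitions against a fixed outline and a whitespace budget. The `Rect` class, the function names, and the 5 percent default are illustrative assumptions and are not taken from the FloorSet repository; only the fixed outline, rectangular partitions, and the "under 5 percent" whitespace figure come from the abstract.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Sequence


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangular partition (hypothetical helper type)."""
    x: float  # lower-left corner
    y: float
    w: float
    h: float

    @property
    def area(self) -> float:
        return self.w * self.h


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the interiors of a and b intersect (shared edges are allowed)."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def inside_outline(r: Rect, outline_w: float, outline_h: float) -> bool:
    """True if r lies entirely within the fixed outline anchored at (0, 0)."""
    return r.x >= 0 and r.y >= 0 and r.x + r.w <= outline_w and r.y + r.h <= outline_h


def whitespace_fraction(blocks: Sequence[Rect], outline_w: float, outline_h: float) -> float:
    """Uncovered share of the outline, valid when blocks do not overlap."""
    return 1.0 - sum(b.area for b in blocks) / (outline_w * outline_h)


def is_legal(blocks: Sequence[Rect], outline_w: float, outline_h: float,
             max_whitespace: float = 0.05) -> bool:
    """Assumed legality test: in-bounds, overlap-free, whitespace under budget."""
    if not all(inside_outline(b, outline_w, outline_h) for b in blocks):
        return False
    if any(overlaps(a, b) for a, b in combinations(blocks, 2)):
        return False
    return whitespace_fraction(blocks, outline_w, outline_h) < max_whitespace


# Toy 10 x 10 outline with three partitions covering 97.5 percent of the area.
layout = [Rect(0, 0, 5, 10), Rect(5, 0, 5, 6), Rect(5, 6, 5, 3.5)]
legal = is_legal(layout, 10, 10)
```

A fully-abutted layout in the FloorSet-Prime sense would correspond to zero whitespace, but its partitions are rectilinear rather than rectangular, so this rectangle-only sketch does not cover that case.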

Paper Structure

This paper contains 12 sections, 11 figures, 1 table, 6 algorithms.

Figures (11)

  • Figure 1: Our work focuses on establishing a realistic benchmark, $\mathtt{FloorSet}$, for the first two steps (shaded) of the design planning phase of the back-end flow.
  • Figure 2: The bookshelf *.blocks file is modified to include fixed-outline dimensions, area budgets, shape constraints (aspect ratio range) and placement constraints.
  • Figure 3: The bookshelf *.nets file is modified to add net weights.
  • Figure 4: The distribution of parameters (Table 1) and the custom configuration file serve as inputs for the data generation pipeline. The output layouts are formatted in the standard bookshelf format and PyTorch tensor format.
  • Figure 5: Overview of the five-step $\mathtt{FloorSet}$ data generation framework, illustrating the sequential processes involved in the methodology: 1. Collection and extraction of target layout distributions, 2. Partitioning shapes with the target area budgets, 3. Annotation of terminal locations, 4. Annotation of connectivity matrix (weighted), and 5. Annotation of placement.
  • ...and 6 more figures
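
Since the captions mention net weights and a weighted connectivity matrix, a small sketch of a weighted wire-length estimate may help. It uses half-perimeter wire-length (HPWL), a common proxy in floorplanning; the source does not state that this is the exact metric FloorSet reports, so treat the choice of HPWL, the function name `weighted_hpwl`, and the pin-at-block-center convention as assumptions.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Net = Tuple[float, List[str]]  # (net weight, names of connected pins)


def weighted_hpwl(pins: Dict[str, Point], nets: List[Net]) -> float:
    """Sum over nets of weight * half-perimeter of the net's bounding box."""
    total = 0.0
    for weight, names in nets:
        xs = [pins[name][0] for name in names]
        ys = [pins[name][1] for name in names]
        total += weight * ((max(xs) - min(xs)) + (max(ys) - min(ys)))
    return total


# Hypothetical example: two blocks (pins at their centers) and one terminal.
pins = {"b0": (2.5, 5.0), "b1": (7.5, 3.0), "t0": (0.0, 10.0)}
nets = [
    (2.0, ["b0", "b1"]),  # block-to-block net with weight 2
    (1.0, ["b0", "t0"]),  # block-to-terminal net with weight 1
]
total_wl = weighted_hpwl(pins, nets)  # 2 * (5 + 2) + 1 * (2.5 + 5) = 21.5
```

Weighting each net lets heavily connected partition pairs pull closer together than lightly connected ones, which is the usual motivation for attaching weights to nets.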