Algorithm-Driven On-Chip Integration for High Density and Low Cost
Jeongeun Kim, Sabrina Yarzada, Paul Chen, Christopher Torng
TL;DR
This work addresses the need for scalable, low-cost silicon prototyping by proposing Chipstitch, an integrated framework for active chip site aggregation. It combines algorithmic packing on a grid-based template space, a fixed-area interconnect architecture that fits in the narrow regions left between packed sites, and perimeter power-shutdown domains that enable bench-level power characterization without low-power design expertise. The approach achieves area reductions of up to 13x compared with state-of-the-art physical-only methods and demonstrates practicality with a 25-site Intel 16 nm prototype, highlighting its potential for nation-scale educational and research tapeouts. Collectively, the paper lays the groundwork for scalable, education-friendly chip-tapeout environments by unifying algorithmic, architectural, and VLSI design considerations.
Abstract
Growing interest in semiconductor workforce development has generated demand for platforms capable of supporting large numbers of independent hardware designs for research and training without imposing high per-project overhead. Traditional multi-project wafer (MPW) services based solely on physical co-placement have historically met this need, yet their scalability breaks down as project counts rise. Recent efforts towards scalable chip tapeouts mitigate these limitations by integrating many small designs within a shared die and amortizing costly resources such as IO pads and memory macros. However, foundational principles for arranging, linking, and validating such densely integrated design sites have received limited systematic investigation. This work presents a new approach with three key techniques to address this gap. First, we establish a structured formulation of the design space that enables automated, algorithm-driven packing of many projects, replacing manual layout practices. Second, we introduce an architecture that uses only the narrow regions between sites to deliver off-chip communication and other shared needs. Third, we provide a practical approach for on-chip power domains that enables per-project power characterization at a standard laboratory bench and requires no expertise in low-power ASIC design. Experimental results show that our approach achieves area reductions of up to 13x over state-of-the-art physical-only aggregation methods, offering a scalable and cost-effective path forward for large-scale tapeout environments.
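To make the idea of algorithm-driven packing on a grid concrete, the following is a minimal illustrative sketch and not the paper's algorithm. It assumes rectangular sites whose sizes are whole numbers of grid cells and uses a simple largest-first, first-fit heuristic; the function name `first_fit_pack` and the `(row, col, height, width)` placement format are hypothetical choices made for this example.

```python
from typing import List, Optional, Tuple

# (row, col, height, width), all measured in grid cells (illustrative format)
Placement = Tuple[int, int, int, int]


def first_fit_pack(grid_rows: int, grid_cols: int,
                   sites: List[Tuple[int, int]]) -> List[Optional[Placement]]:
    """Place each (height, width) site at the first free grid position.

    Sites are tried largest-area first, scanning positions row by row.
    Returns one placement per input site, or None if a site does not fit.
    """
    occupied = [[False] * grid_cols for _ in range(grid_rows)]

    def fits(r: int, c: int, h: int, w: int) -> bool:
        # Reject positions that run off the grid or overlap a placed site.
        if r + h > grid_rows or c + w > grid_cols:
            return False
        return all(not occupied[i][j]
                   for i in range(r, r + h)
                   for j in range(c, c + w))

    def find_slot(h: int, w: int) -> Optional[Tuple[int, int]]:
        for r in range(grid_rows):
            for c in range(grid_cols):
                if fits(r, c, h, w):
                    return r, c
        return None

    # Largest-area-first ordering; ties keep their input order.
    order = sorted(range(len(sites)),
                   key=lambda k: sites[k][0] * sites[k][1],
                   reverse=True)

    result: List[Optional[Placement]] = [None] * len(sites)
    for k in order:
        h, w = sites[k]
        slot = find_slot(h, w)
        if slot is None:
            continue  # site stays unplaced
        r, c = slot
        for i in range(r, r + h):
            for j in range(c, c + w):
                occupied[i][j] = True
        result[k] = (r, c, h, w)
    return result


if __name__ == "__main__":
    # Four sites on a 4x4 grid: two 2x2 blocks and two 1x4 strips.
    print(first_fit_pack(4, 4, [(2, 2), (2, 2), (1, 4), (1, 4)]))
```

A heuristic like this trades optimality for simplicity. Any real flow would also need to reserve the regions between sites for interconnect and power-domain structures, which this sketch does not model.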
