GenAI for Systems: Recurring Challenges and Design Principles from Software to Silicon

Arya Tschand, Chenyu Wang, Zishen Wan, Andrew Cheng, Ioana Cristescu, Kevin He, Howard Huang, Alexander Ingare, Akseli Kangaslahti, Sara Kangaslahti, Theo Lebryk, Hongjin Lin, Jeffrey Jian Ma, Alexandru Meterez, Clara Mohri, Depen Morwani, Sunny Qin, Roy Rinberg, Paula Rodriguez-Diaz, Alyssa Mia Taliotis, Pernille Undrum Fathi, Rosie Zhao, Todd Zhou, Vijay Janapa Reddi

TL;DR

This paper takes a cross-stack perspective, examining how generative models are being applied from code generation and distributed runtimes through hardware design space exploration to RTL synthesis, physical layout, and verification, and argues that the field needs shared engineering methodology, including common vocabularies, cross-layer benchmarks, and systematic design practices.

Abstract

Generative AI is reshaping how computing systems are designed, optimized, and built, yet research remains fragmented across software, architecture, and chip design communities. This paper takes a cross-stack perspective, examining how generative models are being applied from code generation and distributed runtimes through hardware design space exploration to RTL synthesis, physical layout, and verification. Rather than reviewing each layer in isolation, we analyze how the same structural difficulties and effective responses recur across the stack. Our central finding is one of convergence. Despite the diversity of domains and tools, the field keeps encountering five recurring challenges (the feedback loop crisis, the tacit knowledge problem, trust and validation, co-design across boundaries, and the shift from determinism to dynamism) and keeps arriving at five design principles that independently emerge as effective responses (embracing hybrid approaches, designing for continuous feedback, separating concerns by role, matching methods to problem structure, and building on decades of systems knowledge). We organize these into a challenge--principle map that serves as a diagnostic and design aid, showing which principles have proven effective for which challenges across layers. Through concrete cross-stack examples, we show how systems navigate this map as they mature, and argue that the field needs shared engineering methodology, including common vocabularies, cross-layer benchmarks, and systematic design practices, so that progress compounds across communities rather than being rediscovered in each one. Our analysis covers more than 275 papers spanning eleven application areas across three layers of the computing stack, and distills open research questions that become visible only from a cross-layer vantage point.

Paper Structure (90 sections, 14 figures, 2 tables)


Figures (14)

  • Figure 1: Growth of AI for Systems publications on arXiv, based on an analysis of over 7,800 papers (2017--2025). (a) Publication count by domain (log scale), showing a 23$\times$ overall increase. (b) Per-domain growth normalized to 2017 levels: although Software dominates in volume, Hardware (43$\times$) and Chip Design (60$\times$) are catching up rapidly, growing significantly faster than Software (21$\times$). This reflects the field's broadening from software-centric applications toward the full computing stack.
  • Figure 2: Scope and organization of the survey. We examine generative AI across three layers of the computing stack (software, hardware architecture, and chip design) and synthesize cross-stack challenges, design principles, and open research questions that emerge from a unified analysis.
  • Figure 3: Challenge--principle response map with diagnostic trajectory. Rows correspond to recurring cross-stack challenges (C1--C5) and columns correspond to design principles (P1--P5). Color intensity reflects the relative dominance of each principle as a response to a given challenge, derived from the annotations in Table \ref{tab:cross_stack_challenges}. The numbered trajectory illustrates a common maturation pattern: systems often begin with the feedback loop crisis (C1), shift toward trust and validation (C3) as iteration accelerates, and later expose cross-boundary co-design challenges (C4). The map supports diagnosis and iterative navigation rather than prescribing a fixed workflow.
  • Figure 4: Benchmark progression from 2021 to 2025, illustrating the shift from function-level code generation to repository-scale and agent-based evaluation.
  • Figure 5: Large language models approach expert-level performance on isolated optimization tasks but lag on real-world repository-level optimizations. The figure compares prompt-based and fine-tuned models across benchmark categories.
  • ...and 9 more figures