KernelFoundry: Hardware-aware evolutionary GPU kernel optimization

Nina Wiedemann; Quentin Leboutet; Michael Paulitsch; Diana Wofk; Benjamin Ummenhofer

Abstract

Optimizing GPU kernels presents a significantly greater challenge for large language models (LLMs) than standard code generation tasks, as it requires understanding hardware architecture, parallel optimization strategies, and performance profiling outputs. Most existing LLM-based approaches to kernel generation rely on simple prompting and feedback loops, incorporating hardware awareness only indirectly through profiling feedback. We introduce KernelFoundry, an evolutionary framework that efficiently explores the GPU kernel design space through three key mechanisms: (1) MAP-Elites quality-diversity search with kernel-specific behavioral dimensions to sustain exploration across diverse optimization strategies; (2) meta-prompt evolution, which co-evolves prompts with kernels to uncover task-specific optimization strategies; and (3) template-based parameter optimization to tune kernels to inputs and hardware. We evaluate this framework on KernelBench, robust-kbench, and custom tasks, generating kernels in SYCL, a cross-platform GPU programming model, and in CUDA for comparison with prior work. Our approach consistently outperforms the baseline methods, achieving an average speedup of 2.3x on KernelBench for SYCL. Moreover, KernelFoundry is implemented as a distributed framework with remote access to diverse hardware, which enables rapid benchmarking, and it features a flexible user input layer that supports kernel generation for a wide range of real-world use cases beyond benchmarking.
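To make mechanism (1) concrete, here is a minimal Python sketch of the MAP-Elites insertion rule it builds on: each kernel is mapped to a discrete cell of behavioral coordinates, and each cell keeps only its best-performing kernel (its elite). The axis names follow Figure 2; the integer levels per axis, the use of speedup as fitness, uniform parent sampling, and the class names are hypothetical choices for illustration, not KernelFoundry's implementation.

```python
import random
from dataclasses import dataclass, field

# Hypothetical descriptor: each behavioral axis (named after Figure 2) is
# discretized into a small integer level, e.g. 0 = none, 1 = some, 2 = heavy.
Cell = tuple[int, int, int]  # (d_mem, d_algo, d_sync)


@dataclass
class Kernel:
    source: str
    cell: Cell        # behavioral coordinates of this kernel
    fitness: float    # assumed: speedup over the reference, 0.0 if incorrect


@dataclass
class MapElitesArchive:
    elites: dict[Cell, Kernel] = field(default_factory=dict)

    def try_insert(self, child: Kernel) -> bool:
        """Keep the child only if its cell is empty or it beats the current elite."""
        incumbent = self.elites.get(child.cell)
        if incumbent is None or child.fitness > incumbent.fitness:
            self.elites[child.cell] = child
            return True
        return False

    def sample_parent(self, rng: random.Random) -> Kernel:
        """Uniform choice over occupied cells (one of many possible strategies)."""
        return self.elites[rng.choice(list(self.elites))]


archive = MapElitesArchive()
archive.try_insert(Kernel("naive", (0, 0, 0), 1.0))
archive.try_insert(Kernel("tiled", (2, 0, 1), 1.8))
archive.try_insert(Kernel("tiled_v2", (2, 0, 1), 1.5))  # rejected: cell already holds 1.8
print(len(archive.elites), archive.elites[(2, 0, 1)].source)  # 2 tiled
```

Because a slower kernel can still enter the archive when it lands in an empty cell, the archive keeps structurally different optimization strategies alive instead of collapsing onto a single best kernel.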

Paper Structure (50 sections, 5 equations, 4 figures, 11 tables)

Introduction
Related Work
Traditional Kernel Optimization.
LLM-Based Kernel Generation.
LLM-Guided Evolutionary Search.
Method
System Architecture
Quality-Diversity Search with MAP-Elites
MAP-Elites Algorithm.
Kernel-Specific Behavioral Descriptors.
Fitness Function.
Selection Strategies.
Gradient-Informed Evolution
Transition Tracking and gradient estimation.
Gradient-to-Prompt Translation.
...and 35 more sections

Figures (4)

Figure 1: The KernelFoundry pipeline: evolutionary kernel optimization with multi-level feedback and meta-prompt co-evolution.
Figure 2: Gradient-informed MAP-Elites for kernel optimization. The archive partitions kernels by behavioral coordinates $(d_\text{mem}, d_\text{algo}, d_\text{sync})$. Elites are shown in yellow. A Transition Tracker records parent$\to$child transitions with behavioral coordinates and fitness deltas. The Gradient Estimator combines fitness gradients ($\nabla F$), improvement-rate gradients ($\nabla R$), and exploration gradients ($\nabla E$) to produce sampling weights and natural-language mutation hints that guide subsequent generations.
Figure 3: Improvement over iterations (cumulative best)
Figure 4: Overview of the KernelFoundry infrastructure and the custom task format
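Figure 2 names the components of gradient-informed MAP-Elites but not their formulas. The following hypothetical Python sketch shows one way a log of parent-to-child transitions could be turned into per-axis gradients, parent-sampling weights, and a natural-language mutation hint. The three gradient formulas (finite-difference fitness slope, difference in improvement rates, visit imbalance), the weights `w_f`, `w_r`, `w_e`, the softmax selection, and all function names are illustrative assumptions, not definitions taken from the paper.

```python
import math
from collections import Counter
from dataclasses import dataclass

# Behavioral axes named after Figure 2; integer levels per axis are assumed.
DIMS = ("d_mem", "d_algo", "d_sync")
Cell = tuple[int, int, int]


@dataclass
class Transition:
    parent_cell: Cell
    child_cell: Cell
    fitness_delta: float  # child fitness minus parent fitness


def _rate(flags):
    """Fraction of True values, or 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def estimate_gradients(transitions, visit_counts, w_f=1.0, w_r=1.0, w_e=0.5):
    """Combine fitness, improvement-rate and exploration signals per axis.

    Hypothetical formulas and weights, chosen only for illustration.
    """
    grad = []
    for d in range(len(DIMS)):
        moves = [t for t in transitions if t.child_cell[d] != t.parent_cell[d]]
        steps = [t.child_cell[d] - t.parent_cell[d] for t in moves]
        # grad_F: mean finite-difference slope of fitness along axis d.
        grad_f = (sum(t.fitness_delta / s for t, s in zip(moves, steps)) / len(moves)
                  if moves else 0.0)
        # grad_R: improvement rate of upward moves minus that of downward moves.
        up = [t.fitness_delta > 0 for t, s in zip(moves, steps) if s > 0]
        down = [t.fitness_delta > 0 for t, s in zip(moves, steps) if s < 0]
        grad_r = _rate(up) - _rate(down)
        # grad_E: positive when level 0 of axis d was visited more than higher levels.
        low = sum(n for c, n in visit_counts.items() if c[d] == 0)
        high = sum(n for c, n in visit_counts.items() if c[d] > 0)
        grad_e = (low - high) / max(low + high, 1)
        grad.append(w_f * grad_f + w_r * grad_r + w_e * grad_e)
    return grad


def sampling_weights(elite_cells, grad, temperature=1.0):
    """Softmax over elites, scoring each cell by its dot product with grad."""
    scores = [sum(g * c for g, c in zip(grad, cell)) / temperature for cell in elite_cells]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mutation_hint(grad):
    """Turn the strongest gradient component into a natural-language hint."""
    d = max(range(len(grad)), key=lambda i: abs(grad[i]))
    direction = "increase" if grad[d] > 0 else "decrease"
    return f"Try to {direction} {DIMS[d]} in the next kernel variant."


transitions = [
    Transition((0, 0, 0), (2, 0, 0), +0.6),
    Transition((1, 0, 0), (2, 0, 0), +0.2),
    Transition((0, 0, 0), (0, 0, 1), -0.1),
]
visits = Counter({(0, 0, 0): 5, (2, 0, 0): 2, (0, 0, 1): 1})
grad = estimate_gradients(transitions, visits)
print(mutation_hint(grad))  # Try to increase d_mem in the next kernel variant.
print(sampling_weights([(0, 0, 0), (2, 0, 0)], grad))
```

Scoring each elite by its dot product with the combined gradient biases selection toward kernels lying further along promising axes, while the softmax temperature keeps some probability on the rest of the archive so exploration does not stop.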
