KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization

Qitong Sun; Jun Han; Tianlin Li; Zhe Tang; Sheng Chen; Fei Yang; Aishan Liu; Xianglong Liu; Yang Liu

TL;DR

KernelSkill is a multi-agent framework with a dual-level memory architecture: its agents coordinate through a long-term memory of reusable expert skills and a short-term memory that prevents repetitive backtracking.

Abstract

Improving GPU kernel efficiency is crucial for advancing AI systems. Recent work has explored leveraging large language models (LLMs) for GPU kernel generation and optimization. However, existing LLM-based kernel optimization pipelines typically rely on opaque, implicitly learned heuristics within the LLMs to determine optimization strategies. This leads to inefficient trial-and-error and weakly interpretable optimizations. Our key insight is to replace implicit heuristics with expert optimization skills that are knowledge-driven and aware of task trajectories. Specifically, we present KernelSkill, a multi-agent framework with a dual-level memory architecture. KernelSkill operates by coordinating agents with long-term memory of reusable expert skills and short-term memory to prevent repetitive backtracking. On KernelBench Levels 1-3, KernelSkill achieves a 100% success rate and average speedups of 5.44x, 2.82x, and 1.92x over Torch Eager on Levels 1, 2, and 3, respectively, outperforming prior baselines. Code is available at https://github.com/0satan0/KernelMem/.
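
To make the dual-level memory idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that long-term memory is a list of skills tagged with the kernel features they target, and that short-term memory is a per-task set of skills already tried. The names `Skill`, `ShortTermMemory`, and `select_skill`, the feature tags, and the overlap-based ranking are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Skill:
    """A reusable optimization skill (hypothetical schema, not from the paper)."""
    name: str
    applies_to: frozenset  # kernel features this skill targets


@dataclass
class ShortTermMemory:
    """Per-task record of skills already attempted, so they are not retried."""
    tried: set = field(default_factory=set)

    def record(self, skill_name: str) -> None:
        self.tried.add(skill_name)


def select_skill(features, long_term, short_term):
    """Return the untried skill with the largest feature overlap, or None."""
    best, best_overlap = None, 0
    for skill in long_term:
        if skill.name in short_term.tried:
            continue  # already attempted on this task: do not backtrack to it
        overlap = len(skill.applies_to & features)
        if overlap > best_overlap:
            best, best_overlap = skill, overlap
    return best


# Assumed long-term memory contents, for illustration only.
long_term = [
    Skill("shared_memory_tiling", frozenset({"matmul", "memory_bound"})),
    Skill("operator_fusion", frozenset({"elementwise", "memory_bound"})),
]
stm = ShortTermMemory()
features = {"matmul", "memory_bound"}

first = select_skill(features, long_term, stm)
stm.record(first.name)
second = select_skill(features, long_term, stm)
```

In this sketch, the second call skips the skill recorded after the first call and falls back to the next-best match, which is the behavior the short-term memory is meant to provide.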

Paper Structure (36 sections, 4 figures, 5 tables)

Introduction
Related Works
Compilers and Autotuning
Training-Based LLM Methods
Agentic Optimization
Motivation
Motivating example: imprecise method selection.
Design principles and KernelSkill.
KernelSkill
Multi-Agent Optimization
Framework Overview
Generator
Feature Extractor
Reviewer
Diagnoser
...and 21 more sections

Figures (4)

Figure 1: Overview of KernelSkill.
Figure 2: The short-term memory for the current repair round.
Figure 3: The short-term memory for the current optimization round.
Figure 4: Retrieval method and plan generation.
