Equivalence Checking of ML GPU Kernels

Kshitij Dubey; Benjamin Driscoll; Anjiang Wei; Neeraj Kayal; Rahul Sharma; Alex Aiken

TL;DR

VOLTA formally verifies the semantic equivalence of ML GPU kernels written by humans, generated by LLMs, or produced by compilers. It is a PTX-level equivalence checker based on symbolic execution that targets structured CTAs (cooperative thread arrays). Within this kernel class it is proved sound and complete, using a confluence property that makes the result independent of the thread schedule. The approach also detects data races and deadlocks, and includes a decision procedure for equalities involving addition, multiplication, and exponentials, together with a proof of decidability for identities of the form $\sum_i p_i(\mathbf{x}) e^{h_i(\mathbf{x})} = 0$. Empirically, VOLTA verifies reductions, MatMul, convolutions, and attention across human-, LLM-, and compiler-generated kernels, completes within minutes, and uncovers correctness bugs and data races, supporting reliable optimization of ML workloads on GPUs with tensor cores.
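
As a hedged illustration of an identity of the form $\sum_i p_i(\mathbf{x}) e^{h_i(\mathbf{x})} = 0$ (our own example, not taken from the paper), consider the rescaling step that appears when a softmax is computed with a running maximum. The symbols $x$, $m$, and $m'$ are assumptions introduced here: $x$ is an input, $m$ an old maximum, and $m'$ an updated one. Checking that the rescaled term matches the directly computed one amounts to showing

$$e^{x-m}\,e^{m-m'} - e^{x-m'} = 0.$$

Combining exponents in the first term gives $e^{(x-m)+(m-m')} = e^{x-m'}$, so the left-hand side is $p_1 e^{h_1} + p_2 e^{h_2}$ with $p_1 = 1$, $p_2 = -1$, and $h_1 = h_2 = x - m'$. The coefficients of the common exponential sum to zero, so the identity holds for all values of $x$, $m$, and $m'$.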

Abstract

With the rapid progress of deep learning and large language models (LLMs), companies now spend enormous sums executing GPU kernels. These kernels have, therefore, become prime targets for aggressive optimization. Recent efforts increasingly leverage LLMs to generate GPU kernels, but make no formal guarantees about the generated kernels. We present the first equivalence checker for GPU kernels and use it to formally verify the correctness of machine learning (ML) kernels optimized by hand, by LLMs, and by compilers. We show that our equivalence checker is sound and, for a well-defined class of GPU kernels which includes the programs of interest, complete. Our implementation, VOLTA, can verify ML computations such as convolutions, matrix multiplications, and various attention mechanisms.
