AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

Shangzhan Li, Zefan Wang, Ye He, Yuxuan Li, Qi Shi, Jianling Li, Yonggang Hu, Wanxiang Che, Xu Han, Zhiyuan Liu, Maosong Sun

TL;DR

AutoTriton introduces the first Triton-dedicated model trained with supervised fine-tuning and reinforcement learning to automate high-performance kernel programming. A novel data-gathering pipeline produces high-quality instruction–Triton datasets, while GRPO-based RL optimizes kernels using a combined rule-based and execution-based reward to curb reward hacking. Across TritonBench and KernelBench, AutoTriton 8B matches or approaches several mainstream large models, with RL contributing substantial gains over SFT alone. The work highlights the effectiveness of RL and curated data in specialized code generation, and points to future work on performance-guided training to further close the gap to hand-tuned kernels.

Abstract

Kernel development in deep learning requires optimizing computational units across hardware while balancing memory management, parallelism, and hardware-specific optimizations through extensive empirical tuning. Although domain-specific languages like Triton simplify GPU programming by abstracting low-level details, developers must still manually tune critical parameters such as tile sizes and memory access patterns through iterative experimentation, creating substantial barriers to optimal performance and wider adoption. In this work, we introduce AutoTriton, the first model dedicated to Triton programming powered by reinforcement learning (RL). AutoTriton is trained in two sequential stages. It first undergoes supervised fine-tuning (SFT) on data produced by a high-quality data-gathering pipeline, which equips it with essential Triton programming expertise. It then undergoes RL with the Group Relative Policy Optimization (GRPO) algorithm, which combines a rule-based reward with an execution-based reward to further improve its Triton programming ability. Experiments across five evaluation channels of TritonBench and KernelBench show that our 8B model, AutoTriton, achieves performance comparable to mainstream large models, including Claude-4-Sonnet and DeepSeek-R1-0528. Further experimental analysis demonstrates the crucial role of each module within AutoTriton, including the SFT stage, the RL stage, and the reward design strategy. These findings underscore the promise of RL for automatically generating high-performance kernels. Because high-performance kernels are core components of AI systems, this result establishes an important foundation for building more efficient AI systems. The model and code will be available at https://github.com/AI9Stars/AutoTriton.
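
The reward design mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that the rule-based reward simply checks for a `@triton.jit` kernel in the generated code, that the two rewards are combined by gating (no execution reward unless the rule check passes), and that advantages use the population standard deviation. The function names `rule_reward`, `combined_reward`, and `group_relative_advantages`, and the `passes_tests` callback, are hypothetical. Only the group-relative normalization, subtracting the group mean reward and dividing by the group standard deviation, is the standard GRPO formulation.

```python
import re
import statistics
from typing import Callable, List


def rule_reward(code: str) -> float:
    """Hypothetical rule check: 1.0 only if the code defines a Triton-jitted kernel."""
    return 1.0 if re.search(r"@triton\.jit\b", code) else 0.0


def combined_reward(code: str, passes_tests: Callable[[str], bool]) -> float:
    """Assumed combination: the execution reward counts only if the rule check passes.

    This blocks one form of reward hacking, where a completion passes the
    correctness tests by calling PyTorch ops instead of writing a Triton kernel.
    """
    if rule_reward(code) == 0.0:
        return 0.0
    return 1.0 if passes_tests(code) else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std is an assumption here
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Three sampled completions for one prompt (toy strings, not real kernels).
    group = [
        "@triton.jit\ndef k(): pass",        # Triton kernel that passes tests
        "def f(x): return torch.add(x, x)",  # passes tests but contains no Triton kernel
        "@triton.jit\ndef bad(): pass",      # Triton kernel that fails tests
    ]
    passes = lambda code: "bad" not in code  # stand-in for real execution checks
    rewards = [combined_reward(c, passes) for c in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

In this toy group only the first completion receives a positive advantage. The second one, which would pass an execution-only check, earns nothing because it never defines a Triton kernel.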

Paper Structure

This paper contains 22 sections, 4 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Overview of the AutoTriton pipeline. The pipeline consists of three components: data collection, the SFT stage, and the RL stage.
  • Figure 2: Data-gathering pipeline of AutoTriton. The pipeline begins with the systematic collection of PyTorch kernels, then generates the corresponding Triton kernels along two parallel routes: instruction-guided LLM distillation, and compilation with LLM-enhanced refinement.
  • Figure 3: Example of a low-quality Triton code implementation.
  • Figure 4: Reward scores of AutoTriton and AutoTriton w/o SFT stage.
  • Figure 5: AutoTriton prompts for experimental reasoning.