
Priority Sampling of Large Language Models for Compilers

Dejan Grubisic, Chris Cummins, Volker Seeker, Hugh Leather

TL;DR

This work tackles the inefficiency and limited diversity of temperature-based sampling for large language models in code optimization tasks. It introduces Priority Sampling, a deterministic method that builds an augmented search tree and maintains a priority queue of top tokens together with their prefixes, always expanding the most promising path while constraining generations to a user-specified regular expression. The approach yields unique samples with complexity $O(T \cdot (\text{inference} + K \log V))$ and keeps memory use bounded with a fixed-size queue. On LLVM optimization passes, experiments with a 7B-parameter Llama 2 model trained on autotuner labels show that Priority Sampling outperforms Nucleus Sampling for any sample count, achieves 91% of the autotuner's improvement in just 5 samples, and reaches autotuner-level performance in 30 samples, with about a 4.98% improvement over -Oz on the unseen set. The results suggest that LLMs store actionable compiler-optimization knowledge that can be accessed more effectively through structured search and regex-guided generation, reducing the reliance on task-specific tuning and autotuning pipelines.

Abstract

Large language models show great potential in generating and optimizing code. Widely used sampling methods such as Nucleus Sampling increase the diversity of generation but often produce repeated samples at low temperatures and incoherent samples at high temperatures. Furthermore, the temperature coefficient has to be tuned for each task, limiting its usability. We present Priority Sampling, a simple and deterministic sampling technique that produces unique samples ordered by the model's confidence. Each new sample expands the unexpanded token with the highest probability in the augmented search tree. Additionally, Priority Sampling supports generation constrained by a regular expression, which provides a controllable and structured exploration process. Priority Sampling outperforms Nucleus Sampling for any number of samples, raising the original model's improvement over -Oz from 2.87% to 5%. Moreover, with just 30 samples it outperforms the autotuner that generated the labels used to train the original model.
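The expansion rule described in the abstract can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' implementation: `toy_model` stands in for an LLM's next-token distribution, `k` is the number of top tokens queued per node, `queue_cap` models the fixed-size queue, and the `allowed` predicate is a placeholder for the regex constraint (a real implementation would check whether the extended prefix can still match the regular expression). Following the Figure 2 caption, the sketch keys the queue on each token's own next-token probability, so the first sample is simply the greedy decode.

```python
import heapq
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"
Prefix = Tuple[str, ...]
# Hypothetical model interface: next-token probabilities given a prefix.
NextTokenFn = Callable[[Prefix], Dict[str, float]]
# Hypothetical stand-in for the regex constraint: may `tok` follow `prefix`?
AllowedFn = Callable[[Prefix, str], bool]


def _complete(prefix: Prefix, next_token: NextTokenFn, k: int,
              allowed: AllowedFn, queue: list, max_len: int) -> Prefix:
    """Greedily extend `prefix` until EOS (or max_len), queueing runner-up tokens."""
    seq = list(prefix)
    while len(seq) < max_len and (not seq or seq[-1] != EOS):
        probs = next_token(tuple(seq))
        # Keep only valid tokens, then the top-k by probability.
        ranked = sorted(
            ((t, p) for t, p in probs.items() if allowed(tuple(seq), t)),
            key=lambda tp: tp[1],
            reverse=True,
        )[:k]
        if not ranked:
            break  # no valid continuation under the constraint
        best = ranked[0][0]
        for tok, p in ranked[1:]:
            # heapq is a min-heap, so negate the probability for max-first order.
            heapq.heappush(queue, (-p, tuple(seq) + (tok,)))
        seq.append(best)
    return tuple(seq)


def priority_sampling(next_token: NextTokenFn, num_samples: int, k: int = 3,
                      max_len: int = 32, queue_cap: int = 1000,
                      allowed: AllowedFn = lambda prefix, tok: True) -> List[Prefix]:
    """Deterministic, unique samples; the first one is the greedy decode."""
    queue: list = []
    samples = [_complete((), next_token, k, allowed, queue, max_len)]
    while len(samples) < num_samples and queue:
        # Branch from the unexpanded token with the highest probability.
        _, branch = heapq.heappop(queue)
        samples.append(_complete(branch, next_token, k, allowed, queue, max_len))
        if len(queue) > queue_cap:
            # Fixed-size queue: keep only the most probable pending branches.
            queue[:] = heapq.nsmallest(queue_cap, queue)
    return samples


def toy_model(prefix: Prefix) -> Dict[str, float]:
    """Hypothetical toy 'LLM' over LLVM pass names (not a real model)."""
    if len(prefix) == 0:
        return {"-mem2reg": 0.6, "-instcombine": 0.25, EOS: 0.15}
    if len(prefix) == 1:
        return {"-simplifycfg": 0.5, "-dce": 0.35, EOS: 0.15}
    return {EOS: 0.9, "-dce": 0.1}


if __name__ == "__main__":
    for sample in priority_sampling(toy_model, num_samples=4):
        print(" ".join(sample))
```

Because each popped branch differs from every previously generated sequence at its branch point, the samples are unique by construction, and the output is deterministic with no temperature to tune. Passing, for example, `allowed=lambda prefix, tok: tok != "-dce"` prunes invalid tokens before they ever enter the queue, which is the role the regex plays in the method.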

Paper Structure

This paper contains 7 sections, 3 figures, 1 table, and 1 algorithm.

Figures (3)

  • Figure 1: Average number of unique samples generated from 50k unseen test programs. Priority sampling produces a higher ratio of unique samples than nucleus sampling.
  • Figure 2: Priority Sampling tree expansion. Each node contains a token generated by inference and the probabilities of the next potential tokens. For the first sample, we create a branch from the root to the end-of-sequence (EOS) token and put all valid potential tokens, with their probabilities, into the priority queue. At every subsequent step, we branch from the queued token with the highest probability and generate that branch until EOS.
  • Figure 3: Average improvement in code size over -Oz optimization on 50k unseen test examples. The autotuner spends 760 s optimizing each example and produces the labels for LLM fine-tuning (cummins2023large). Greedy Decoding, Nucleus Sampling, and Priority Sampling use the fine-tuned model. Random Sampling selects 100 random flags for each sample. Priority Sampling outperforms all other methods, including the autotuner that was used for labeling.
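The Figure 3 metric can be made concrete with a short calculation. The sketch below assumes, without confirmation from the captions, that improvement is the relative code-size reduction versus -Oz, that each program is scored by the smallest of its first n samples (best-of-n), and that per-program percentages are averaged. The function names and the two-program data are hypothetical.

```python
from typing import List, Sequence


def improvement_over_oz(oz_size: int, sample_sizes: Sequence[int]) -> float:
    """Percent code-size reduction of the best (smallest) sample vs. -Oz.

    Assumed definition: positive means smaller than -Oz, negative means larger.
    """
    best = min(sample_sizes)
    return 100.0 * (oz_size - best) / oz_size


def average_improvement(
    oz_sizes: Sequence[int], samples_per_program: Sequence[Sequence[int]], n: int
) -> float:
    """Mean best-of-n improvement over -Oz across programs (hypothetical helper)."""
    scores: List[float] = [
        improvement_over_oz(oz, sizes[:n])
        for oz, sizes in zip(oz_sizes, samples_per_program)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical sizes: two programs, three samples each (in sampling order).
    oz_sizes = [100, 200]
    samples = [[98, 95, 101], [210, 190, 180]]
    for n in (1, 3):
        print(n, average_improvement(oz_sizes, samples, n))  # prints "1 -1.5", then "3 7.5"
```

Under this assumed definition, a negative score means even the best sample is larger than -Oz; an evaluation could instead fall back to -Oz and clamp such cases at zero, which would change the averages.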