Table of Contents
Fetching ...

LLM-Vectorizer: LLM-based Verified Loop Vectorizer

Jubi Taneja, Avery Laird, Cong Yan, Madan Musuvathi, Shuvendu K. Lahiri

TL;DR

This work tackles the difficulty of automatic loop vectorization by leveraging large-language models (LLMs) to generate vectorized AVX2 code from scalar programs. It introduces LLM-Vectorizer, a multi-agent finite-state-machine system that uses checksum-based testing for plausibility and Alive2 bounded translation validation for formal correctness, with domain-specific optimizations to improve verification scalability. Across 149 TSVC test cases, the approach yields up to 125 plausible vectorizations and formally verifies 57 as equivalent (38.2%), while demonstrating run-time speedups up to 9.4x over state-of-the-art compilers. The study highlights both the potential and challenges of combining LLMs with formal verification to democratize high-performance vectorization, and suggests directions for reducing verification costs and extending guarantees.

Abstract

Vectorization is a powerful optimization technique that significantly boosts the performance of high performance computing applications operating on large data arrays. Despite decades of research on auto-vectorization, compilers frequently miss opportunities to vectorize code. On the other hand, writing vectorized code manually using compiler intrinsics is still a complex, error-prone task that demands deep knowledge of specific architecture and compilers. In this paper, we evaluate the potential of large-language models (LLMs) to generate vectorized (Single Instruction Multiple Data) code from scalar programs that process individual array elements. We propose a novel finite-state machine multi-agents based approach that harnesses LLMs and test-based feedback to generate vectorized code. Our findings indicate that LLMs are capable of producing high performance vectorized code with run-time speedup ranging from 1.1x to 9.4x as compared to the state-of-the-art compilers such as Intel Compiler, GCC, and Clang. To verify the correctness of vectorized code, we use Alive2, a leading bounded translation validation tool for LLVM IR. We describe a few domain-specific techniques to improve the scalability of Alive2 on our benchmark dataset. Overall, our approach is able to verify 38.2% of vectorizations as correct on the TSVC benchmark dataset.

LLM-Vectorizer: LLM-based Verified Loop Vectorizer

TL;DR

This work tackles the difficulty of automatic loop vectorization by leveraging large-language models (LLMs) to generate vectorized AVX2 code from scalar programs. It introduces LLM-Vectorizer, a multi-agent finite-state-machine system that uses checksum-based testing for plausibility and Alive2 bounded translation validation for formal correctness, with domain-specific optimizations to improve verification scalability. Across 149 TSVC test cases, the approach yields up to 125 plausible vectorizations and formally verifies 57 as equivalent (38.2%), while demonstrating run-time speedups up to 9.4x over state-of-the-art compilers. The study highlights both the potential and challenges of combining LLMs with formal verification to democratize high-performance vectorization, and suggests directions for reducing verification costs and extending guarantees.

Abstract

Vectorization is a powerful optimization technique that significantly boosts the performance of high performance computing applications operating on large data arrays. Despite decades of research on auto-vectorization, compilers frequently miss opportunities to vectorize code. On the other hand, writing vectorized code manually using compiler intrinsics is still a complex, error-prone task that demands deep knowledge of specific architecture and compilers. In this paper, we evaluate the potential of large-language models (LLMs) to generate vectorized (Single Instruction Multiple Data) code from scalar programs that process individual array elements. We propose a novel finite-state machine multi-agents based approach that harnesses LLMs and test-based feedback to generate vectorized code. Our findings indicate that LLMs are capable of producing high performance vectorized code with run-time speedup ranging from 1.1x to 9.4x as compared to the state-of-the-art compilers such as Intel Compiler, GCC, and Clang. To verify the correctness of vectorized code, we use Alive2, a leading bounded translation validation tool for LLVM IR. We describe a few domain-specific techniques to improve the scalability of Alive2 on our benchmark dataset. Overall, our approach is able to verify 38.2% of vectorizations as correct on the TSVC benchmark dataset.
Paper Structure (38 sections, 6 figures, 3 tables, 1 algorithm)

This paper contains 38 sections, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: An example from TSVC benchmark that cannot be vectorized by the state-of-the-art compilers (GCC, Clang, ICC) but by GPT-4.
  • Figure 2: High-level design overview of LLM-Vectorizer.
  • Figure 3: Detailed design of LLM-Vectorizer.
  • Figure 4: Motivating example for Symbolic Verification. In vectorized code (b), __mm256_blendv_epi8 blends bytes from two $256$-bit integer vectors based on the most significant bit of each byte in a mask vector.
  • Figure 5: Effectiveness of LLM-Vectorizer at generating correct vectorized code at different levels of $k$.
  • ...and 1 more figures