LLM-Vectorizer: LLM-based Verified Loop Vectorizer
Jubi Taneja, Avery Laird, Cong Yan, Madan Musuvathi, Shuvendu K. Lahiri
TL;DR
This work tackles the difficulty of automatic loop vectorization by leveraging large-language models (LLMs) to generate vectorized AVX2 code from scalar programs. It introduces LLM-Vectorizer, a multi-agent finite-state-machine system that uses checksum-based testing for plausibility and Alive2 bounded translation validation for formal correctness, with domain-specific optimizations to improve verification scalability. Across 149 TSVC test cases, the approach yields up to 125 plausible vectorizations and formally verifies 57 as equivalent (38.2%), while demonstrating run-time speedups up to 9.4x over state-of-the-art compilers. The study highlights both the potential and challenges of combining LLMs with formal verification to democratize high-performance vectorization, and suggests directions for reducing verification costs and extending guarantees.
Abstract
Vectorization is a powerful optimization technique that significantly boosts the performance of high performance computing applications operating on large data arrays. Despite decades of research on auto-vectorization, compilers frequently miss opportunities to vectorize code. On the other hand, writing vectorized code manually using compiler intrinsics is still a complex, error-prone task that demands deep knowledge of specific architecture and compilers. In this paper, we evaluate the potential of large-language models (LLMs) to generate vectorized (Single Instruction Multiple Data) code from scalar programs that process individual array elements. We propose a novel finite-state machine multi-agents based approach that harnesses LLMs and test-based feedback to generate vectorized code. Our findings indicate that LLMs are capable of producing high performance vectorized code with run-time speedup ranging from 1.1x to 9.4x as compared to the state-of-the-art compilers such as Intel Compiler, GCC, and Clang. To verify the correctness of vectorized code, we use Alive2, a leading bounded translation validation tool for LLVM IR. We describe a few domain-specific techniques to improve the scalability of Alive2 on our benchmark dataset. Overall, our approach is able to verify 38.2% of vectorizations as correct on the TSVC benchmark dataset.
