Retrofitting Control Flow Graphs in LLVM IR for Auto Vectorization
Shihan Fang, Wenxin Zheng
TL;DR
The paper addresses the limited use of SIMD by current compilers by moving from CFG-centric analysis to a two-layer IR. It introduces SIR, which preserves high-level structure, and VIR, which encodes precise instruction dependencies, enabling a wider search for isomorphic instructions and packing across loops. The proposed pipeline, built on a dependence-graph-driven framework with cost-based packing decisions, achieves reported speedups of up to 53% over LLVM and 58% over GCC on real-world workloads. The authors position this IR-based approach as a more extensible route to automatic vectorization across languages and complex control flow, one that could narrow the gap between compiler-generated code and the SIMD throughput modern CPUs can deliver.
Abstract
Modern processors increasingly rely on SIMD instruction sets, such as AVX and RVV, to significantly enhance parallelism and computational performance. However, production-ready compilers such as LLVM and GCC often fail to fully exploit available vectorization opportunities because their vectorization passes are disjoint and their extensibility is limited. Although recent work on heuristics and intermediate representation (IR) design has tried to address these problems, efficiently simplifying control flow analysis and accurately identifying vectorization opportunities remain challenging. To address these issues, we introduce a novel vectorization pipeline featuring two specialized IR extensions: SIR, which encodes high-level structural information, and VIR, which explicitly represents instruction dependencies through data dependency analysis. Leveraging the detailed dependency information provided by VIR, we develop a flexible and extensible vectorization framework. This approach substantially improves interoperability across vectorization passes and expands the search space for identifying isomorphic instructions, ultimately enhancing both the scope and efficiency of automatic vectorization. Experimental evaluations demonstrate that our proposed vectorization pipeline achieves significant performance improvements, delivering speedups of up to 53% and 58% compared to LLVM and GCC, respectively.
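To make "identifying isomorphic instructions" from explicit dependencies concrete, the following is a minimal sketch and not the paper's SIR or VIR. It assumes a hypothetical toy instruction format of (destination, opcode, operands) listed in definition order, treats two instructions as isomorphic when they share an opcode, and considers them packable only when neither depends on the other. Memory dependencies and packing cost, which a real vectorizer must account for, are ignored here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy program: (dest, opcode, operands), in definition order.
# This format is an illustration only; it is not the paper's VIR.
PROGRAM = [
    ("t0", "load", ("a0",)),
    ("t1", "load", ("a1",)),
    ("t2", "add", ("t0", "c")),
    ("t3", "add", ("t1", "c")),
    ("t4", "mul", ("t2", "t3")),
]


def transitive_deps(program):
    """Map each dest to the set of earlier dests it transitively depends on."""
    deps = {}
    for dest, _, operands in program:
        reach = set()
        for op in operands:
            if op in deps:  # operand was produced by an earlier instruction
                reach.add(op)
                reach |= deps[op]
        deps[dest] = reach
    return deps


def isomorphic_pairs(program):
    """Return (opcode, x, y) for same-opcode, mutually independent pairs."""
    deps = transitive_deps(program)
    by_opcode = defaultdict(list)
    for dest, opcode, _ in program:
        by_opcode[opcode].append(dest)
    pairs = []
    for opcode, dests in by_opcode.items():
        for x, y in combinations(dests, 2):
            if x not in deps[y] and y not in deps[x]:
                pairs.append((opcode, x, y))
    return pairs


PAIRS = isomorphic_pairs(PROGRAM)
```

In this toy program the two loads and the two adds are candidates for packing into two-lane vector operations, while the single `mul` has no partner. A chain such as `y = add(x, c)` following `x = add(a, b)` would be rejected because the second instruction depends on the first.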
