Retrofitting Control Flow Graphs in LLVM IR for Auto Vectorization

Shihan Fang, Wenxin Zheng

TL;DR

The paper tackles the limited exploitation of SIMD in current compilers by shifting from CFG-centric analysis to a two-layer IR approach. It introduces SIR to preserve high-level structure and VIR to encode precise instruction dependencies, enabling deeper isomorphism-based vectorization and cross-loop packing. The proposed pipeline, together with a dependence-graph-driven framework and cost-based packing decisions, yields substantial performance gains, with reported speedups up to 53% against LLVM and 58% against GCC on real-world workloads. This IR-based approach promises broader, more extensible automatic vectorization across diverse languages and complex control flows, potentially narrowing the performance gap to hardware-vectorized execution on modern CPUs.

Abstract

Modern processors increasingly rely on SIMD instruction sets, such as AVX and RVV, to significantly enhance parallelism and computational performance. However, production-ready compilers like LLVM and GCC often fail to fully exploit available vectorization opportunities due to disjoint vectorization passes and limited extensibility. Although recent work on heuristics and intermediate representation (IR) designs has attempted to address these problems, efficiently simplifying control flow analysis and accurately identifying vectorization opportunities remain challenging. To address these issues, we introduce a novel vectorization pipeline featuring two specialized IR extensions: SIR, which encodes high-level structural information, and VIR, which explicitly represents instruction dependencies through data dependency analysis. Leveraging the detailed dependency information provided by VIR, we develop a flexible and extensible vectorization framework. This approach substantially improves interoperability across vectorization passes and expands the search space for identifying isomorphic instructions, ultimately enhancing both the scope and efficiency of automatic vectorization. Experimental evaluations demonstrate that our proposed vectorization pipeline achieves significant performance improvements, delivering speedups of up to 53% and 58% compared to LLVM and GCC, respectively.

Paper Structure

This paper contains 39 sections, 2 equations, 12 figures, 1 algorithm.

Figures (12)

  • Figure 1: An example of code containing control-flow-equivalent basic blocks [chen2022all], that is, two blocks that execute under identical conditions. The pink basic blocks are control-flow equivalent, and the two load instructions in these blocks can be vectorized. The conditional addition on the loaded values can also be vectorized using masked addition, where each element operation executes only if the corresponding mask element is true.
  • Figure 2: An example of code with interleaved access to the same array by processing even and odd indices separately. After applying loop fusion followed by unrolling, the two loops can be fully vectorized.
  • Figure 3: Compilation pipeline.
  • Figure 4: Translated SIR of the first loop in Figure 2(a). (a) is a sub-tree representing the loop, rooted at a Loop Wrapper, and a Block after the loop. (b) is the corresponding CFG of the Blocks in (a). (c) presents some of the important information extracted from the source code.
  • Figure 5: Translated VIR of the first loop in Figure 2(a).
  • ...and 7 more figures
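
The transformations described in the captions of Figures 1 and 2 can be illustrated with a minimal sketch. It is not the paper's code: the function names, the `2 * x + 1` operation, and the lane width of 4 are all assumptions chosen for illustration, and plain Python lists stand in for SIMD registers. The first pair of functions shows that two loops over even and odd indices, once fused and unrolled, become a single operation on contiguous chunks; the second pair shows that a conditional addition matches a masked addition in which each lane is updated only when its mask element is true.

```python
# Illustrative sketch only; names, the 2*x+1 operation, and width=4 are assumptions.

def separate_loops(a):
    """Two loops over the same list: even indices first, then odd indices."""
    out = list(a)
    for i in range(0, len(out), 2):   # even indices
        out[i] = 2 * out[i] + 1
    for i in range(1, len(out), 2):   # odd indices
        out[i] = 2 * out[i] + 1
    return out


def fused_unrolled(a, width=4):
    """Same result after fusing both loops and unrolling to `width` lanes."""
    out = list(a)
    n = len(out)
    main = n - n % width              # part that fills whole chunks
    for i in range(0, main, width):
        # One contiguous chunk: the shape a vector instruction would cover.
        out[i:i + width] = [2 * x + 1 for x in out[i:i + width]]
    for i in range(main, n):          # scalar remainder loop
        out[i] = 2 * out[i] + 1
    return out


def conditional_add(values, addend, conds):
    """Scalar form: add only where the condition holds."""
    out = list(values)
    for i, c in enumerate(conds):
        if c:
            out[i] += addend
    return out


def masked_add(values, addend, mask):
    """Masked form: every lane is visited, the mask selects the result."""
    return [v + addend if m else v for v, m in zip(values, mask)]
```

The slice assignment in `fused_unrolled` is the part a vectorizer would map to one wide operation; the remainder loop handles lengths that are not a multiple of the assumed width.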