IntrinTrans: LLM-based Intrinsic Code Translator for RISC-V Vector
Liutong Han, Zhiyuan Tan, Hongbin Zhang, Pengcheng Wang, Chu Kang, Mingjie Xing, Yanjun Wu
TL;DR
IntrinTrans addresses the challenge of porting vectorized intrinsic code from other architectures to the RISC-V Vector (RVV) extension with an LLM-driven multi-agent framework. It combines an RVV code translator, a compilation executor, a test executor, and an optimizer in a feedback loop driven by a finite-state machine, and it guides optimization with register-pressure information derived from liveness analysis. The approach is evaluated on 34 Neon→RVV cases across 21 LLMs. Advanced models produce functionally correct RVV intrinsics in most cases, and some translations achieve substantial speedups (up to $5.93\times$) over the native RVV implementations contributed by the open-source community. The work points to a promising direction for automated cross-ISA intrinsic translation, with implications for accelerating RVV adoption and performance portability across hardware platforms.
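The feedback loop described above can be pictured as a small state machine. The following is a minimal sketch under stated assumptions, not the authors' implementation: the state names, the `run_pipeline` function, and the four callables (`translate`, `compile_code`, `run_tests`, `optimize`) are hypothetical stand-ins for the translator, compilation executor, test executor, and optimizer agents, and the transition rules are one plausible reading of the summary.

```python
from enum import Enum, auto


class State(Enum):
    TRANSLATE = auto()
    COMPILE = auto()
    TEST = auto()
    OPTIMIZE = auto()
    DONE = auto()


def run_pipeline(source, translate, compile_code, run_tests, optimize, max_steps=20):
    """Drive the four (hypothetical) agents until tests pass or the budget runs out.

    translate(source, feedback) -> code
    compile_code(code)          -> (ok, log)
    run_tests(code)             -> (ok, log)
    optimize(code)              -> code
    Returns (last code that passed the tests or None, list of visited states).
    """
    state = State.TRANSLATE
    code, feedback, best = None, None, None
    optimized = False
    trace = []
    for _ in range(max_steps):
        trace.append(state)
        if state is State.TRANSLATE:
            # Regenerate the RVV code, passing along the latest error log, if any.
            code = translate(source, feedback)
            state = State.COMPILE
        elif state is State.COMPILE:
            ok, log = compile_code(code)
            if ok:
                state = State.TEST
            else:
                feedback, state = log, State.TRANSLATE
        elif state is State.TEST:
            ok, log = run_tests(code)
            if ok:
                best = code  # remember the last functionally correct version
                state = State.DONE if optimized else State.OPTIMIZE
            else:
                feedback, state = log, State.TRANSLATE
        elif state is State.OPTIMIZE:
            # Optimized code is re-validated by compiling and testing it again.
            code = optimize(code)
            optimized = True
            state = State.COMPILE
        if state is State.DONE:
            trace.append(state)
            break
    return best, trace
```

In this sketch, compiler and test logs are the only feedback channel, and a result is kept only after it passes the tests, so a failed optimization attempt never replaces a correct translation.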
Abstract
Using intrinsic functions to exploit hardware-specific capabilities is an important approach to optimizing library performance. Many mainstream libraries implement a large number of vectorized algorithms with Arm or x86 SIMD intrinsics. With the rapid expansion of the RISC-V hardware-software ecosystem, demand for support of the RISC-V Vector (RVV) extension is growing. Translating existing vectorized intrinsic code to RVV intrinsics is a practical and effective approach. However, current cross-architecture translation largely relies on manual rewriting, which is time-consuming and error-prone. Furthermore, while some rule-based methods reduce the need for manual intervention, their translation success rate is limited by incomplete rule coverage and syntactic constraints, and the resulting performance suffers because RVV-specific features are underused. We present IntrinTrans, an LLM-based multi-agent approach that uses compile-and-test feedback to translate intrinsic code across architectures automatically, and that further optimizes the generated RVV intrinsics using register-usage information derived from liveness analysis. To evaluate the effectiveness of our approach, we collected 34 vectorized algorithm cases from open-source libraries. Each case includes an Arm Neon intrinsics implementation and an RVV intrinsics implementation contributed by the open-source community, together with correctness and performance tests. Our experiments show that advanced LLMs produce semantically correct RVV intrinsics in most cases within a limited number of iterations, and in some cases achieve up to 5.93x the performance of the native implementation from the open-source community.
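To illustrate how liveness analysis can yield register-usage information, the sketch below computes peak register pressure for a straight-line sequence of vector operations. It is a simplified illustration, not the analysis used in the paper: the `(dest, sources)` instruction encoding and the function names are assumptions, control flow is ignored, and the mask register is not treated specially. The only architectural fact relied on is that RVV provides 32 vector registers, so a register group setting of LMUL = m leaves 32/m register groups.

```python
def peak_live_values(instrs, live_out=()):
    """Return the maximum number of simultaneously live values.

    instrs   : list of (dest, [sources]) tuples in program order
    live_out : values still needed after the last instruction
    """
    live = set(live_out)
    peak = len(live)
    # Walk backwards: a value is live from its definition to its last use.
    for dest, srcs in reversed(instrs):
        live.discard(dest)   # dest is not live before its definition
        live.update(srcs)    # sources must be live before this instruction
        peak = max(peak, len(live))
    return peak


def fits_without_spilling(peak, lmul):
    """Check whether `peak` live vector values fit at the given integer LMUL."""
    available_groups = 32 // lmul  # RVV has 32 vector registers
    return peak <= available_groups


# Hypothetical kernel: two loads, a multiply, an add, and a final reduction.
kernel = [
    ("a", []),            # a = load
    ("b", []),            # b = load
    ("c", ["a", "b"]),    # c = a * b
    ("d", ["c", "a"]),    # d = c + a
    ("e", ["d"]),         # e = reduce(d)
]
pressure = peak_live_values(kernel, live_out={"e"})
```

An optimizer could use such a figure to decide whether a larger LMUL is safe: a kernel whose peak pressure exceeds the number of available register groups would force spills to memory, which typically costs more than the wider grouping gains.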
