Improved vectorization of OpenCV algorithms for RISC-V CPUs
V. D. Volokitin, E. P. Vasiliev, E. A. Kozinov, V. D. Kustikova, A. V. Liniov, Y. A. Rodimkov, A. V. Sysoyev, I. B. Meyerov
TL;DR
The paper addresses the acceleration of OpenCV algorithms on RISC-V CPUs by adopting wide-vector universal intrinsics compatible with RVV 0.7.1. By restructuring OpenCV's intrinsics to operate on 512-bit vector blocks (vectors spanning four registers) and implementing the required data-type conversions, the authors obtain performance gains of tens of percent on current RISC-V devices while preserving portability. Hardware experiments on x86, Mango Pi, and Lichee Pi platforms, covering image filtering, erosion, and bag-of-words/SVM tasks, show substantial but device-dependent speedups and clarify where vectorization and parallelization yield the most benefit. The findings suggest that high-performance, open RISC-V hardware can meaningfully narrow the gap with x86 in computer vision workloads, motivating continued co-design and optimization efforts for future architectures.
Abstract
The development of the open and free RISC-V architecture is of great interest for a wide range of areas, including high-performance computing and numerical simulation in mathematics, physics, chemistry and other problem domains. In this paper, we discuss the possibilities of accelerating computations on available RISC-V processors by improving the vectorization of several computer vision and machine learning algorithms in the widely used OpenCV library. We show that improved vectorization speeds up computations on existing prototypes of RISC-V devices by tens of percent.
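To make the idea of 512-bit blocks from the TL;DR concrete, the following is a minimal portable sketch of the loop structure such a kernel might have. It is not the authors' implementation and does not use OpenCV or RVV intrinsics: the function `add_saturate_u8` and the constants `kRegisterBits`, `kGroup`, and `kLanesU8` are hypothetical names, and the 128-bit register width is an assumption inferred from "four registers, 512 bits". The inner lane loop stands in for what a real build would express as a few vector instructions over a register group.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Assumed (hypothetical) geometry: four 128-bit registers form one 512-bit block.
constexpr std::size_t kRegisterBits = 128;                    // assumed register width
constexpr std::size_t kGroup = 4;                             // registers per block
constexpr std::size_t kLanesU8 = kRegisterBits * kGroup / 8;  // 64 uint8 lanes per block

// Saturating addition of two uint8 arrays, processed one 512-bit block at a time.
// Scalar emulation only: it illustrates the block/tail structure, not real SIMD.
void add_saturate_u8(const std::uint8_t* a, const std::uint8_t* b,
                     std::uint8_t* dst, std::size_t n) {
    std::size_t i = 0;

    // Main loop: consume full blocks of kLanesU8 elements.
    for (; i + kLanesU8 <= n; i += kLanesU8) {
        // On vector hardware this lane loop would map to a load, a saturating
        // add, and a store over the whole register group.
        for (std::size_t lane = 0; lane < kLanesU8; ++lane) {
            unsigned sum = static_cast<unsigned>(a[i + lane]) +
                           static_cast<unsigned>(b[i + lane]);
            dst[i + lane] = static_cast<std::uint8_t>(std::min(sum, 255u));
        }
    }

    // Scalar tail: remaining elements that do not fill a whole block.
    for (; i < n; ++i) {
        unsigned sum = static_cast<unsigned>(a[i]) + static_cast<unsigned>(b[i]);
        dst[i] = static_cast<std::uint8_t>(std::min(sum, 255u));
    }
}
```

With 70 input elements, for example, the main loop would handle one block of 64 and the tail loop the remaining 6. A wider block means fewer loop iterations and less per-iteration overhead, which is one plausible source of the gains the paper reports.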
