Optimization of a Line Detection Algorithm for Autonomous Vehicles on a RISC-V with Accelerator
María José Belda, Katzalin Olcoz, Fernando Castro, Francisco Tirado
TL;DR
The paper addresses real-time line detection for autonomous vehicles under onboard power and latency constraints. It proposes a heterogeneous architecture that pairs a RISC-V core (Rocket or BOOM) with the Gemmini matrix-multiplication accelerator, and adapts the Canny edge detector and the Hough transform so that their compute-intensive parts are offloaded to the accelerator while overall latency stays in check. The main findings show up to a 3.7x speedup over a baseline CPU-only configuration, with BOOM cores achieving the best performance and accelerator-equipped designs allowing lower clock frequencies, which saves energy. The work demonstrates the practicality of using RISC-V tooling and infrastructure (Chipyard, FireSim, AWS) to prototype and evaluate vision workloads for autonomous vehicles, and suggests future exploration of larger-matrix workloads and neural-network inference on Gemmini.
Abstract
In recent years, autonomous vehicles have attracted the attention of many research groups in both academia and business, including researchers from leading companies such as Google, Uber and Tesla. Vehicles of this type are equipped with systems that are subject to very strict requirements: they must operate safely, for both potential passengers and pedestrians, and they must carry out the processing needed for decision making in real time. In many instances, general-purpose processors alone cannot ensure that these safety, reliability and real-time requirements are met, so it is common to build heterogeneous systems that include accelerators. This paper explores the acceleration of a line detection application in the autonomous car environment using a heterogeneous system consisting of a general-purpose RISC-V core and a domain-specific accelerator. In particular, the application is analyzed to identify the most computationally intensive parts of the code, which are then adapted for more efficient processing. Furthermore, the code is executed on the aforementioned hardware platform to verify that the execution meets the existing requirements in autonomous vehicles, achieving a 3.7x speedup with respect to running without the accelerator.
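
To illustrate what adapting image-processing code for a matrix-multiplication accelerator can look like, the following minimal sketch rewrites a small 2D filter (a Sobel-style kernel, as used in edge detection) as a single matrix product by unrolling image patches into rows, a lowering commonly called im2col. This is an assumption made for illustration: the text above does not state which lowering the authors use, and the function names (`im2col`, `matmul`, `filter_via_matmul`, `filter_direct`) are hypothetical. The pure-Python `matmul` only stands in for the operation that an accelerator would execute.

```python
def im2col(image, k):
    """Unroll every k x k patch of a 2D image into one row of a matrix."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows


def matmul(a, b):
    """Plain matrix product; hypothetical stand-in for the accelerated call."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def filter_via_matmul(image, kernel):
    """Apply the kernel (no flip, i.e. cross-correlation) as one matrix product."""
    k = len(kernel)
    out_w = len(image[0]) - k + 1
    kernel_col = [[v] for row in kernel for v in row]  # shape (k*k) x 1
    flat = [r[0] for r in matmul(im2col(image, k), kernel_col)]
    return [flat[i:i + out_w] for i in range(0, len(flat), out_w)]


def filter_direct(image, kernel):
    """Reference version with nested loops, used to check the lowering."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]


# Horizontal-gradient Sobel kernel and a tiny image with a vertical edge.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
IMAGE = [[0, 0, 10, 10] for _ in range(4)]
```

Both functions compute the same result; the difference is that `filter_via_matmul` concentrates all arithmetic in one matrix product, which is the shape of work a matrix-multiplication unit is built for. The cost is the extra memory used by the unrolled patch matrix.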
