Characterizing and Optimizing Real-Time Optimal Control for Embedded SoCs
Kris Shengjun Dong, Dima Nikiforov, Widyadewi Soedarmadji, Minh Nguyen, Vikram Jain, Christopher W. Fletcher, Yakun Sophia Shao
TL;DR
The paper tackles the challenge of running real-time model predictive control (MPC) on resource-constrained embedded SoCs through a design-space exploration across scalar RISC-V CPUs, vector cores with the RISC-V Vector extension (RVV), and Gemmini-style systolic accelerators. It pairs kernel-level optimizations (loop unrolling, operator fusion, static scheduling, and scratchpad dataflow) with a code-generation flow, matlib, that automates fused, tiled implementations. Together these achieve up to 3.71x speedups and up to 27% system-level power reductions in end-to-end robotic tasks. Hardware-in-the-loop validation on the fabricated Cygnus RVV SoC with a CrazyFlie drone demonstrates that real-time MPC is viable, and shows how architecture choice interacts with workload characteristics and size, weight, and power (SWaP) constraints. The work offers concrete design guidance and tooling for deploying real-time robot control on heterogeneous embedded SoCs, highlighting the large gains available from software-centric optimizations and the engineering effort that code generation saves.
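Operator fusion, one of the kernel-level optimizations listed above, can be illustrated with a minimal Python sketch. It assumes, purely for illustration, an MPC-style control update of the form u = -K x - d (gain matrix K, state x, offset d); the update, the type aliases, and the function names are hypothetical and are not taken from the paper or from matlib. The unfused version materializes K x in a temporary buffer and then makes a second pass over it; the fused version finalizes each output row inside a single loop.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def control_unfused(K: Matrix, x: Vector, d: Vector) -> Vector:
    # Pass 1: matrix-vector product written to an intermediate buffer Kx.
    Kx = [sum(K[i][j] * x[j] for j in range(len(x))) for i in range(len(K))]
    # Pass 2: re-read the buffer, negate it, and subtract the offset d.
    return [-Kx[i] - d[i] for i in range(len(K))]


def control_fused(K: Matrix, x: Vector, d: Vector) -> Vector:
    # Single pass: each row starts from -d[i] and subtracts K[i][j] * x[j]
    # in its accumulator, so no intermediate vector is stored and re-read.
    u = []
    for row, d_i in zip(K, d):
        acc = -d_i
        for k_ij, x_j in zip(row, x):
            acc -= k_ij * x_j
        u.append(acc)
    return u
```

Both functions compute the same vector, up to floating-point rounding order. On an embedded core the fused form avoids storing and re-reading the m-element intermediate, and when the dimensions are fixed ahead of time the inner loop can also be fully unrolled.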
Abstract
Resource-limited robots face significant challenges in executing computationally intensive tasks, such as locomotion and manipulation, particularly for real-time optimal control algorithms like Model Predictive Control (MPC). This paper provides a comprehensive design space exploration to identify optimal hardware computation architectures for these demanding model-based control algorithms. We profile and optimize representative architectural designs, including general-purpose scalar CPUs, vector processors, and specialized accelerators. By characterizing kernel-level benchmarks and end-to-end robotic scenarios, including a hardware-in-the-loop evaluation on a fabricated RISC-V multi-core vector SoC, we present a quantitative comparison of performance, area, and utilization across distinct architectural design points. Our findings demonstrate that targeted architectural modifications, coupled with deep software and system optimizations, enable up to 3.71x speedups for MPC, resulting in up to 27% system-level power reductions while completing robotic tasks. Finally, we propose a code generation flow designed to simplify the complex engineering effort required for mapping robotic workloads onto specialized architectures.
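The paper's code generation flow is not reproduced here. As a hypothetical illustration of what such a generator might do for a small kernel, the Python sketch below emits C source for the same assumed update u = -K x - d, with dimensions m and n assumed known at generation time. Every loop is unrolled when the code is generated (a static schedule), and the negation and offset are fused into each row. The names `emit_fused_control` and `mpc_control_step` are illustrative only.

```python
def emit_fused_control(m: int, n: int, name: str = "mpc_control_step") -> str:
    """Return C source computing u = -K x - d for fixed dimensions m x n.

    Hypothetical illustration: all loops are unrolled at generation time
    (a static schedule), and the negation and offset are fused into each
    row's expression, so the emitted kernel has no loops, branches, or
    intermediate buffers.
    """
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive")
    lines = [
        f"void {name}(const float K[{m}][{n}], const float x[{n}],",
        f"    const float d[{m}], float u[{m}]) {{",
    ]
    for i in range(m):
        # One straight-line statement per output row: -d[i] - sum_j K[i][j] * x[j].
        terms = " - ".join(f"K[{i}][{j}] * x[{j}]" for j in range(n))
        lines.append(f"    u[{i}] = -d[{i}] - {terms};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(emit_fused_control(2, 2))
```

For example, `emit_fused_control(2, 2)` emits straight-line assignments such as `u[0] = -d[0] - K[0][0] * x[0] - K[0][1] * x[1];`. Full unrolling trades code size for the removal of loop overhead, so it typically suits only small, fixed dimensions. A practical flow would also need tiling and target-specific lowering (for example to RVV or a systolic accelerator), which this sketch omits.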
