MIMONet: Multi-Input Multi-Output On-Device Deep Learning
Zexin Li, Xiaoxi He, Yufei Li, Wei Yang, Lothar Thiele, Cong Liu
TL;DR
This work tackles on-device multimodal, multi-output inference by proposing MIMONet, a MIMO DNN framework tailored for embedded robotics. It takes a two-pronged compression approach: intra-model pruning via an information-bottleneck mechanism extended to ResNet blocks, and inter-model weight sharing via cross-branch MTZ merging, complemented by fixed-point quantization. Experiments on three NVIDIA Jetson platforms and a PC show substantial gains in memory, latency, and energy over SISO and MISO baselines, with memory reductions of up to $80.7\%$, speedups of up to $2.29\times$, and energy savings of up to $8.64\times$, while maintaining competitive accuracy on RAVDESS-based multimodal tasks and in real-world TurtleBot3 scenarios. The findings demonstrate that on-device MIMO inference is practical for robotics, enabling efficient, real-time processing of multiple inputs and generation of multiple outputs in resource-constrained environments.
Abstract
Future intelligent robots are expected to process multiple inputs simultaneously (such as image and audio data) and generate multiple outputs accordingly (such as gender and emotion), much as humans do. Recent research has shown that multi-input single-output (MISO) deep neural networks (DNNs) outperform traditional single-input single-output (SISO) models, a significant step towards this goal. In this paper, we propose MIMONet, a novel on-device multi-input multi-output (MIMO) DNN framework that achieves both high accuracy and on-device efficiency in terms of critical performance metrics such as latency, energy, and memory usage. Building on existing SISO model compression techniques, MIMONet introduces a new deep-compression method specifically tailored to MIMO models. This method exploits unique, non-trivial properties of MIMO models, improving both accuracy and on-device efficiency. Extensive experiments on three embedded platforms commonly used in robotic systems, together with a case study on the TurtleBot3 robot, demonstrate that MIMONet achieves higher accuracy and superior on-device efficiency compared with state-of-the-art SISO and MISO models, as well as with a baseline MIMO model we constructed. Our evaluation highlights the real-world applicability of MIMONet and its potential to significantly enhance the performance of intelligent robotic systems.
