MIMONet: Multi-Input Multi-Output On-Device Deep Learning

Zexin Li, Xiaoxi He, Yufei Li, Wei Yang, Lothar Thiele, Cong Liu

TL;DR

This work tackles the challenge of on-device multimodal, multi-output inference by proposing MIMONet, a MIMO DNN framework tailored for embedded robotics. It advances a two-pronged compression strategy: intra-model pruning via an information-bottleneck mechanism extended to ResNet blocks, and inter-model weight sharing via cross-branch MTZ merging, complemented by quantization to fixed-point arithmetic. Empirical results on three NVIDIA Jetson platforms and a PC show substantial improvements in memory, latency, and energy over SISO and MISO baselines, with memory reductions up to $80.7\%$, speedups up to $2.29\times$, and energy savings up to $8.64\times$, while maintaining competitive accuracy on RAVDESS-based multimodal tasks and real-world TurtleBot3 scenarios. The findings demonstrate the practical viability of on-device MIMO inference for robotics, enabling efficient, real-time processing of multiple inputs and generation of multiple outputs in constrained environments.

Abstract

Future intelligent robots are expected to process multiple inputs simultaneously (such as image and audio data) and generate multiple outputs accordingly (such as gender and emotion), similar to humans. Recent research has shown that multi-input single-output (MISO) deep neural networks (DNNs) outperform traditional single-input single-output (SISO) models, representing a significant step towards this goal. In this paper, we propose MIMONet, a novel on-device multi-input multi-output (MIMO) DNN framework that achieves high accuracy and on-device efficiency in terms of critical performance metrics such as latency, energy, and memory usage. Leveraging existing SISO model compression techniques, MIMONet develops a new deep-compression method that is specifically tailored to MIMO models. This new method explores unique yet non-trivial properties of the MIMO model, resulting in boosted accuracy and on-device efficiency. Extensive experiments on three embedded platforms commonly used in robotic systems, as well as a case study using the TurtleBot3 robot, demonstrate that MIMONet achieves higher accuracy and superior on-device efficiency compared to state-of-the-art SISO and MISO models, as well as a baseline MIMO model we constructed. Our evaluation highlights the real-world applicability of MIMONet and its potential to significantly enhance the performance of intelligent robotic systems.

Paper Structure (17 sections, 4 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Overview of MIMONet. In the left part, gray circles represent neurons inducing intra-model redundancy. In the right part, green circles denote sharable neurons inducing inter-model redundancy. Best viewed in color.
  • Figure 2: Design for compression of the residual block of ResNet. The left side shows the structure of the residual block. The right side shows channel-level pruning and recovery. White and gray circles denote kept and pruned channels. The pruned channels are filled with zeros before the feature maps are summed. Best viewed in color.
  • Figure 3: Data examples from the RAVDESS dataset.
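
The zero-fill recovery described in the Figure 2 caption can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function names `recover_channels` and `residual_sum`, the channel-first tensor layout, and the hand-picked `kept_idx` are hypothetical, and the information-bottleneck criterion that the paper uses to decide which channels to prune is not modeled.

```python
import numpy as np

def recover_channels(pruned, kept_idx, total_channels):
    """Scatter the kept channels back into a zero tensor of full channel width.

    pruned:   array of shape (len(kept_idx), H, W), the pruned branch output
    kept_idx: indices of the channels that survived pruning (hypothetical input)
    """
    _, height, width = pruned.shape
    full = np.zeros((total_channels, height, width), dtype=pruned.dtype)
    full[kept_idx] = pruned  # pruned channels stay at zero
    return full

def residual_sum(shortcut, branch_pruned, kept_idx):
    """Add the shortcut to the branch output after zero-fill recovery."""
    branch_full = recover_channels(branch_pruned, kept_idx, shortcut.shape[0])
    return shortcut + branch_full

# Toy example: 4 channels on the shortcut, 2 channels kept on the branch.
shortcut = np.ones((4, 2, 2), dtype=np.float32)
kept_idx = np.array([0, 2])
branch_pruned = np.full((2, 2, 2), 0.5, dtype=np.float32)

out = residual_sum(shortcut, branch_pruned, kept_idx)
```

In this toy case the kept channels (0 and 2) receive the branch contribution, while the pruned channels (1 and 3) pass the shortcut through unchanged, so the summed feature map keeps its original channel count.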