MIMONet: Multi-Input Multi-Output On-Device Deep Learning

Zexin Li; Xiaoxi He; Yufei Li; Wei Yang; Lothar Thiele; Cong Liu

TL;DR

This work tackles the challenge of on-device multimodal, multi-output inference by proposing MIMONet, a MIMO DNN framework tailored for embedded robotics. It advances a two-pronged compression strategy: intra-model pruning via an information-bottleneck mechanism extended to ResNet blocks, and inter-model weight sharing via cross-branch MTZ merging, complemented by quantization to fixed-point arithmetic. Empirical results on three NVIDIA Jetson platforms and a PC show substantial improvements in memory, latency, and energy over SISO and MISO baselines, with memory reductions up to $80.7\%$, speedups up to $2.29\times$, and energy savings up to $8.64\times$, while maintaining competitive accuracy on RAVDESS-based multimodal tasks and real-world TurtleBot3 scenarios. The findings demonstrate the practical viability of on-device MIMO inference for robotics, enabling efficient, real-time processing of multiple inputs and generation of multiple outputs in constrained environments.

Abstract

Future intelligent robots are expected, like humans, to process multiple inputs simultaneously (such as image and audio data) and to generate multiple outputs accordingly (such as gender and emotion). Recent research has shown that multi-input single-output (MISO) deep neural networks (DNNs) outperform traditional single-input single-output (SISO) models, representing a significant step towards this goal. In this paper, we propose MIMONet, a novel on-device multi-input multi-output (MIMO) DNN framework that achieves high accuracy and on-device efficiency in terms of critical performance metrics such as latency, energy, and memory usage. Leveraging existing SISO model compression techniques, MIMONet develops a new deep-compression method that is specifically tailored to MIMO models. This new method exploits unique yet non-trivial properties of the MIMO model, resulting in boosted accuracy and on-device efficiency. Extensive experiments on three embedded platforms commonly used in robotic systems, as well as a case study using the TurtleBot3 robot, demonstrate that MIMONet achieves higher accuracy and superior on-device efficiency compared to state-of-the-art SISO and MISO models, as well as a baseline MIMO model we constructed. Our evaluation highlights the real-world applicability of MIMONet and its potential to significantly enhance the performance of intelligent robotic systems.

Paper Structure (17 sections, 4 equations, 3 figures, 2 tables)

INTRODUCTION
BACKGROUND AND RELATED WORK
MIMO Architecture
Model Compression for On-device Scenarios
METHODOLOGY
Overview of MIMONet
Reduce Intra-model Redundancy
Reduce Inter-model Redundancy
Integrate with Quantization Techniques
EVALUATION
Experimental Setup
Metrics
Effectiveness
Ablation Study
Case study on TurtleBot3
...and 2 more sections

Figures (3)

Figure 1: Overview of MIMONet. In the left part, gray circles represent neurons inducing intra-model redundancy. In the right part, green circles denote sharable neurons inducing inter-model redundancy. Best viewed in color.
Figure 2: Design for compression of the ResNet residual block. The left side shows the structure of the residual block. The right side shows channel-level pruning and recovery. White and gray circles denote kept and pruned channels, respectively. The pruned channels are filled with zeros before the feature maps are summed. Best viewed in color.
Figure 3: Data examples from the RAVDESS dataset.
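
The recovery step described in the Figure 2 caption (pruned channels are filled with zeros before the feature maps are summed) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `recover_channels` and `residual_sum`, the list-of-channels representation, and the choice to prune the branch output rather than the shortcut are all assumptions made for illustration.

```python
from typing import List, Sequence

Channel = List[float]  # one flattened feature-map channel (illustrative representation)


def recover_channels(
    pruned: Sequence[Channel],
    kept_idx: Sequence[int],
    num_channels: int,
    size: int,
) -> List[Channel]:
    """Scatter the kept channels back to their original positions.

    Positions that were pruned are filled with zeros, so the result has the
    full channel count again and can be added to the shortcut path.
    """
    if len(pruned) != len(kept_idx):
        raise ValueError("one index is required per kept channel")
    full = [[0.0] * size for _ in range(num_channels)]
    for channel, idx in zip(pruned, kept_idx):
        full[idx] = list(channel)
    return full


def residual_sum(
    shortcut: Sequence[Channel],
    branch_pruned: Sequence[Channel],
    kept_idx: Sequence[int],
) -> List[Channel]:
    """Element-wise sum of the shortcut and the zero-filled branch output."""
    size = len(shortcut[0])
    branch_full = recover_channels(branch_pruned, kept_idx, len(shortcut), size)
    return [
        [s + b for s, b in zip(s_ch, b_ch)]
        for s_ch, b_ch in zip(shortcut, branch_full)
    ]


# Hypothetical example: 4 channels, of which the branch keeps channels 0 and 2.
shortcut = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
branch_pruned = [[0.5, 0.5], [0.25, 0.25]]
kept_idx = [0, 2]
out = residual_sum(shortcut, branch_pruned, kept_idx)
```

In this sketch, channels 1 and 3 of the branch were pruned, so the corresponding output channels equal the shortcut unchanged, while channels 0 and 2 receive the branch contribution.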