A Multi-task Supervised Compression Model for Split Computing

Yoshitomo Matsubara; Matteo Mendula; Marco Levorato

TL;DR

Problem: enabling multi-task vision inference on resource-constrained edge devices via split computing. Method: Ladon, a model whose lightweight on-device encoder, shared backbone, and unified preprocessing pipeline serve image classification, object detection, and semantic segmentation in a single inference, with edge servers handling the remaining computation. Contributions: the first end-to-end multi-task supervised compression model for split computing; competitive accuracy on ILSVRC 2012, COCO 2017, and PASCAL VOC 2012; large reductions in end-to-end latency (up to 95.4%) and mobile-device energy consumption (up to 88.2%); and a small encoder (~0.5 MB). Significance: demonstrates practical edge deployments through real-device evaluation, offering substantial efficiency improvements for multi-task split computing.

Abstract

Split computing (≠ split learning) is a promising approach to deploying deep learning models on resource-constrained edge computing systems, where weak sensor (mobile) devices are wirelessly connected to stronger edge servers through channels with limited communication capacity. State-of-the-art work on split computing presents methods for single tasks such as image classification, object detection, or semantic segmentation. Applying existing methods to multi-task problems degrades model accuracy and/or significantly increases runtime latency. In this study, we propose Ladon, the first multi-task-head supervised compression model for multi-task split computing. Experimental results show that the multi-task supervised compression model either outperformed or rivaled strong lightweight baseline models in terms of predictive performance on the ILSVRC 2012, COCO 2017, and PASCAL VOC 2012 datasets while learning compressed representations at its early layers. Furthermore, our models reduced end-to-end latency (by up to 95.4%) and energy consumption of mobile devices (by up to 88.2%) in multi-task split computing scenarios.
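The end-to-end latency figures above combine time spent on the mobile device, on the wireless channel, and on the edge server. The Python sketch below assumes a simple additive latency model (our assumption, not necessarily the paper's exact measurement procedure), and the function names are hypothetical. It illustrates why compressing the representation produced by the early layers matters most when channel capacity is limited.

```python
def transmission_time_s(payload_bytes: int, data_rate_bps: float) -> float:
    """Seconds needed to send `payload_bytes` over a channel of `data_rate_bps` bits/s."""
    if data_rate_bps <= 0:
        raise ValueError("data rate must be positive")
    return payload_bytes * 8 / data_rate_bps


def end_to_end_latency_s(local_s: float, payload_bytes: int,
                         data_rate_bps: float, server_s: float) -> float:
    """Additive split-computing latency: on-device encoding + uplink + edge-side inference.

    Simplifying assumption: ignores protocol overhead, queuing, and the
    (typically small) time to return predictions to the mobile device.
    """
    return local_s + transmission_time_s(payload_bytes, data_rate_bps) + server_s


# 37.5 Kbps interpreted as 37,500 bits/s (decimal kilo, an assumption).
RATE_BPS = 37_500
for size in (3_750, 37_500):  # hypothetical compressed payload sizes in bytes
    print(size, transmission_time_s(size, RATE_BPS))
```

At the 37.5 Kbps data rate used in Figure S9, a tenfold larger payload costs tenfold more uplink time (0.8 s vs. 8.0 s for the two hypothetical sizes), so on slow channels the transmission term can dominate the on-device and server-side compute terms.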

Paper Structure

This paper contains 26 sections, 4 equations, 6 figures, and 4 tables.

Introduction
Preliminaries
Related Work
Ladon: Proposed Approach
Problem Formulation
Shared encoder vs. end-to-end multi-task model
Unified preprocessing
Model Implementations
Training
Step 1: Pre-training encoder-decoder
Step 2: Fine-tuning decoder and subsequent modules
Step 3: Fine-tuning other task-specific modules
Experiments
Experimental Settings
Baselines
...and 11 more sections

Figures (6)

Figure 1: Entropic Student (top) vs. our Ladon (bottom) in a multi-task scenario. The gray module (encoder) is trained in a task-agnostic way. Red, green, and blue modules are trained for image classification, semantic segmentation, and object detection tasks, respectively. Entropic Student's encoder serves multiple downstream tasks, but all its modules except the encoder are trained independently, using different image preprocessing pipelines (e.g., resizing and cropping). Our Ladon model shares a single image preprocessing pipeline and most of its parameters (except those of the task-specific heads) across the downstream tasks at run time. While Entropic Student is designed to run three separate inferences per image, the Ladon model runs a single inference to serve all three tasks.
Figure 2: ILSVRC 2012: tradeoff between compressed data size and model accuracy
Figure 5: End-to-end latency for Jetson Nano (mobile device) and laptop with CUDA (edge server). Top/bottom: local computing without/with CUDA.
Figure 7: Energy consumption of Jetson Nano (mobile device). Top/bottom: local computing without/with CUDA.
Figure S9: End-to-end latency for Jetson Nano (mobile device), laptop with CUDA (edge server), and wireless communication data rate of 37.5 Kbps. Top/bottom: local computing without/with CUDA.
...and 1 more figure
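The contrast drawn in Figure 1 (three separate inferences per image vs. one shared inference) can be made concrete with a toy Python sketch that counts how often each stage runs. Everything here is hypothetical: the stages are list operations standing in for neural network modules, and the per-task input sizes merely mimic task-specific resizing and cropping.

```python
from collections import Counter
from typing import Callable, Dict, List

Tensor = List[float]  # toy stand-in for an image or feature map
TASKS = ("classification", "detection", "segmentation")
calls: Counter = Counter()  # tallies how often each stage runs


def counted(name: str, fn: Callable) -> Callable:
    """Wrap a stage so every invocation is recorded in `calls`."""
    def wrapped(x):
        calls[name] += 1
        return fn(x)
    return wrapped


def preprocess(size: int) -> Callable[[Tensor], Tensor]:
    """Toy resize/crop: keep the first `size` values of the input."""
    return lambda img: img[:size]


# Hypothetical toy stages; real ones would be neural network modules.
encoder = counted("encoder", lambda x: [v * 0.5 for v in x])     # mobile device
backbone = counted("backbone", lambda z: [v + 1.0 for v in z])   # shared, edge server
heads = {t: counted(f"head/{t}", sum) for t in TASKS}            # task-specific heads
server_models = {t: counted(f"server/{t}", sum) for t in TASKS}  # independent per-task models

# Hypothetical per-task input sizes, mimicking task-specific preprocessing.
PER_TASK_SIZE = {"classification": 4, "detection": 6, "segmentation": 5}


def per_task_inference(image: Tensor) -> Dict[str, float]:
    """Entropic-Student-style: one preprocessing + encoder + server pass per task."""
    return {t: server_models[t](encoder(preprocess(PER_TASK_SIZE[t])(image)))
            for t in TASKS}


def shared_inference(image: Tensor, size: int = 5) -> Dict[str, float]:
    """Ladon-style: one preprocessing + encoder + backbone pass feeds every head."""
    features = backbone(encoder(preprocess(size)(image)))
    return {t: heads[t](features) for t in TASKS}


image = [float(i) for i in range(8)]
calls.clear()
per_task_inference(image)
print(calls["encoder"])  # 3
calls.clear()
shared_inference(image)
print(calls["encoder"])  # 1
```

In this toy setting, the shared design runs the on-device encoder once per image instead of once per task; assuming each separate inference would also transmit its own compressed representation, it likewise sends one payload instead of three.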
