TSLA: A Task-Specific Learning Adaptation for Semantic Segmentation on Autonomous Vehicles Platform

Jun Liu, Zhenglun Kong, Pu Zhao, Weihao Zeng, Hao Tang, Xuan Shen, Changdi Yang, Wenbin Zhang, Geng Yuan, Wei Niu, Xue Lin, Yanzhi Wang

TL;DR

The paper tackles the problem of deploying semantic segmentation for autonomous driving on hardware with diverse constraints by introducing Task-Specific Learning Adaptation (TSLA). It proposes a MobileNetV4-based framework with a three-tier control (width multiplier, classifier depth, and classifier kernel) to dynamically balance accuracy and computational load, and it employs Bayesian Optimization (BO) with a Gaussian Process surrogate to search hyperparameters under computational budgets. Key contributions include the first adaptation of MobileNetV4 to edge-oriented segmentation with TSLA, three-tier adaptability that combines broad model scaling with targeted refinement, and a budget-aware, BO-based neural architecture search (NAS) mechanism that maximizes performance within GOPs constraints. The approach demonstrates practical impact by enabling real-time, high-IoU segmentation on embedded platforms (e.g., DRIVE PX 2) across multiple driving scenarios, while keeping GFLOPs/GOPs demands low and parameter search overhead scalable.

Abstract

Autonomous driving platforms encounter diverse driving scenarios, each with varying hardware resources and precision requirements. Given the computational limitations of embedded devices, it is crucial to consider computing costs when deploying on target platforms like the NVIDIA® DRIVE PX 2. Our objective is to customize the semantic segmentation network according to the computing power and specific scenarios of autonomous driving hardware. We implement dynamic adaptability through a three-tier control mechanism (width multiplier, classifier depth, and classifier kernel), allowing fine-grained control over model components based on hardware constraints and task requirements. This adaptability facilitates broad model scaling, targeted refinement of the final layers, and scenario-specific optimization of kernel sizes, leading to improved resource allocation and performance. Additionally, we leverage Bayesian Optimization with surrogate modeling to efficiently explore hyperparameter spaces under tight computational budgets. Our approach addresses scenario-specific and task-specific requirements through automatic parameter search, accommodating the unique computational complexity and accuracy needs of autonomous driving. It scales its Multiply-Accumulate Operations (MACs) for Task-Specific Learning Adaptation (TSLA), resulting in alternative configurations tailored to diverse self-driving tasks. These TSLA customizations maximize computational capacity and model accuracy, optimizing hardware utilization.
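As a rough illustration of how the three controls interact with a compute budget, the sketch below estimates MACs for a toy segmentation head and keeps only the configurations that fit. It is not the paper's network or its Bayesian Optimization loop: the layer layout, the function names `separable_conv_macs` and `feasible_configs`, and all default sizes are assumptions made for illustration, and plain enumeration stands in for the surrogate-guided search. Only the depthwise-separable cost formula with a width multiplier follows the standard MobileNet formulation.

```python
from itertools import product


def separable_conv_macs(h, w, c_in, c_out, k, width_mult=1.0):
    """MACs of one depthwise-separable conv (stride 1, 'same' padding).

    The width multiplier scales both the input and output channel counts.
    """
    c_in_s = max(1, round(c_in * width_mult))
    c_out_s = max(1, round(c_out * width_mult))
    depthwise = h * w * c_in_s * k * k      # one k x k filter per channel
    pointwise = h * w * c_in_s * c_out_s    # 1x1 conv mixing channels
    return depthwise + pointwise


def feasible_configs(budget_macs, width_mults, depths, kernels,
                     h=64, w=64, c_in=64, n_classes=19):
    """Enumerate (width_mult, depth, kernel) triples whose cost fits the budget.

    Hypothetical toy head: `depth` stacked separable convs followed by a
    1x1 pixel-wise classifier. Sizes are illustrative defaults only.
    """
    results = []
    for alpha, depth, k in product(width_mults, depths, kernels):
        channels = max(1, round(c_in * alpha))
        body = depth * separable_conv_macs(h, w, c_in, c_in, k, alpha)
        head = h * w * channels * n_classes  # final 1x1 classifier
        total = body + head
        if total <= budget_macs:
            results.append(((alpha, depth, k), total))
    results.sort(key=lambda item: item[1])   # cheapest first
    return results


if __name__ == "__main__":
    for config, macs in feasible_configs(50_000_000, [0.5, 1.0], [1, 2], [3, 5]):
        print(config, f"{macs / 1e6:.2f} MMACs")
```

A budget-aware search would evaluate accuracy only for configurations that pass this kind of cost filter; in the toy setting above, the widest, deepest, largest-kernel configuration is the one that exceeds the 50 MMAC budget.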

Paper Structure

This paper contains 26 sections, 10 equations, 6 figures, 5 tables, 3 algorithms.

Figures (6)

  • Figure 1: Comparison of FLOPs and mIoU on the Cityscapes test set with real-time methods. The bigger the bubble, the larger the computational complexity of the model. The horizontal axis is mIoU, and the vertical axis is GFLOPs.
  • Figure 2: The architecture of the Task-Specific Learning Adaptation (TSLA) network consists of two main blocks. The left block serves as the Feature Extractor, with a kernel size denoted as $k$ and a width multiplier (Howard et al., 2017) denoted as $d\_mul$. The right block is responsible for Semantic Segmentation, with the number of classes denoted as $n$. In addition, the classifier depth is represented as $d$.
  • Figure 3: Depth-wise convolutions.
  • Figure 4: 1x1 convolutions as pixel-wise classifiers.
  • Figure 5: We use a bilinear distribution as the weight initializer for the transposed convolution kernels in our architecture. Left: a single bilinear convolution kernel (64x64), used for upsampling operations. Right: thumbnails of all generated bilinear convolution kernels.
  • ...and 1 more figures
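
The bilinear initializer mentioned in the Figure 5 caption can be sketched as follows. This is the construction commonly used for FCN-style upsampling layers, written in plain Python; the paper's exact initializer is not shown in this excerpt, so the function name `bilinear_kernel` and the centering convention are assumptions.

```python
def bilinear_kernel(size):
    """Return a size x size bilinear interpolation kernel as nested lists.

    Weights fall off linearly with distance from the kernel center, so a
    transposed convolution initialized with it starts out as bilinear
    upsampling.
    """
    factor = (size + 1) // 2
    # Odd kernels are centered on a pixel, even kernels between two pixels.
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    profile = [1 - abs(i - center) / factor for i in range(size)]
    # The 2-D kernel is the outer product of the 1-D triangular profile.
    return [[row * col for col in profile] for row in profile]


if __name__ == "__main__":
    for row in bilinear_kernel(4):
        print(row)
```

In a multi-channel transposed convolution, this kernel is typically copied onto the matching input/output channel pairs with all other weights set to zero, so that each channel is upsampled independently at initialization.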