EffiPerception: an Efficient Framework for Various Perception Tasks

Xinhao Xiang, Simon Dräger, Jiawei Zhang

TL;DR

EffiPerception addresses the need for a unified, resource-efficient framework that handles multiple perception tasks across 2D and 3D modalities. It introduces three core components: Efficient Feature Extractors, Efficient Layers (including Sparse Down-Sampling, SDS, and Global Spatial Aggregation, GSA), and EffiOptim (an 8-bit optimizer). Together they balance accuracy, speed, and memory while remaining compatible with existing backbones and heads. Experiments on KITTI, SemanticKITTI, and COCO report gains in both accuracy and efficiency; ablations confirm the contributions of SDS and GSA, and further tests show robustness to common corruptions. The work offers a practical path toward deploying multi-task perception systems on resource-constrained devices.

Abstract

The accuracy-speed-memory trade-off is a primary consideration in computer vision perception tasks. Previous methods mainly target one or a small number of these tasks, for example by designing effective data augmentation, feature extractors, or learning strategies. Such approaches can be inherently task-specific: a model's performance may depend on a particular perception task or dataset. To explore common learning patterns and improve module robustness, we propose the EffiPerception framework. It achieves strong accuracy and speed at relatively low memory cost across several perception tasks: 2D Object Detection, 3D Object Detection, 2D Instance Segmentation, and 3D Point Cloud Segmentation. The framework consists of three parts: (1) Efficient Feature Extractors, which extract the input features for each modality; (2) Efficient Layers, plug-in, plug-out layers that further process the feature representation, aggregating core learned information while pruning noisy proposals; and (3) EffiOptim, an 8-bit optimizer that further cuts computational cost and promotes stable performance. Extensive experiments on the KITTI, SemanticKITTI, and COCO datasets show that EffiPerception improves the overall accuracy-speed-memory trade-off on the four detection and segmentation tasks compared with earlier, well-established methods.
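
The abstract does not say how EffiOptim stores its 8-bit state. As a rough illustration of why an 8-bit optimizer saves memory, the sketch below assumes block-wise absmax quantization of an optimizer state vector (for example, a momentum buffer). The function names `quantize_blockwise` and `dequantize_blockwise`, and the block size, are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def quantize_blockwise(values: List[float], block_size: int = 4) -> Tuple[List[int], List[float]]:
    """Map floats to signed 8-bit codes, with one absmax scale per block.

    Assumed scheme for illustration only; not the paper's EffiOptim.
    """
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        # An all-zero block gets scale 1.0 to avoid dividing by zero.
        scale = absmax / 127.0 if absmax > 0 else 1.0
        scales.append(scale)
        codes.extend(int(round(v / scale)) for v in block)
    return codes, scales


def dequantize_blockwise(codes: List[int], scales: List[float], block_size: int = 4) -> List[float]:
    """Recover approximate floats from 8-bit codes and per-block scales."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


# Hypothetical momentum buffer with two blocks of very different magnitude.
state = [0.5, -0.25, 0.125, 0.0, 10.0, -3.0, 2.0, 1.0]
codes, scales = quantize_blockwise(state, block_size=4)
restored = dequantize_blockwise(codes, scales, block_size=4)
```

Each value is stored as one byte plus a shared per-block scale, instead of four bytes in 32-bit floating point. Keeping a separate scale per block limits the rounding error of small values when a large value appears elsewhere in the buffer.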

Paper Structure

This paper contains 23 sections, 2 equations, 4 figures, and 7 tables.

Figures (4)

  • Figure 1: Overview of EffiPerception, an efficient framework for various perception tasks
  • Figure 2: The Efficient Feature Extraction framework
  • Figure 3: The Efficient Layers
  • Figure 4: Experiment Results