Lotus: learning-based online thermal and latency variation management for two-stage detectors on edge devices

Yifan Gong, Yushu Wu, Zheng Zhan, Pu Zhao, Liangkai Liu, Chao Wu, Xulong Tang, Yanzhi Wang

TL;DR

Lotus is a framework tailored for two-stage detectors that uses deep reinforcement learning (DRL) to jointly scale CPU and GPU frequencies in an online manner.

Abstract

Two-stage object detectors offer high accuracy and precise localization, especially for small objects, which makes them attractive for a range of edge applications. However, their high computation cost causes more severe thermal issues on edge devices, triggering dynamic runtime frequency changes and thus large variations in inference latency. Furthermore, the number of proposals varies from frame to frame, so the amount of computation varies over time, adding further latency variation. Such latency variation can harm user experience and waste hardware resources. To avoid thermal throttling and provide stable inference speed, we propose Lotus, a novel framework tailored for two-stage detectors that jointly scales CPU and GPU frequencies online using deep reinforcement learning (DRL). To demonstrate the effectiveness of Lotus, we implement it on the NVIDIA Jetson Orin Nano and Mi 11 Lite mobile platforms. The results indicate that Lotus consistently and significantly reduces latency variation, achieves faster inference, and maintains lower CPU and GPU temperatures under various settings.
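To make the idea of joint CPU and GPU frequency scaling concrete, the following is a minimal sketch and not the authors' implementation. The frequency levels, the reward shape, the latency target, the throttling bound, and the function names `reward` and `choose_action` are all assumptions introduced for illustration; the abstract does not specify the actual state, action, or reward design used by Lotus.

```python
import itertools
import random

# Hypothetical frequency levels in MHz; real devices expose their own tables.
CPU_FREQS = [800, 1200, 1500]
GPU_FREQS = [300, 600, 900]

# Joint action space: one (cpu_freq, gpu_freq) pair is selected per decision step.
ACTIONS = list(itertools.product(CPU_FREQS, GPU_FREQS))


def reward(latency_ms, cpu_temp_c, gpu_temp_c, target_ms=100.0, throttle_c=80.0):
    """Illustrative reward (assumed, not from the paper).

    Penalizes relative deviation from a latency target and any excess of the
    hotter component's temperature over a throttling bound.
    """
    latency_penalty = abs(latency_ms - target_ms) / target_ms
    thermal_penalty = max(0.0, max(cpu_temp_c, gpu_temp_c) - throttle_c)
    return -(latency_penalty + thermal_penalty)


def choose_action(q_values, epsilon, rng):
    """Epsilon-greedy selection of an index into the joint action space."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

In this sketch, an agent would call `choose_action` once per step, apply `ACTIONS[index]` to the device, and score the resulting frame with `reward`. Treating the CPU and GPU settings as a single joint action is what distinguishes this from tuning each frequency independently.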

Paper Structure

This paper contains 27 sections, 3 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: The mean and variation of inference latency and precision for two-stage detectors (FasterRCNN, MaskRCNN) and a one-stage detector (YOLOv5) on different datasets.
  • Figure 2: Inference latency of the second stage for different numbers of proposals on FasterRCNN and MaskRCNN.
  • Figure 3: Overview of Lotus.
  • Figure 4: Comparison on Jetson Orin Nano with FasterRCNN. Red dashed lines indicate the throttling bound and latency constraint.
  • Figure 5: Evaluation on Jetson Orin Nano with MaskRCNN. Red dashed lines indicate the throttling bound and latency constraint.
  • ...and 2 more figures