Perceive What Matters: Relevance-Driven Scheduling for Multimodal Streaming Perception

Dingcheng Huang; Xiaotong Zhang; Kamal Youcef-Toumi

Abstract

In modern human-robot collaboration (HRC) applications, multiple perception modules jointly extract visual, auditory, and contextual cues to achieve comprehensive scene understanding, enabling the robot to provide appropriate assistance to human agents. While executing multiple perception modules on every frame enhances perception quality in offline settings, it inevitably accumulates latency, which substantially degrades system performance in streaming perception scenarios. Recent work in scene understanding, termed Relevance, has established a solid foundation for developing efficient methodologies in HRC. However, modern perception pipelines still suffer from information redundancy and suboptimal allocation of computational resources. Drawing inspiration from the Relevance concept and the information sparsity of HRC events, we propose a novel lightweight perception scheduling framework that uses output from previous frames to estimate and schedule the necessary perception modules in real time based on scene context. Experimental results demonstrate that the proposed framework reduces computational latency by up to 27.52% compared to conventional parallel perception pipelines, while also achieving a 72.73% improvement in MMPose activation recall. Additionally, the framework achieves keyframe accuracy of up to 98%. These results validate the framework's capability to enhance real-time perception efficiency without significantly compromising accuracy. The framework shows potential as a scalable and systematic solution for multimodal streaming perception systems in HRC.

Paper Structure (21 sections, 18 equations, 2 figures, 2 tables)

Introduction
Related Works
Keyframe Detection
Adaptive and Computation-Aware Efficient Perception
Adaptive Sampling and Sensor Scheduling
Perception Scheduling Framework
Perception Region Segmentation
Motion Status Estimation
Relevance State Update
Perception Reward Estimation
Perception Module Selector
Reward Modeling
Reward Model for Object Detection
Reward Model for Human Full-body Pose Estimation
Experimental Setup
...and 6 more sections

Figures (2)

Figure 1: Perception scheduling framework for context-aware and efficient perception in human-robot collaboration. The system uses information from the previous frame to segment relevant regions, estimate motion status, and update relevance states. Each module in the perception toolkit is evaluated based on a reward that balances expected information gain and computational cost. The module selector computes the optimal activation set at each frame, and the selected outputs are used to update the relevance framework.
Figure 2: Demonstration of perception scheduling. The perception scheduling framework effectively adapts module activation to scene context.
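
The selection step described in the Figure 1 caption can be illustrated with a minimal sketch. It is not the paper's method: the reward models are not given in this excerpt, so the linear reward (expected gain minus weighted cost), the greedy selection rule, the `cost_weight` and `budget_ms` parameters, and all module names and numbers below are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    expected_gain: float  # hypothetical estimate of information gain for this frame
    cost_ms: float        # hypothetical per-frame latency of the module


def select_modules(modules, cost_weight=0.01, budget_ms=None):
    """Greedily pick modules whose net reward is positive, highest reward first.

    The net reward is assumed to be expected_gain - cost_weight * cost_ms.
    If budget_ms is given, a module is skipped when it would exceed the budget.
    This is a greedy heuristic, not a guaranteed optimal activation set.
    """
    scored = [(m.expected_gain - cost_weight * m.cost_ms, m) for m in modules]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    chosen, spent = [], 0.0
    for reward, module in scored:
        if reward <= 0:
            break  # remaining modules cost more than they are expected to add
        if budget_ms is not None and spent + module.cost_ms > budget_ms:
            continue  # does not fit in the remaining latency budget
        chosen.append(module.name)
        spent += module.cost_ms
    return chosen


# Illustrative values only; they are not measurements from the paper.
toolkit = [
    Module("object_detection", expected_gain=0.9, cost_ms=30.0),
    Module("full_body_pose", expected_gain=0.2, cost_ms=45.0),
    Module("speech", expected_gain=0.5, cost_ms=10.0),
]
active = select_modules(toolkit)
active_budgeted = select_modules(toolkit, budget_ms=35.0)
```

In this toy setting the pose module is skipped because its assumed cost outweighs its assumed gain for the current frame, which mirrors the idea of activating only the modules the scene context calls for.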
