
Color When It Counts: Grayscale-Guided Online Triggering for Always-On Streaming Video Sensing

Weitong Cai, Hang Zhang, Yukai Huang, Shitong Sun, Jiankang Deng, Songcen Xu, Jifei Song, Zhensong Zhang

Abstract

Always-on sensing is essential for next-generation edge/wearable AI systems, yet continuous high-fidelity RGB video capture remains prohibitively expensive for resource-constrained mobile and edge platforms. We present a new paradigm for efficient streaming video understanding: grayscale-always, color-on-demand. Through preliminary studies, we discover that color is not always necessary. Sparse RGB frames suffice for comparable performance when temporal structure is preserved via continuous grayscale streams. Building on this insight, we propose ColorTrigger, an online training-free trigger that selectively activates color capture based on windowed grayscale affinity analysis. Designed for real-time edge deployment, ColorTrigger uses lightweight quadratic programming to detect chromatic redundancy causally, coupled with credit-budgeted control and dynamic token routing to jointly reduce sensing and inference costs. On streaming video understanding benchmarks, ColorTrigger achieves 91.6% of full-color baseline performance while using only 8.1% RGB frames, demonstrating substantial color redundancy in natural videos and enabling practical always-on video sensing on resource-constrained devices.
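
The credit-budgeted control mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the class `CreditBudget`, the credit cap, the initial credit, and the thresholded `novelty` score are all assumptions made here for illustration. The idea shown is only that each incoming frame earns a fraction of a credit equal to the target RGB rate, and a color capture is allowed only when a full credit is available.

```python
from dataclasses import dataclass


@dataclass
class CreditBudget:
    """Hypothetical token-bucket budget (an assumption, not the paper's exact rule)."""
    rate: float            # target fraction of RGB frames, e.g. 0.081
    cap: float = 2.0       # assumed maximum number of credits that may accumulate
    credits: float = 1.0   # assumed starting credit so an early frame can fire

    def step(self, wants_color: bool) -> bool:
        # Earn `rate` credits per frame, clipped at the cap.
        self.credits = min(self.cap, self.credits + self.rate)
        # Fire only if the trigger asks for color and a full credit is available.
        if wants_color and self.credits >= 1.0:
            self.credits -= 1.0
            return True
        return False


def run_trigger(novelty, rate, threshold=0.5):
    """Causally decide, frame by frame, whether to capture RGB.

    `novelty` is a hypothetical per-frame score in [0, 1]; a plain threshold
    stands in here for the quadratic-programming step described in the paper.
    """
    budget = CreditBudget(rate=rate)
    return [budget.step(score > threshold) for score in novelty]


if __name__ == "__main__":
    decisions = run_trigger([1.0] * 80, rate=0.125)
    print(sum(decisions), "RGB captures out of", len(decisions), "frames")
```

Even when every frame requests color, the number of captures stays near `rate * len(novelty)`, which is the property a budgeted controller is meant to enforce.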

Paper Structure (17 sections, 7 equations, 6 figures, 8 tables, 1 algorithm)

Figures (6)

  • Figure 1: Towards always-on sensing. (a) Always-on sensing is difficult to realize in practice: the high-resolution RGB video pipeline (sensor, ISP, encoding, and wireless transmission) quickly drains the power budget of an edge AI system, so devices such as smart glasses typically sustain only about 30–60 minutes of continuous recording [meta2025rayban], far short of the all-day operation needed for a standby assistant. (b) We propose ColorTrigger, a grayscale-always, color-on-demand paradigm that uses a low-power grayscale camera as an always-on monitor and triggers the RGB camera sparsely, only when needed, enabling always-on video sensing on edge devices.
  • Figure 2: Preliminary studies: color is not always necessary. (a) We uniformly insert RGB frames of the same resolution and quality into an otherwise grayscale video stream and evaluate on StreamingBench (all tasks) using Qwen2.5-VL-7B [bai2025qwen25vl]. A small fraction of RGB frames is sufficient to achieve comparable performance. (b) We extract CLS features from 30 consecutive frames of a 1 fps video using CLIP ViT-B/16 [radford2021clip], revealing redundancy between adjacent frames.
  • Figure 3: Overview of ColorTrigger. Our framework operates in two stages: (i) a causal online trigger analyzes grayscale features within a sliding window, aggregates them into an affinity matrix that captures temporal redundancy and change, and solves a lightweight quadratic program under a credit-based budget; (ii) a dynamic token router adaptively allocates decoder capacity based on the trigger decision, using high-compression tokens for grayscale frames and high-capacity tokens for RGB frames while preserving temporal order.
  • Figure 4: Performance across varying RGB frame ratios. We vary the target rate $r \in [0.05, 1.0]$ and evaluate on StreamingBench.
  • Figure 5: Qualitative Example of the proposed method.
  • ...and 1 more figures
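
The two stages in the Figure 3 caption can be sketched in a few lines, under stated assumptions. The functions `affinity_matrix`, `novelty_of_latest`, `stream_novelty`, and `route_tokens` are hypothetical names, the cosine-similarity affinity and the "one minus maximum affinity" score are simplifications that stand in for the paper's quadratic program, and the token counts 16 and 256 are placeholder values rather than figures from the paper.

```python
from collections import deque

import numpy as np


def affinity_matrix(feats):
    """Cosine-similarity affinity between per-frame grayscale feature vectors."""
    feats = np.asarray(feats, dtype=np.float64)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def novelty_of_latest(feats):
    """Score the newest frame by how dissimilar it is to earlier frames in the window.

    Assumed heuristic: 1 minus the maximum affinity to any earlier frame.
    """
    affinity = affinity_matrix(feats)
    if affinity.shape[0] < 2:
        return 1.0  # nothing to compare against yet
    return float(1.0 - affinity[-1, :-1].max())


def stream_novelty(features, window=8):
    """Causal pass over a stream: each score uses only the current and past frames."""
    buffer = deque(maxlen=window)  # sliding window of recent features
    scores = []
    for feat in features:
        buffer.append(feat)
        scores.append(novelty_of_latest(list(buffer)))
    return scores


def route_tokens(is_rgb, gray_tokens=16, rgb_tokens=256):
    """Assign a per-frame token budget in temporal order (placeholder counts)."""
    return [rgb_tokens if flag else gray_tokens for flag in is_rgb]


if __name__ == "__main__":
    stream = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    scores = stream_novelty(stream, window=8)
    print(scores)
    print(route_tokens([s > 0.5 for s in scores]))
```

Frames that repeat earlier content in the window score low and would stay grayscale with few tokens, while a frame unlike anything in the window scores high and becomes a candidate for RGB capture with a larger token budget.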