ISCS: Parameter-Guided Feature Pruning for Resource-Constrained Embodied Perception
Jinhao Wang, Nam Ling, Wei Wang, Wei Jiang
TL;DR
This work tackles the latency-accuracy challenge of perception in resource-constrained embodied agents by introducing the Invariant Salient Channel Space (ISCS), a dataset-agnostic scaffold that identifies structure-critical Salient-Core (SC) channels and their correlated detail channels (Salient-Auxiliary) directly from pretrained encoder weights. It then performs entropy-free static pruning: only SC channels are retained and costly entropy models are bypassed, yielding ultra-low latency through a deterministic split-computing protocol. Empirical results on COCO and UCF101 show near-original performance even at only 25% channel retention, with substantial reductions in both encoding time (to ~0.45 s) and edge-side decoding latency (to ~3.4 ms), validating a favorable rate-latency-accuracy trade-off. The approach offers a practical, robust pathway to real-time, human-aware embodied perception in edge-robot collaboration, with potential for adaptive extension under varying channel conditions or tasks.
Abstract
Prior studies in embodied AI consistently show that robust perception is critical for human-robot interaction, yet deploying high-fidelity visual models on resource-constrained agents remains challenging because of limited on-device computational power and transmission latency. Exploiting the redundancy in latent representations could improve system efficiency, but existing approaches often rely on costly dataset-specific ablation tests or heavy entropy models that are unsuitable for real-time edge-robot collaboration. We propose a generalizable, dataset-agnostic method to identify and selectively transmit structure-critical channels in pretrained encoders. Instead of brute-force empirical evaluations, our approach leverages intrinsic parameter statistics (weight variances and biases) to estimate channel importance. This analysis reveals a consistent organizational structure, termed the Invariant Salient Channel Space (ISCS), in which Salient-Core channels capture dominant structures while Salient-Auxiliary channels encode fine visual details. Building on ISCS, we introduce a deterministic static pruning strategy that enables lightweight split-computing. Experiments across different datasets demonstrate that our method yields a deterministic, ultra-low-latency pipeline by bypassing heavy entropy modeling. The resulting reduction in end-to-end latency provides a critical speed-accuracy trade-off for resource-constrained, human-aware embodied systems.
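To make the idea of parameter-guided static pruning concrete, the following is a minimal sketch only, not the authors' implementation. It assumes a hypothetical importance score (per-channel weight variance plus bias magnitude) and a toy 8-channel layer. The names `channel_scores`, `select_salient_core`, and `prune_latent` are illustrative, and the abstract does not specify how the two statistics are actually combined.

```python
from statistics import pvariance


def channel_scores(weights, biases):
    """Score each output channel from its parameters alone (no data needed).

    Assumption: score = weight variance + |bias|. The source only states that
    weight variances and biases are used, not how they are combined.
    """
    return [pvariance(w) + abs(b) for w, b in zip(weights, biases)]


def select_salient_core(scores, keep_ratio=0.25):
    """Return sorted indices of the top `keep_ratio` fraction of channels."""
    k = max(1, round(len(scores) * keep_ratio))
    # Sort by descending score; break ties by channel index for determinism.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


def prune_latent(latent, keep):
    """Keep only the selected channels of a latent (one entry per channel)."""
    return [latent[i] for i in keep]


# Toy encoder layer: 8 output channels, 4 weights per channel (made-up values).
weights = [
    [0.9, -0.8, 0.7, -0.9],
    [0.01, 0.02, 0.01, 0.02],
    [0.1, -0.1, 0.1, -0.1],
    [0.0, 0.0, 0.0, 0.0],
    [1.2, -1.1, 1.0, -1.3],
    [0.05, 0.04, 0.05, 0.04],
    [0.2, -0.2, 0.2, -0.2],
    [0.03, 0.03, 0.03, 0.03],
]
biases = [0.1, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0]

# The mask is computed once, offline, and reused for every input.
keep = select_salient_core(channel_scores(weights, biases), keep_ratio=0.25)
print(keep)  # [0, 4]
```

Because the selection depends only on the pretrained parameters, the retained index set is fixed ahead of time. That is what would allow a sender to transmit just those channels without a learned entropy model, under the assumptions stated above.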
