RAPID: Redundancy-Aware and Compatibility-Optimal Edge-Cloud Partitioned Inference for Diverse VLA models

Zihao Zheng; Sicheng Tian; Hangyu Cao; Chenyue Li; Jiayu Chen; Maoliang Li; Xinhao Sun; Hailong Zou; Guojie Luo; Xiang Chen

TL;DR

We propose RAPID, a novel Edge-Cloud Collaborative (ECC) inference framework for VLA models, together with an implementation tailored to it, achieving a speedup of up to 1.73x with only 5% to 7% overhead.

Abstract

Vision-Language-Action (VLA) models are the mainstream approach in embodied intelligence but incur high inference costs. Edge-Cloud Collaborative (ECC) inference is an effective remedy: it relieves the computing pressure on edge devices so that real-time requirements can be met. However, existing ECC frameworks are suboptimal for VLA models because of two challenges: (1) mainstream environment-oriented edge-cloud partitioning methods are susceptible to interference from visual noise; (2) existing edge-cloud partitioning methods overlook the step-wise redundancy unique to embodied tasks, thereby disrupting the physical continuity of motion. To address these issues, we propose a novel ECC inference framework, termed RAPID, and develop an implementation tailored to it. Experiments demonstrate that RAPID achieves a speedup of up to 1.73x with only 5% to 7% overhead.

Paper Structure (39 sections, 8 equations, 5 figures, 5 tables, 1 algorithm)

Introduction
Background
Vision-Language-Action Models and Action Chunking
Edge-Cloud Co-Inference
Computing Offloading Strategy
Limitations of Vision-Based Partitioning
Observation and Motivation
Compatibility Analysis of Partitioning Schemes
Compatibility Limitations of Environment-Oriented Strategies Across Multi-Environments
Correlations between Vision-Based Confidence and Kinematics
Redundancy Analysis During Multi-Step Generation
Step-Wise Redundancy Identification in VLA Inference
Correlation Between Redundancy and Kinematics
RAPID Framework
Compatibility-Optimal Partitioning Mechanism
...and 24 more sections

Figures (5)

Figure 1: Comparison between Vision-Based Strategy (Left) and Our RAPID Framework (Middle).
Figure 2: (a) Vision-Based Offloading Strategy under Different Degrees of Noise. (b) Kinematic Offloading Strategy Performance.
Figure 3: Correlation Analysis between Joint Torque and Step-Wise Redundancy
Figure 4: The RAPID Algorithmic Framework
Figure 5: A Case of RAPID Framework Completing Real-World Tasks (Task Name: Pick up the Banana and Put it into the Blue Bowl)
