UniLCD: Unified Local-Cloud Decision-Making via Reinforcement Learning

Kathakoli Sengupta; Zhongkai Shangguan; Sandesh Bharadwaj; Sanjay Arora; Eshed Ohn-Bar; Renato Mancuso

TL;DR

UniLCD tackles real-time vision-based mobile systems by learning a flexible local-cloud routing policy that balances energy, latency, and safety. The approach trains a local and a cloud navigation policy via imitation learning, then optimizes a residual routing policy with PPO under a multiplicative multi-objective reward, including a dedicated collision penalty. A shared feature extractor enables on-device efficiency, while embedding-based communication to the cloud reduces energy and delays; results on CARLA crowded navigation show substantial gains in ecological navigation performance (ENS up to ≈86%) and overall efficiency, outperforming state-of-the-art baselines by over 35%. This work offers a practical framework for sustainable, safe, real-time cloud-edge collaboration applicable to dynamic, safety-critical robotic systems.
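
To illustrate why a multiplicative multi-objective reward behaves differently from an additive one, here is a minimal sketch. It is not the paper's reward: the function names, the three normalized score terms (`task`, `energy`, `latency`, each assumed to lie in [0, 1] with higher being better), the equal weights, and the fixed `collision_penalty` value are all assumptions made for illustration.

```python
def additive_reward(task, energy, latency, weights=(1.0, 1.0, 1.0)):
    """Weighted-sum baseline: a strong term can mask a weak one."""
    w_task, w_energy, w_latency = weights
    return w_task * task + w_energy * energy + w_latency * latency


def multiplicative_reward(task, energy, latency, collided, collision_penalty=-1.0):
    """Hypothetical multiplicative reward with a dedicated collision penalty.

    All scores are assumed normalized to [0, 1], higher is better.
    """
    if collided:
        # A collision overrides every other objective.
        return collision_penalty
    # The product is high only when every objective is satisfied at once.
    return task * energy * latency


# An agent that saves energy and latency but makes no task progress:
stalled = dict(task=0.0, energy=1.0, latency=1.0)
print(additive_reward(**stalled))                        # still rewarded
print(multiplicative_reward(**stalled, collided=False))  # no reward
```

Under these assumptions, the additive form still pays an agent that stands still to save energy, while the product drops to zero as soon as any single objective is neglected.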

Abstract

Embodied vision-based real-world systems, such as mobile robots, require a careful balance between energy consumption, compute latency, and safety constraints to optimize operation across dynamic tasks and contexts. As local computation tends to be restricted, offloading the computation, i.e., to a remote server, can save local resources while providing access to high-quality predictions from powerful and large models. However, the resulting communication and latency overhead has limited the usability of cloud models in dynamic, safety-critical, real-time settings. To effectively address this trade-off, we introduce UniLCD, a novel hybrid inference framework for enabling flexible local-cloud collaboration. By efficiently optimizing a flexible routing module via reinforcement learning and a suitable multi-task objective, UniLCD is specifically designed to support the multiple constraints of safety-critical end-to-end mobile systems. We validate the proposed approach using a challenging, crowded navigation task requiring frequent and timely switching between local and cloud operations. UniLCD improves overall performance and efficiency by over 35% compared to state-of-the-art baselines based on various split computing and early exit strategies.

Paper Structure (14 sections, 12 equations, 4 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Method
Problem Formulation
Learning Local and Cloud Policies
Learning a Routing Policy
Experiments
Implementation Details
Evaluation Metrics
Results
Conclusion and Future Work
Environment Implementation
Reward Design
Performance Over Task Difficulty

Figures (4)

Figure 1: Overview of UniLCD. Our system comprises a situational routing module, which takes as input the current embedding and a history of previous actions. Local actions are predicted by a pre-trained lightweight model that can be deployed efficiently on the mobile system. The sample-efficient routing module, trained via RL, determines whether to implement the local action or transmit the scene embedding to the cloud server model, which is more accurate but computationally expensive and induces latency.
Figure 2: Training Progress Results. We evaluate performance for different models, including UniLCD trained with a standard, carefully tuned, additive reward vs. UniLCD with the proposed reward function. We find consistently improved performance throughout the entire model training process. Results are averaged across 10 evaluation seeds.
Figure 3: Training Progression. We evaluate the rewards for our overall framework throughout the training process against baseline models. We demonstrate high sample-efficiency compared to prior methods, particularly when the routing module is given a history input, i.e., prior actions and their source (cloud or local decision). Despite common instabilities in training reinforcement learning models, UniLCD is shown to achieve significantly higher reward early in the training process.
Figure 4: Examples of Different Environmental Settings in Our Robot Navigation Environment. We vary the pedestrian density in order to stress-test the proposed UniLCD method. The pedestrian count along the path is set to 5 (Low), 15 (Medium), 30 (High), or 70 (Crowd).
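
The routing step described in the Figure 1 caption can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: `route_step`, the `LOCAL`/`CLOUD` constants, the policy callables, and the fixed-length history buffer are hypothetical names and choices, and the stand-in policies below are toy lambdas rather than learned models.

```python
from collections import deque

LOCAL, CLOUD = 0, 1  # hypothetical labels for the action source


def route_step(embedding, history, routing_policy, local_policy, cloud_policy):
    """Pick local or cloud inference for one control step.

    `history` holds (action, source) pairs from previous steps, matching the
    caption's "prior actions and their source".
    """
    source = routing_policy(embedding, list(history))
    if source == LOCAL:
        # Cheap on-device prediction from the shared embedding.
        action = local_policy(embedding)
    else:
        # Send the embedding (not the raw image) to the larger cloud model.
        action = cloud_policy(embedding)
    history.append((action, source))
    return action, source


# Toy stand-ins: route to the cloud whenever the last decision was local.
def toy_router(embedding, hist):
    return CLOUD if hist and hist[-1][1] == LOCAL else LOCAL


history = deque(maxlen=4)  # assumed history length
for _ in range(3):
    action, source = route_step(
        [0.1, 0.2], history, toy_router,
        local_policy=lambda e: "local_action",
        cloud_policy=lambda e: "cloud_action",
    )
```

In the paper the routing decision is learned with reinforcement learning; the alternating rule here only exercises the control flow.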
