COHORT: Hybrid RL for Collaborative Large DNN Inference on Multi-Robot Systems Under Real-Time Constraints

Mohammad Saeid Anwar; Anuradha Ravi; Indrajeet Ghosh; Gaurav Shinde; Carl Busart; Nirmalya Roy

TL;DR

COHORT is a collaborative DNN inference and task-execution framework for multi-robot systems. Built on the Robot Operating System (ROS), it employs a hybrid offline-online reinforcement learning (RL) strategy to dynamically schedule and distribute DNN module execution across robots.

Abstract

Large deep neural networks (DNNs), especially transformer-based and multimodal architectures, are computationally demanding and difficult to deploy on resource-constrained edge platforms such as field robots. These challenges intensify in mission-critical scenarios (e.g., disaster response), where robots must collaborate under tight constraints on bandwidth, latency, and battery life, often without infrastructure or server support. To address these limitations, we present COHORT, a collaborative DNN inference and task-execution framework for multi-robot systems built on the Robot Operating System (ROS). COHORT employs a hybrid offline-online reinforcement learning (RL) strategy to dynamically schedule and distribute DNN module execution across robots. Our key contributions are threefold: (a) offline RL policy learning combined with Advantage-Weighted Regression (AWR), trained on auction-based task allocation data from heterogeneous DNN workloads across distributed robots; (b) online policy adaptation via Multi-Agent PPO (MAPPO), initialized from the offline policy and fine-tuned in real time; and (c) a comprehensive evaluation of COHORT on vision-language model (VLM) inference with models such as CLIP and SAM, analyzing scalability with increasing robot count and workload and robustness under . We benchmark COHORT against genetic algorithms and multiple RL baselines. Experimental results demonstrate that COHORT reduces battery consumption by 15.4% and increases GPU utilization by 51.67%, while satisfying frame-rate and deadline constraints 2.55 times as often.
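
To make the offline stage in contribution (a) more concrete, the following is a minimal sketch of the generic Advantage-Weighted Regression update: logged actions are re-weighted by the exponentiated advantage, and the policy is fit by weighted maximum likelihood. It is not the authors' implementation. The function names, the temperature `beta`, and the clipping value `max_weight` are assumptions chosen for illustration, and the advantage is taken as the logged return minus a value estimate.

```python
import math


def awr_weights(returns, values, beta=1.0, max_weight=20.0):
    """Per-sample AWR weights exp(advantage / beta), clipped at max_weight.

    returns, values: equal-length sequences of floats (logged return and
    value estimate per transition). beta and max_weight are illustrative
    hyperparameters, not values taken from the paper.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if len(returns) != len(values):
        raise ValueError("returns and values must have the same length")
    log_cap = math.log(max_weight)
    weights = []
    for ret, val in zip(returns, values):
        advantage = ret - val
        # Clip in log space so large advantages cannot overflow exp().
        weights.append(math.exp(min(advantage / beta, log_cap)))
    return weights


def awr_policy_loss(log_probs, weights):
    """Weighted negative log-likelihood of the logged actions."""
    if len(log_probs) != len(weights):
        raise ValueError("log_probs and weights must have the same length")
    if not weights:
        raise ValueError("empty batch")
    total = sum(w * lp for w, lp in zip(weights, log_probs))
    return -total / len(weights)
```

In a setup like the one the abstract describes, the logged transitions would come from the auction-based allocation data, and the resulting policy would serve as the starting point for online fine-tuning. Actions with higher advantage pull the policy more strongly, while the clip keeps a few outliers from dominating a batch.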

Paper Structure (32 sections, 18 equations, 11 figures, 2 tables, 1 algorithm)

Introduction
Background and Motivation
Task Execution across Edge Devices
Distributed and Collaborative Inference
Reinforcement Learning for Resource Management
COHORT Framework
System Architecture and Input
Offline Training Pipeline
Online Policy Execution
Problem Formulation
Data Collection: Auction-Based Cooperative Decision Making
Observations.
Actions and auction mechanism.
Dynamics.
Action bounds (implementation).
...and 17 more sections

Figures (11)

Figure 1: COHORT System Architecture
Figure 2: COHORT Offline RL Training
Figure 3: COHORT Online RL Training
Figure 4: Actor and critic loss convergence across training updates for Husky, Jackal, and Spot.
Figure 5: Mean reward progression across training updates for Spot, Jackal, and Husky under online reinforcement learning.
...and 6 more figures
