
Enabling Near-realtime Remote Sensing via Satellite-Ground Collaboration of Large Vision-Language Models

Zihan Li, Jiahao Yang, Yuxin Zhang, Zhe Chen, Yue Gao

TL;DR

This work addresses the challenge of delivering near-realtime LVLM-based inference for remote sensing (RS) under the constraints of LEO satellites, which have limited onboard compute and intermittent ground links. It introduces Grace, a satellite-ground collaborative framework that partitions inference between compact onboard LVLMs and larger ground-based LVLMs, connected by a dynamic, multimodal RAG knowledge archive and a confidence-driven task dispatcher. Key contributions include a dynamic replacement and priority mechanism for the satellite archive, a hierarchical transmission scheme that prioritizes recent queries, and a confidence-based cognitive test that decides when to offload. Together these reduce average latency by 76–95% relative to state-of-the-art methods while maintaining accuracy. The approach enables scalable, bandwidth-aware, near-realtime RS processing and points toward future enhancements using inter-satellite links (ISLs) and broader deployment of LVLM-based remote sensing analytics.
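The TL;DR names a dynamic replacement and priority mechanism for the satellite archive but does not give its scoring rule. The sketch below is minimal and rests on a stated assumption: each entry's priority is taken to be its hit count, discounted geometrically by the time since it was last used. The `decay` factor, the class names, and the logical clock are all hypothetical and are not the paper's formula. Under that assumption, a capacity-bounded onboard archive evicts its lowest-priority entry when new knowledge arrives from the ground.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ArchiveEntry:
    key: str
    payload: str        # e.g. a retrieved caption or image reference (hypothetical)
    hits: int = 1       # number of inserts/uses of this entry
    last_used: int = 0  # logical clock value of the most recent access


class SatelliteArchive:
    """Capacity-bounded onboard archive with priority-based eviction (hypothetical sketch)."""

    def __init__(self, capacity: int, decay: float = 0.9) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.decay = decay
        self.clock = 0
        self.entries: dict[str, ArchiveEntry] = {}

    def _priority(self, entry: ArchiveEntry) -> float:
        # Hit count discounted geometrically by the time since last use.
        age = self.clock - entry.last_used
        return entry.hits * (self.decay ** age)

    def touch(self, key: str) -> None:
        """Record an onboard retrieval hit for an existing entry."""
        self.clock += 1
        entry = self.entries[key]
        entry.hits += 1
        entry.last_used = self.clock

    def insert(self, key: str, payload: str) -> str | None:
        """Add or refresh an entry; return the key of the evicted entry, if any."""
        self.clock += 1
        if key in self.entries:
            entry = self.entries[key]
            entry.payload = payload
            entry.hits += 1
            entry.last_used = self.clock
            return None
        evicted = None
        if len(self.entries) >= self.capacity:
            victim = min(self.entries.values(), key=self._priority)
            evicted = victim.key
            del self.entries[evicted]
        self.entries[key] = ArchiveEntry(key, payload, hits=1, last_used=self.clock)
        return evicted


if __name__ == "__main__":
    archive = SatelliteArchive(capacity=2)
    archive.insert("flood_scene", "caption A")
    archive.insert("fire_scene", "caption B")
    archive.touch("flood_scene")
    archive.touch("flood_scene")
    print(archive.insert("ship_scene", "caption C"))  # fire_scene
```

Setting `decay=1.0` reduces this rule to pure frequency-based (LFU-style) eviction. Smaller values weight recency more heavily, so entries that were popular long ago give way to newly distilled ground knowledge.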

Abstract

Large vision-language models (LVLMs) have recently demonstrated great potential in remote sensing (RS) tasks (e.g., disaster monitoring) conducted by low Earth orbit (LEO) satellites. However, their deployment in real-world LEO satellite systems remains largely unexplored, hindered by limited onboard computing resources and brief satellite-ground contacts. We propose Grace, a satellite-ground collaborative system designed for near-realtime LVLM inference in RS tasks. Specifically, we deploy compact LVLMs on satellites for realtime inference and larger ones on ground stations (GSs) to guarantee end-to-end performance. Grace comprises two main parts: asynchronous satellite-GS Retrieval-Augmented Generation (RAG) and a task dispatch algorithm. First, we distill the knowledge archive of the GS RAG into the satellite archive, using a tailored adaptive update algorithm during the limited satellite-ground data exchange periods. Second, we propose a confidence-based test algorithm that either processes each task onboard the satellite or offloads it to the GS. Extensive experiments based on real-world satellite orbital data show that Grace reduces the average latency by 76–95% compared to state-of-the-art methods, without compromising inference accuracy.
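The abstract does not specify how the confidence-based test scores an onboard answer. The minimal sketch below makes two hypothetical assumptions. The confidence score is taken to be the geometric mean of the compact LVLM's token probabilities, and the threshold is set to an illustrative 0.8. Given these, the dispatcher either returns the onboard answer or buffers the query for the GS. The `Dispatcher` class and its fields are illustrative names, not Grace's interface.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    """Confidence-threshold dispatcher (hypothetical sketch, not Grace's exact test)."""

    threshold: float = 0.8
    offload_buffer: list[str] = field(default_factory=list)

    @staticmethod
    def confidence(token_logprobs: list[float]) -> float:
        # Geometric mean of token probabilities = exp(mean log-probability).
        if not token_logprobs:
            return 0.0
        return math.exp(sum(token_logprobs) / len(token_logprobs))

    def dispatch(
        self, query: str, onboard_answer: str, token_logprobs: list[float]
    ) -> tuple[str, str | None]:
        """Return ("onboard", answer) if confident; otherwise buffer the query for the GS."""
        score = self.confidence(token_logprobs)
        if score >= self.threshold:
            return "onboard", onboard_answer
        # Low confidence: hold the query until the next satellite-GS contact window.
        self.offload_buffer.append(query)
        return "offload", None


if __name__ == "__main__":
    d = Dispatcher(threshold=0.8)
    print(d.dispatch("q1", "flooded", [math.log(0.9), math.log(0.95)]))  # ('onboard', 'flooded')
    print(d.dispatch("q2", "unclear", [math.log(0.5), math.log(0.6)]))  # ('offload', None)
    print(d.offload_buffer)  # ['q2']
```

A geometric mean is more sensitive to a single low-probability token than an arithmetic mean would be. The threshold trades latency against accuracy: a higher value offloads more queries to the larger ground LVLM.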

Paper Structure

This paper contains 44 sections, 6 equations, 11 figures, and 1 algorithm.

Figures (11)

  • Figure 1: Although collaborative inference may facilitate efficient LVLM inference on satellites, (i) limited onboard resources and (ii) brief satellite-GS contacts still pose significant challenges to deployment.
  • Figure 2: Limited onboard resources (a) restrict onboard generalization (b). The intermittent connection (c) and the limited transmission rate become the main bottleneck of task latency (d).
  • Figure 3: Grace overview. Top: the satellite is equipped with a compact LVLM and a lightweight archive to process queries using onboard resources. Bottom: the ground station features a comprehensive archive and a powerful LVLM, processing buffered queries and updating the satellite archive. "Sec." indicates "Section".
  • Figure 4: Ground-station inference process. Data relevant to the query is retrieved from the ground archive. The ground LVLM takes both the query and the retrieval results as input to generate the inference outcome.
  • Figure 5: The hierarchical transmission mechanism, with the LEO satellite side on the left and the ground station side on the right. The priority queue is transmitted to the ground station first, and its retrieval results are sent back to update the satellite archive. The secondary queue is transmitted after the priority queue transmission completes.
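Figure 5 describes a two-level queue in which the priority queue is drained before any secondary-queue traffic is sent, and the TL;DR says recent queries are prioritized. The minimal sketch below models one contact window under two assumptions. A hypothetical recency cutoff `window_s` splits queries between the two queues. The byte budget equals link rate times contact duration. Both are illustrative, and the `Query` type and function names are not taken from the paper.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Query:
    qid: str
    size_bytes: int
    timestamp: float  # capture time in seconds (hypothetical field)


def split_by_recency(
    queries: list[Query], now: float, window_s: float
) -> tuple[deque[Query], deque[Query]]:
    """Queries captured within window_s seconds go to the priority queue; newest first in both."""
    ordered = sorted(queries, key=lambda q: q.timestamp, reverse=True)
    priority = deque(q for q in ordered if now - q.timestamp <= window_s)
    secondary = deque(q for q in ordered if now - q.timestamp > window_s)
    return priority, secondary


def plan_contact(
    priority: deque[Query], secondary: deque[Query], rate_bps: float, contact_s: float
) -> list[tuple[str, str]]:
    """Pop queries that fit this contact's budget; secondary starts only once priority is empty."""
    budget = int(rate_bps * contact_s / 8)  # bytes available in this contact window
    sent: list[tuple[str, str]] = []
    for level, queue in (("priority", priority), ("secondary", secondary)):
        while queue and queue[0].size_bytes <= budget:
            q = queue.popleft()
            budget -= q.size_bytes
            sent.append((level, q.qid))
        if queue:
            # The head of this queue did not fit: stop so lower levels never overtake it.
            break
    return sent


if __name__ == "__main__":
    queries = [Query("old", 300, 50.0), Query("new1", 400, 95.0), Query("new2", 400, 99.0)]
    pq, sq = split_by_recency(queries, now=100.0, window_s=10.0)
    print(plan_contact(pq, sq, rate_bps=8000, contact_s=1.0))
    # [('priority', 'new2'), ('priority', 'new1')]
```

Unsent queries stay in their queues for the next contact. Figure 5 also indicates that retrieval results for the priority queue are sent back to update the satellite archive, but this sketch omits that return path.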