Energy Efficiency in Cloud-Based Big Data Processing for Earth Observation: Gap Analysis and Future Directions
Adhitya Bhawiyuga, Serkan Girgin, Rolf A. de By, Raul Zurita-Milla
TL;DR
The paper addresses the energy consumption and carbon footprint of cloud-based Earth Observation Big Data (EOBD) processing, a topic historically underexplored in EO. It conducts a gap analysis across EO data characteristics, cloud service landscapes, and existing energy-efficiency approaches, organizing findings by infrastructure, storage, processing, and applications. Key contributions include identifying gaps in energy monitoring, benchmarking, cloud orchestration, and task scheduling, and proposing three future directions: an EOBD-specific energy benchmarking/monitoring toolkit, energy-aware infrastructure orchestration, and multi-objective energy-aware scheduling (e.g., optimizing for energy per task alongside makespan). By outlining these directions and highlighting the lack of transparent energy metrics from providers, the paper lays a foundation for reducing the power consumption and environmental impact of EOBD workflows while preserving analytical performance, enabling more sustainable cloud EO research and operations. The discussion emphasizes the need for standardized benchmarks and energy metrics (e.g., performance per watt, expressed as $\mathrm{GOPS/W}$) to enable reproducible comparisons across cloud backends and hardware.
Abstract
Earth observation (EO) data volumes are rapidly increasing. While cloud computing is now used for processing large EO datasets, the energy efficiency aspects of such processing have received much less attention. This issue is notable given the increasing awareness of energy costs and carbon footprint in big data processing, particularly with increased attention on compute-intensive foundation models. In this paper, we identify gaps in energy efficiency practices within cloud-based EO big data (EOBD) processing and propose several research directions for improvement. We first examine the current EOBD landscape, focusing on the requirements that necessitate cloud-based processing, and analyze existing cloud-based EOBD solutions. We then investigate energy efficiency strategies that have been successfully employed in well-studied big data domains. Through this analysis, we identify several critical gaps in existing EOBD processing platforms, which primarily focus on data accessibility and computational feasibility rather than energy efficiency. These gaps include insufficient energy monitoring mechanisms, a lack of energy awareness in data management, inadequate implementation of energy-aware resource allocation, and a lack of energy efficiency criteria in task scheduling. Based on these findings, we propose the development of energy-aware performance monitoring and benchmarking frameworks, the use of optimization techniques for infrastructure orchestration, and energy-efficient task scheduling approaches for distributed cloud-based EOBD processing frameworks. These proposed approaches aim to foster greater energy awareness in EOBD processing, potentially reducing power consumption and environmental impact while maintaining or only minimally impacting processing performance.
