COSTREAM: Learned Cost Models for Operator Placement in Edge-Cloud Environments
Roman Heinrich, Carsten Binnig, Harald Kornmayer, Manisha Luthra
TL;DR
COSTREAM tackles the problem of optimizing the initial operator placement of streaming queries in heterogeneous edge-cloud settings by learning a zero-shot cost model. It introduces a joint operator-resource graph and a GNN-based cost predictor that generalizes to unseen hardware and workloads, enabling offline placement decisions without runtime statistics. Placements optimized with COSTREAM achieve a median speed-up of around 21x over baselines, and the model shows strong predictive accuracy across varied hardware, query structures, and benchmarks, supported by a new large-scale cost-estimation dataset. This work advances cost-based optimization for edge-cloud distributed stream processing systems (DSPS) and lays groundwork for broader offline optimizations on heterogeneous hardware.
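The idea of a joint operator-resource graph can be pictured with the minimal sketch below. Operators and hardware nodes live in one graph, dataflow edges link operators, and placement edges link each host to the operators it runs. A few rounds of message passing then propagate information toward the sink, whose state serves as the cost readout. Everything concrete here is an assumption for illustration: the node features, the fixed weights `W_SELF` and `W_MSG` (stand-ins for learned parameters), the scalar hidden state, the ReLU update, and the sink readout are hypothetical and are not COSTREAM's actual featurization or architecture.

```python
from dataclasses import dataclass, field

# Hypothetical fixed weights standing in for learned GNN parameters.
W_SELF = 1.0
W_MSG = 0.5


@dataclass
class JointGraph:
    """Operators and hosts as nodes of one graph (toy scalar features)."""
    kind: dict[str, str] = field(default_factory=dict)            # node -> "operator" | "host"
    features: dict[str, list[float]] = field(default_factory=dict)
    edges: list[tuple[str, str]] = field(default_factory=list)    # directed (src, dst)

    def add_node(self, name: str, kind: str, feats: list[float]) -> None:
        self.kind[name] = kind
        self.features[name] = feats


def build_joint_graph(operators, dataflow, hosts, placement) -> JointGraph:
    g = JointGraph()
    for op, feats in operators.items():
        g.add_node(op, "operator", feats)
    for host, feats in hosts.items():
        g.add_node(host, "host", feats)
    g.edges.extend(dataflow)                                       # operator -> operator
    g.edges.extend((host, op) for op, host in placement.items())   # host -> operator
    return g


def message_passing(g: JointGraph, rounds: int = 2) -> dict[str, float]:
    # Toy encoder: a node's initial hidden state is the sum of its features.
    h = {n: sum(f) for n, f in g.features.items()}
    for _ in range(rounds):
        incoming = {n: 0.0 for n in h}
        for src, dst in g.edges:
            incoming[dst] += h[src]
        # Synchronous update: h_v <- ReLU(W_SELF * h_v + W_MSG * sum of in-neighbours).
        h = {n: max(0.0, W_SELF * h[n] + W_MSG * incoming[n]) for n in h}
    return h


# Hypothetical three-operator query placed on one edge host and one cloud host.
operators = {"source": [1.0], "filter": [0.5], "sink": [0.25]}
hosts = {"edge": [0.125, 0.125], "cloud": [0.5, 0.5]}
dataflow = [("source", "filter"), ("filter", "sink")]
placement = {"source": "edge", "filter": "edge", "sink": "cloud"}

graph = build_joint_graph(operators, dataflow, hosts, placement)
states = message_passing(graph, rounds=2)
predicted_cost = states["sink"]  # readout at the sink operator
print(predicted_cost)  # 2.0625
```

Two rounds are enough here for information from the source and from both hosts to reach the sink of this short chain; deeper query graphs would need more rounds. Changing `placement` changes the host-to-operator edges and therefore the readout, which is what lets a single model score different placements of the same query.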
Abstract
In this work, we present COSTREAM, a novel learned cost model for Distributed Stream Processing Systems that provides accurate predictions of the execution costs of a streaming query in an edge-cloud environment. The cost model can be used to find an initial placement of operators across heterogeneous hardware, which is particularly important in these environments. In our evaluation, we demonstrate that COSTREAM can produce highly accurate cost estimates for the initial operator placement and even generalize to unseen placements, queries, and hardware. When using COSTREAM to optimize the placements of streaming operators, a median speed-up of around 21x can be achieved compared to baselines.
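Using a cost model to pick an initial placement can be illustrated by an exhaustive search that scores every candidate placement with a predictor and keeps the cheapest. This is a minimal sketch under stated assumptions: `toy_predict_cost` is a hypothetical hand-written stand-in for a trained model such as COSTREAM, the loads, host speeds, and transfer penalty are invented numbers, and pinning the source to the edge host is only an example constraint. The search strategy COSTREAM actually uses is not described in this excerpt.

```python
from itertools import product
from typing import Callable, Iterator, Optional

Placement = dict[str, str]


def enumerate_placements(operators: list[str], hosts: list[str],
                         allowed: Optional[dict[str, list[str]]] = None) -> Iterator[Placement]:
    """Yield every operator -> host assignment, optionally restricting some operators."""
    allowed = allowed or {}
    choices = [allowed.get(op, hosts) for op in operators]
    for combo in product(*choices):
        yield dict(zip(operators, combo))


def choose_placement(operators, hosts, predict_cost: Callable[[Placement], float], allowed=None):
    """Return the candidate with the lowest predicted cost (the first one wins on ties)."""
    best, best_cost = None, float("inf")
    for candidate in enumerate_placements(operators, hosts, allowed):
        cost = predict_cost(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


# Hypothetical stand-in for a learned cost model: compute time plus a network penalty.
LOAD = {"source": 1.0, "filter": 4.0, "sink": 1.0}
SPEED = {"edge": 1.0, "cloud": 4.0}
DATAFLOW = [("source", "filter"), ("filter", "sink")]
TRANSFER_PENALTY = 0.5


def toy_predict_cost(p: Placement) -> float:
    compute = sum(LOAD[op] / SPEED[host] for op, host in p.items())
    network = sum(TRANSFER_PENALTY for a, b in DATAFLOW if p[a] != p[b])
    return compute + network


ops = ["source", "filter", "sink"]
best, cost = choose_placement(ops, ["edge", "cloud"], toy_predict_cost,
                              allowed={"source": ["edge"]})
print(best, cost)  # {'source': 'edge', 'filter': 'cloud', 'sink': 'cloud'} 2.75
```

Exhaustive enumeration grows as |hosts|^|operators|, so it only suits small queries. The point of the sketch is that a cheap offline predictor lets many candidate placements be compared without deploying any of them.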
