Budgeted Spatial Data Acquisition: When Coverage and Connectivity Matter
Wenzhe Yang, Shixun Huang, Sheng Wang, Zhiyong Peng
TL;DR
The paper studies how to acquire a collection of spatial datasets under a budget so that spatial coverage is maximized and the acquired datasets remain spatially connected, and formalizes this as Budgeted Maximum Coverage with Connectivity Constraint (BMCC). It proves BMCC is NP-hard via a reduction from MCP and presents two greedy approximation algorithms, DSA and DPSA, with theoretical guarantees, plus two acceleration strategies that improve their efficiency. Empirical evaluation on five real-world spatial dataset collections shows that DPSA (especially with BFS acceleration) achieves strong coverage with substantial speedups, supporting its use in data marketplaces where buyers seek connected dataset collections. The work integrates coverage and connectivity into collection-level acquisition decisions under a budget, with implications for transit planning, urban analytics, and geospatial marketplaces.
Abstract
Data is undoubtedly becoming a commodity like oil, land, and labor in the 21st century. Although there are many successful marketplaces for data trading, existing data marketplaces do not consider the case where buyers want to acquire a collection of datasets (instead of one) and where the overall spatial coverage and connectivity matter. In this paper, we make the first attempt to formulate this problem as Budgeted Maximum Coverage with Connectivity Constraint (BMCC), which aims to acquire a dataset collection with the maximum spatial coverage under a limited budget while maintaining spatial connectivity. To solve the problem, we propose two approximation algorithms with detailed theoretical guarantees and time complexity analysis, followed by two acceleration strategies that further improve the efficiency of the algorithms. Experiments are conducted on five real-world spatial dataset collections to verify the efficiency and effectiveness of our algorithms.
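
To make the BMCC objective concrete, the following is a minimal Python sketch of a generic greedy heuristic for budgeted coverage under a connectivity requirement. It is not the paper's DSA or DPSA algorithm and carries no approximation guarantee. Everything in it is an illustrative assumption: datasets are modeled as sets of grid cells, prices are positive numbers, two datasets count as connected when their cells overlap or are 4-neighbours, and the names touches and greedy_connected_cover are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]  # assumed representation: a dataset covers grid cells


def touches(a: Set[Cell], b: Set[Cell]) -> bool:
    """True if the two cell sets share a cell or contain 4-neighbouring cells."""
    if a & b:
        return True
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return any((x + dx, y + dy) in b for (x, y) in a for dx, dy in steps)


def greedy_connected_cover(
    cells: Dict[str, Set[Cell]],
    price: Dict[str, float],  # assumed strictly positive
    budget: float,
    seed: str,
) -> Tuple[List[str], Set[Cell]]:
    """Grow a connected selection from `seed`, always adding the affordable,
    connected dataset with the best marginal coverage per unit price."""
    if price[seed] > budget:
        return [], set()
    chosen = [seed]
    covered = set(cells[seed])
    spent = price[seed]
    while True:
        best, best_ratio = None, 0.0
        for name, region in cells.items():
            if name in chosen or spent + price[name] > budget:
                continue  # already bought, or would exceed the budget
            if not touches(region, covered):
                continue  # would break spatial connectivity
            ratio = len(region - covered) / price[name]
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            break  # nothing affordable and connected adds new coverage
        chosen.append(best)
        covered |= cells[best]
        spent += price[best]
    return chosen, covered


# Toy instance (fabricated for illustration only).
cells = {
    "A": {(0, 0), (0, 1)},
    "B": {(0, 2), (0, 3), (0, 4)},
    "C": {(5, 5), (5, 6), (5, 7), (5, 8)},  # large but spatially isolated
    "D": {(1, 0)},
}
price = {"A": 1, "B": 2, "C": 1, "D": 1}
chosen, covered = greedy_connected_cover(cells, price, budget=4, seed="A")
print(chosen, len(covered))
```

In this toy instance, dataset C offers the most coverage per unit price but is never selected because it is not connected to the growing selection. This is the trade-off between coverage and connectivity that distinguishes BMCC from plain budgeted maximum coverage.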
