
Budgeted Spatial Data Acquisition: When Coverage and Connectivity Matter

Wenzhe Yang, Shixun Huang, Sheng Wang, Zhiyong Peng

TL;DR

The paper addresses acquiring a collection of spatial datasets under a budget while maximizing spatial coverage and ensuring connectivity, formalizing this as Budgeted Maximum Coverage with Connectivity Constraint (BMCC). It proves BMCC is NP-hard via a reduction from the Maximum Coverage Problem (MCP) and presents two greedy approximation algorithms, DSA and DPSA, with theoretical guarantees, plus two acceleration strategies that make them more practical. Empirical evaluation on five real-world spatial collections shows that DPSA (especially with BFS acceleration) achieves strong coverage with substantial speedups, validating the approach for data marketplaces seeking connected dataset collections. The work advances spatial data acquisition by integrating coverage and connectivity into collection-level decisions under a budget, with implications for transit planning, urban analytics, and geospatial marketplaces.
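To make the problem statement concrete, BMCC can be written as the optimization program below. This is a sketch under assumptions: the price function $c(\cdot)$, the budget $B$, the adjacency threshold $\delta$, the distance $\operatorname{dist}$, and the induced-subgraph notation $G[\mathcal{H}]$ on the spatial dataset graph $G = (\mathcal{S}_\mathcal{D}, E)$ are notation introduced here for illustration, and the paper's own definitions may state the constraints differently. Coverage is taken as the number of grid cells covered by the union of the chosen cell-based datasets, following Figure 2(c).

```latex
% BMCC sketch; c(.), B, delta, dist, and G[H] are assumed notation, not the paper's.
\begin{aligned}
\mathcal{H}^* = \operatorname*{arg\,max}_{\mathcal{H} \subseteq \mathcal{S}_\mathcal{D}}
  \;& \Bigl|\, \bigcup_{S_D \in \mathcal{H}} S_D \,\Bigr| \\
\text{s.t.}\;& \sum_{S_D \in \mathcal{H}} c(S_D) \le B, \\
& G[\mathcal{H}] \text{ is connected, where } (S_{D_i}, S_{D_j}) \in E
  \iff \operatorname{dist}(S_{D_i}, S_{D_j}) \le \delta .
\end{aligned}
```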

Abstract

Data is undoubtedly becoming a commodity like oil, land, and labor in the 21st century. Although there are many successful marketplaces for data trading, existing data marketplaces do not consider the case where buyers want to acquire a collection of datasets (instead of a single one) and the overall spatial coverage and connectivity of that collection matter. In this paper, we make the first attempt to formulate this problem as Budgeted Maximum Coverage with Connectivity Constraint (BMCC), which aims to acquire a dataset collection with maximum spatial coverage under a limited budget while maintaining spatial connectivity. To solve the problem, we propose two approximation algorithms with detailed theoretical guarantees and time complexity analysis, followed by two acceleration strategies that further improve their efficiency. Experiments are conducted on five real-world spatial dataset collections to verify the efficiency and effectiveness of our algorithms.
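The greedy idea behind budgeted coverage with a connectivity constraint can be illustrated with a small seed-and-grow heuristic. This Python sketch is not the paper's DSA or DPSA and carries none of their approximation guarantees; the function name `greedy_connected_coverage`, the dictionary inputs, and the selection rule (new cells gained per unit price) are hypothetical choices for illustration. Starting from each affordable dataset, it repeatedly adds the neighboring dataset with the best coverage gain per unit cost, so the collection stays connected at every step.

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]


def greedy_connected_coverage(
    cells: Dict[str, Set[Cell]],   # dataset id -> grid cells it covers
    cost: Dict[str, float],        # dataset id -> price (assumed > 0)
    adj: Dict[str, Set[str]],      # hypothetical dataset-graph adjacency
    budget: float,
) -> Tuple[List[str], int]:
    """Hypothetical seed-and-grow greedy; NOT the paper's DSA/DPSA."""
    best: List[str] = []
    best_cov = 0
    for seed in cells:
        if cost[seed] > budget:
            continue
        chosen = [seed]
        covered = set(cells[seed])
        spent = cost[seed]
        while True:
            # Frontier: unchosen neighbors of the current collection, so
            # adding any one of them keeps the collection connected.
            frontier = {n for d in chosen for n in adj[d]} - set(chosen)
            best_next: Optional[str] = None
            best_ratio = 0.0
            for cand in sorted(frontier):  # sorted for deterministic ties
                if spent + cost[cand] > budget:
                    continue
                ratio = len(cells[cand] - covered) / cost[cand]
                if ratio > best_ratio:
                    best_next, best_ratio = cand, ratio
            if best_next is None:  # nothing affordable adds new cells
                break
            chosen.append(best_next)
            covered |= cells[best_next]
            spent += cost[best_next]
        if len(covered) > best_cov:
            best, best_cov = chosen, len(covered)
    return best, best_cov


# Usage on a toy instance: A-B and A-D are adjacent, C is isolated.
cells = {"A": {(0, 0), (0, 1)}, "B": {(0, 2), (0, 3), (0, 4)},
         "C": {(5, 5)}, "D": {(0, 1), (1, 1)}}
cost = {"A": 1, "B": 2, "C": 1, "D": 1}
adj = {"A": {"B", "D"}, "B": {"A"}, "C": set(), "D": {"A"}}
print(greedy_connected_coverage(cells, cost, adj, budget=3))  # (['A', 'B'], 5)
```

Restricting candidates to the frontier is what enforces connectivity, since every added dataset is adjacent to one already chosen. Trying every seed costs an extra factor of $n$ but avoids committing to a poor starting dataset.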

Paper Structure

This paper contains 20 sections, 7 theorems, 22 equations, 15 figures, 4 tables, and 2 algorithms.

Key Result

Theorem 1

The BMCC problem is NP-hard.
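Because BMCC is NP-hard, no polynomial-time exact algorithm exists unless P = NP, so exact solutions are only practical for very small inputs. The brute-force sketch below makes this concrete by enumerating all $2^n - 1$ nonempty subsets and keeping those that fit the budget and induce a connected subgraph, checked with a breadth-first search. The names `exact_bmcc` and `is_connected` and the dictionary-based inputs are hypothetical. It also conveys the intuition behind the hardness: if every price is 1 and every pair of datasets is adjacent, the connectivity check always passes and the search is exactly MCP with $k = B$. The paper's own reduction (Figure 4) may be constructed differently.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]


def is_connected(nodes: Set[str], adj: Dict[str, Set[str]]) -> bool:
    """BFS restricted to `nodes`: True if they induce a connected subgraph."""
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u] & nodes:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def exact_bmcc(cells: Dict[str, Set[Cell]], cost: Dict[str, float],
               adj: Dict[str, Set[str]], budget: float) -> Tuple[List[str], int]:
    """Hypothetical exact solver: tries every subset, O(2^n) in n datasets."""
    ids = list(cells)
    best: List[str] = []
    best_cov = 0
    for k in range(1, len(ids) + 1):
        for combo in combinations(ids, k):
            if sum(cost[d] for d in combo) > budget:
                continue
            if not is_connected(set(combo), adj):
                continue
            cov = len(set().union(*(cells[d] for d in combo)))
            if cov > best_cov:
                best, best_cov = list(combo), cov
    return best, best_cov


# Usage on a toy instance: A-B and A-D are adjacent, C is isolated.
cells = {"A": {(0, 0), (0, 1)}, "B": {(0, 2), (0, 3), (0, 4)},
         "C": {(5, 5)}, "D": {(0, 1), (1, 1)}}
cost = {"A": 1, "B": 2, "C": 1, "D": 1}
adj = {"A": {"B", "D"}, "B": {"A"}, "C": set(), "D": {"A"}}
print(exact_bmcc(cells, cost, adj, budget=3))  # (['A', 'B'], 5)
print(exact_bmcc(cells, cost, adj, budget=2))  # (['B'], 3)
```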

Figures (15)

  • Figure 1: An example of dataset acquisition under a limited budget. (a) shows the individual dataset recommendation in the current marketplaces, (b) forms the collection by randomly choosing individual recommended datasets, whereas (c) directly recommends the dataset collection.
  • Figure 2: Illustration of the spatial dataset and cell-based dataset, where (a) shows a spatial dataset $D$, (b) shows how to partition the original space into a grid of uniform cells, and (c) shows a cell-based dataset $S_D$ and its spatial coverage.
  • Figure 3: Construction of the spatial dataset graph, where (a) shows five cell-based datasets in $\mathcal{S}_\mathcal{D}$, (b) shows the cell-based dataset distance computation between $S_{D_1}$ and $S_{D_2}$, and (c) shows the spatial dataset graph constructed from $\mathcal{S}_{\mathcal{D}}$ and the optimal solution $\mathcal{H}^*$.
  • Figure 4: Illustration of reduction from MCP to BMCC.
  • Figure 5: Illustration of DSA, where (a) shows 8 cell-based datasets, (b) shows the solution $\mathcal{H}_1=\{S_{D_4}, S_{D_7}, S_{D_8}\}$ found by the first-round search, and (c) shows the solution $\mathcal{H}_2 = \{S_{D_1}, S_{D_5}\}$ found by the second-round search.
  • ...and 10 more figures
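Figures 2 and 3 describe two preprocessing steps: snapping a raw spatial dataset onto a uniform grid to obtain its cell-based dataset (whose coverage is its number of cells), and linking cell-based datasets into a spatial dataset graph. The following Python sketch shows both steps under assumptions: the grid origin and `cell_size`, the distance between two cell-based datasets (taken here as the minimum Euclidean distance between their cell indices), and the adjacency threshold `delta` are illustrative choices, not necessarily the paper's definitions. The function names `to_cells`, `cell_distance`, and `build_graph` are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Cell = Tuple[int, int]


def to_cells(points: Iterable[Tuple[float, float]],
             origin: Tuple[float, float], cell_size: float) -> Set[Cell]:
    """Map raw (x, y) points to the indices of the uniform grid cells they fall in."""
    ox, oy = origin
    return {(math.floor((x - ox) / cell_size), math.floor((y - oy) / cell_size))
            for x, y in points}


def cell_distance(a: Set[Cell], b: Set[Cell]) -> float:
    """Hypothetical dataset distance: minimum Euclidean distance between the
    cell indices of two datasets (0.0 if they share a cell)."""
    return min(math.dist(p, q) for p in a for q in b)


def build_graph(cells: Dict[str, Set[Cell]], delta: float) -> Dict[str, Set[str]]:
    """Connect two cell-based datasets when their distance is at most delta."""
    adj: Dict[str, Set[str]] = {d: set() for d in cells}
    for u, v in combinations(cells, 2):
        if cell_distance(cells[u], cells[v]) <= delta:
            adj[u].add(v)
            adj[v].add(u)
    return adj


# Usage: three toy datasets on a grid of unit cells anchored at (0, 0).
raw = {"D1": [(0.2, 0.3), (1.5, 0.4)], "D2": [(2.6, 0.1)], "D3": [(9.0, 9.0)]}
cells = {name: to_cells(pts, (0.0, 0.0), 1.0) for name, pts in raw.items()}
graph = build_graph(cells, delta=1.0)  # D1 and D2 are adjacent; D3 is isolated
print(len(cells["D1"]))  # coverage of D1: 2 cells
```

The pairwise loop in `build_graph` is quadratic in the number of datasets and in their cell counts, which is fine for a sketch; a spatial index would be the natural next step for large collections.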

Theorems & Definitions (30)

  • Definition 1
  • Definition 2
  • Definition 3
  • Example 1
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7
  • Example 2
  • Theorem 1
  • ...and 20 more