A Review of Graph-Powered Data Quality Applications for IoT Monitoring Sensor Networks

Pau Ferrer-Cid, Jose M. Barcelo-Ordinas, Jorge Garcia-Vidal

TL;DR

This survey addresses data quality in IoT monitoring sensor networks by reviewing graph-based techniques across three paradigms: graph signal processing (GSP), machine learning (ML) over graphs, and graph neural networks (GNNs). It presents a taxonomy of data quality tasks (missing value imputation, anomaly detection, virtual sensing, and clustering) organized by network, edge, and node levels, and discusses how graph topology, through operators such as the graph shift $\mathbf{S}$ and the Laplacian $\mathbf{L}$, enables effective and often distributed solutions. Key contributions include a concise synthesis of fundamentals, a structured overview of data quality applications, and a critical discussion of challenges (scale, heterogeneity, and security) and emerging trends (transferability, federated learning, digital twins, and quantum-inspired models). The survey highlights how graph-based methods can improve data completeness, accuracy, and consistency in IoT systems, supporting reliable decision-making and digital twin deployments.

Abstract

The development of Internet of Things (IoT) technologies has led to the widespread adoption of monitoring networks for a wide variety of applications, such as smart cities, environmental monitoring, and precision agriculture. A major research focus in recent years has been the development of graph-based techniques to improve the quality of data from sensor networks, a key requirement for using sensed data in decision-making processes, digital twins, and other applications. Emphasis has been placed on machine learning and signal processing techniques over graphs, which exploit the structure that a graph topology imposes on the data. Techniques such as graph signal processing (GSP) and graph neural networks (GNNs) have been used for data quality enhancement tasks. In this survey, we focus on graph-based models for data quality control in monitoring sensor networks. We also cover the technical foundations commonly used to build graph-based solutions for data quality tasks in sensor networks, including missing value imputation, outlier detection, and virtual sensing. Finally, we identify future trends and challenges, such as graph-based models for digital twins and model transferability and generalization.

Paper Structure

This paper contains 34 sections, 18 equations, 15 figures, 6 tables.

Figures (15)

  • Figure 1: Examples of applications in which IoT sensor networks are used to monitor different phenomena. Recent use cases include air quality networks, precision agriculture, and pipe networks, among others.
  • Figure 2: Outline of the survey. Section \ref{sec:technical_part} introduces the basics of the different graph-based approaches (GSP, ML over graphs, and GNNs) on which the methods discussed in Section \ref{sec:applications} are based.
  • Figure 3: Scope of the review: graph-based models (GSP, ML over graphs, and GNNs) for data quality tasks in IoT monitoring sensor networks.
  • Figure 4: Example of how a graph $\mathcal{G}=\{\mathcal{V},\mathcal{E},\mathbf{S}\}$ can be used to represent a sensor network and the sensors' relationships. $e_{ij}$ represents an edge connecting nodes $i$ and $j$. The entry of the graph shift matrix $S_{ij}$ assigns a weight to the edge $e_{ij}$.
  • Figure 5: Types of graphs: above, static graphs; below, dynamic graphs. $u$ denotes undirected, $d$ denotes directed, $m$ denotes multigraph, and $\mathcal{G}_u^{(t)}$ denotes a dynamic undirected graph at time $t$. $T \in \mathbb{N}$ represents a time offset.
  • ...and 10 more figures