CINDI: Conditional Imputation and Noisy Data Integrity with Flows in Power Grid Data

David Baumgartner, Helge Langseth, Heri Ramampiaro

TL;DR

CINDI (Conditional Imputation and Noisy Data Integrity) is an unsupervised probabilistic framework that restores data integrity in complex time series, offering a scalable solution for maintaining reliability in noisy environments.

Abstract

Real-world multivariate time series, particularly in critical infrastructure such as electrical power grids, are often corrupted by noise and anomalies that degrade the performance of downstream tasks. Standard data cleaning approaches often rely on disjoint strategies, which involve detecting errors with one model and imputing them with another. Such approaches can fail to capture the full joint distribution of the data and ignore prediction uncertainty. This work introduces Conditional Imputation and Noisy Data Integrity (CINDI), an unsupervised probabilistic framework designed to restore data integrity in complex time series. Unlike fragmented approaches, CINDI unifies anomaly detection and imputation into a single end-to-end system built on conditional normalizing flows. By modeling the exact conditional likelihood of the data, the framework identifies low-probability segments and iteratively samples statistically consistent replacements. This allows CINDI to efficiently reuse learned information while preserving the underlying physical and statistical properties of the system. We evaluate the framework using real-world grid loss data from a Norwegian power distribution operator, though the methodology is designed to generalize to any multivariate time series domain. The results demonstrate that CINDI yields robust performance compared to competitive baselines, offering a scalable solution for maintaining reliability in noisy environments.
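
For readers unfamiliar with the term, the "exact conditional likelihood" of a conditional normalizing flow conventionally comes from the change-of-variables identity below. The notation is generic and assumed here, not taken from the paper: $x$ is a data segment, $c$ its conditioning context, $f_\theta$ an invertible map with parameters $\theta$, and $p_Z$ a simple base density such as a standard Gaussian.

```latex
\log p_X(x \mid c)
  = \log p_Z\bigl(f_\theta(x; c)\bigr)
  + \log \left| \det \frac{\partial f_\theta(x; c)}{\partial x} \right|
```

Under this identity, a segment counts as low-probability when $-\log p_X(x \mid c)$ is large, and a replacement can be drawn by sampling $z \sim p_Z$ and applying the inverse map $f_\theta^{-1}(z; c)$.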

Paper Structure (22 sections, 4 equations, 14 figures, 2 tables)

Figures (14)

  • Figure 1: Overview of our CINDI framework based on normalizing flows. The framework alternates between two states, training (green) and data improvement (blue), along the orange path until convergence. At that point, no further changes are made and an improved dataset is available, as indicated by the black dotted path. In the green state, CINDI takes the current dataset and outputs a normalizing flow model. The subsequent blue state uses this model to identify data points that deviate from the expected behavior and corrects them by generating plausible replacements. As the iterations proceed, fewer points are flagged as unexpected, and the resulting improved data can be used for another task.
  • Figure 2: Comparing expected (left) and unexpected (right) behavior in power grid measurements, showing both grid loss and power consumption. The top row displays the full time series, while the bottom row zooms in on specific sections for a closer look. The expected behavior on the left is mostly normal with a few unusual spikes. In contrast, the behavior on the right starts normally but then shifts to consistently unusual readings, a change that coincides with the start of daylight saving time.
  • Figure 3: VUS performance results for CINDI and baselines with $1.04\%$ errors in the training data. Points are the final model performance after model selection, and box plots show all the tested candidate solutions.
  • Figure 4: Results from the second iteration on the dataset with $1.04\%$ noise. Panels (a) and (b) show self-regressing imputation of two flagged sections, where the heatmap indicates the negative log-likelihood of other possible samples.
  • Figure 5: Results from the second iteration on the dataset with $1.04\%$ noise. Panels (a) and (b) show self-regressing imputation of two flagged sections, with a heatmap indicating the negative log-likelihood of possible options. Panel (c) shows the reconstruction of expected data with its negative log-likelihood. Panels (d) and (e) display test data, detected anomalies, and latent space.
  • ...and 9 more figures
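
The alternation described in the Figure 1 caption (train, flag low-likelihood points, sample replacements, repeat) can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the authors' implementation: a per-context robust Gaussian stands in for the conditional normalizing flow, and the names `fit`, `nll`, `clean`, `demo`, the context variable `c` (an hour-of-day index), and the threshold `z_max` are all hypothetical.

```python
import numpy as np


def fit(x, c, n_ctx):
    """'Training' state. ASSUMPTION: a robust Gaussian p(x | c) per context
    replaces the conditional normalizing flow used in the paper."""
    mu = np.array([np.median(x[c == k]) for k in range(n_ctx)])
    mad = np.array([np.median(np.abs(x[c == k] - mu[k])) for k in range(n_ctx)])
    sigma = 1.4826 * mad + 1e-6  # MAD scaled to a Gaussian standard deviation
    return mu, sigma


def nll(x, c, mu, sigma):
    """Exact negative log-likelihood of each point under N(mu[c], sigma[c]^2)."""
    z = (x - mu[c]) / sigma[c]
    return 0.5 * z**2 + np.log(sigma[c]) + 0.5 * np.log(2.0 * np.pi)


def clean(x, c, n_ctx, z_max=4.0, max_iter=10, seed=0):
    """Alternate between fitting and data improvement until nothing is flagged."""
    rng = np.random.default_rng(seed)
    x = x.copy()  # never modify the caller's data
    ever_flagged = np.zeros(x.size, dtype=bool)
    for _ in range(max_iter):
        mu, sigma = fit(x, c, n_ctx)
        # Hypothetical threshold: the NLL of a point z_max deviations from its mean.
        thresh = 0.5 * z_max**2 + np.log(sigma[c]) + 0.5 * np.log(2.0 * np.pi)
        flags = nll(x, c, mu, sigma) > thresh
        if not flags.any():
            break  # converged: no further changes are made
        # 'Data improvement' state: sample replacements from the fitted model.
        x[flags] = rng.normal(mu[c[flags]], sigma[c[flags]])
        ever_flagged |= flags
    return x, ever_flagged


def demo():
    """Synthetic daily pattern with four injected spikes (illustrative data only)."""
    rng = np.random.default_rng(1)
    t = np.arange(480)
    c = t % 24  # hour-of-day context
    clean_signal = np.sin(2.0 * np.pi * c / 24)
    noisy = clean_signal + rng.normal(0.0, 0.1, size=t.size)
    spikes = np.array([50, 133, 300, 427])
    noisy[spikes] += 10.0
    cleaned, flagged = clean(noisy, c, n_ctx=24)
    return clean_signal, noisy, spikes, cleaned, flagged


if __name__ == "__main__":
    clean_signal, noisy, spikes, cleaned, flagged = demo()
    print("points replaced:", int(flagged.sum()))
    print("max error at injected spikes:",
          float(np.max(np.abs(cleaned[spikes] - clean_signal[spikes]))))
```

The sketch reuses one fitted model for both scoring and sampling, which is the property the captions attribute to the framework. A real flow would score and regenerate whole segments jointly rather than treating points independently as this stand-in does.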