Frequency-domain alignment of heterogeneous, multidimensional separations data through complex orthogonal Procrustes analysis

Michael Sorochan Armstrong

TL;DR

The paper tackles peak-drift alignment in heterogeneous, multidimensional chromatography data by transforming the data into the frequency domain via a 2D FFT and applying a complex orthogonal Procrustes alignment, solving for $\Omega$ with $\Omega = U V^H$ to map distorted representations to a target. It introduces a logarithmic distortion step to simulate shifts while preserving topology, enabling robust alignment under challenging conditions. Empirical results on synthetic data show near-perfect cosine similarity between aligned and original data ($\rho_{\cos}$ close to 1 across noise, extra peaks, and peak-broadening scenarios). The approach offers a simple preprocessing route that can facilitate downstream multivariate deconvolution methods like PARAFAC2 and support analyses even without mass spectrometric detectors, albeit with substantial memory demands for the rotation computation.
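For reference, the rotation step can be written in the standard orthogonal Procrustes form. The symbols $\hat{X}_d$ (2D FFT of the distorted chromatogram) and $\hat{X}_t$ (2D FFT of the target) are notation assumed here for illustration, as is the convention of applying $\Omega$ on the right; this summary only states $\Omega = U V^H$.

$$
\Omega^{\ast} = \arg\min_{\Omega^H \Omega = I} \left\lVert \hat{X}_d \, \Omega - \hat{X}_t \right\rVert_F^2,
\qquad
\hat{X}_d^H \hat{X}_t = U \Sigma V^H
\;\Longrightarrow\;
\Omega^{\ast} = U V^H
$$

The cosine similarity between two real-valued images $A$ and $B$, treated as vectors, is the usual

$$
\rho_{\cos}(A, B) = \frac{\langle A, B \rangle}{\lVert A \rVert_F \, \lVert B \rVert_F},
$$

which equals 1 when the aligned image is a positive scalar multiple of the original.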

Abstract

Multidimensional separations data have the capacity to reveal detailed information about complex biological samples. However, data analysis has been an ongoing challenge in the area, since the peaks that represent chemical factors may drift along the first- and second-dimension retention times over the course of several analytical runs. This makes higher-level analyses of the data difficult, since a one-to-one comparison of samples is seldom possible without sophisticated pre-processing routines. Further complicating the issue, closely co-eluting components need to be resolved, typically using some variant of Parallel Factor Analysis (PARAFAC), Multivariate Curve Resolution (MCR), or the recently explored Shift-Invariant Multi-linearity. These algorithms work with a user-specified number of components and with regions of interest that are then summarized as a peak table that is invariant to shift. However, identifying regions of interest across truly heterogeneous data remains an obstacle to the automated deployment of these algorithms. This work offers a very simple solution to the alignment problem through an orthogonal Procrustes analysis of the frequency-domain representation of synthetic multidimensional separations data, for peaks that are logarithmically transformed to simulate shift while preserving the underlying topology of the data. Using this very simple method for analysis, two synthetic chromatograms can be compared under close to the worst possible scenarios for alignment.
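The alignment idea described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`gaussian_peaks`, `fft_procrustes_align`, `cosine_similarity`) are hypothetical, the rotation is applied on the right of the frequency-domain matrix, and a circular shift along the second dimension stands in for the paper's logarithmic distortion, because a unitary right-multiplication can represent that kind of shift exactly.

```python
import numpy as np


def gaussian_peaks(shape, centers, width):
    """Synthetic 2D chromatogram: a sum of isotropic Gaussian peaks."""
    rows, cols = np.indices(shape)
    image = np.zeros(shape)
    for r0, c0 in centers:
        image += np.exp(-((rows - r0) ** 2 + (cols - c0) ** 2) / (2.0 * width**2))
    return image


def fft_procrustes_align(distorted, target):
    """Rotate the 2D FFT of `distorted` onto the 2D FFT of `target`.

    Assumption: the unitary rotation omega acts on the right, i.e. it
    minimises || fft2(distorted) @ omega - fft2(target) ||_F.
    """
    a_hat = np.fft.fft2(distorted)
    b_hat = np.fft.fft2(target)
    # Standard orthogonal Procrustes solution: SVD of A^H B, then omega = U V^H.
    u, _, vh = np.linalg.svd(a_hat.conj().T @ b_hat)
    omega = u @ vh
    # Back to the retention-time domain; the imaginary part is numerical residue.
    aligned = np.fft.ifft2(a_hat @ omega).real
    return aligned, omega


def cosine_similarity(a, b):
    """Cosine similarity between two real images, treated as flat vectors."""
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy demonstration: three peaks, circularly shifted along the second dimension.
target = gaussian_peaks((64, 64), [(20, 15), (40, 30), (50, 45)], width=2.0)
distorted = np.roll(target, shift=9, axis=1)

aligned, omega = fft_procrustes_align(distorted, target)
rho_before = cosine_similarity(distorted, target)
rho_after = cosine_similarity(aligned, target)
print(f"cosine similarity before: {rho_before:.4f}, after: {rho_after:.4f}")
```

Note that `omega` is a dense complex matrix whose side length equals the number of second-dimension points, which is consistent with the memory demands mentioned in the TL;DR once the chromatograms become large.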

Paper Structure

This paper contains 9 sections, 21 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: Performance of the FFT-Procrustes analysis. Row 1 shows the original, shifted, and aligned images without any noise. Row 2 shows the original, shifted, and aligned images with 10% Gaussian noise, scaled to the maximum peak intensity. Row 3 shows the results with extra peaks added to the shifted image, and Row 4 shows the results with the widths of the shifted peaks increased by 50%.