
ICS for complex data with application to outlier detection for density data

Camille Mondon, Huong Thi Trinh, Anne Ruiz-Gazen, Christine Thomas-Agnan

TL;DR

This work extends invariant coordinate selection (ICS) to complex data through a coordinate-free framework in finite-dimensional Euclidean spaces, which allows compositional, functional and distributional data to be treated in a unified way. Using coordinate-free definitions and weighted covariance operators $\mathrm{Cov}_w$, the authors give practical implementations that map complex objects to Euclidean coordinates so that standard ICS can be applied, including efficient compositional and Bayes-space approaches. The paper develops an ICS-based outlier detection procedure, studies the effect of preprocessing parameters, and validates the method through simulations and an application to densities of daily maximum temperature in Vietnam, showing robustness and competitive performance when the proportion of outliers is low. The framework enables interpretable detection of atypical density patterns and points toward extending ICS to infinite-dimensional settings; ongoing work concerns alternative scatter operators and the aggregation of results obtained under different preprocessing choices.
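To make "map complex objects to Euclidean coordinates" concrete for density data, here is a minimal sketch. It does not reproduce the authors' method: the paper smooths densities into compositional splines by Maximum Penalised Likelihood, whereas this sketch simply assumes strictly positive densities evaluated on a common uniform grid. It applies the centred log-ratio (clr) transform and then expresses the clr vectors in an orthonormal (Helmert-type) basis of the zero-sum hyperplane, which yields Euclidean coordinates to which multivariate ICS can be applied. The helper names `clr`, `helmert_basis` and `ilr` are illustrative choices.

```python
import numpy as np

def clr(f: np.ndarray) -> np.ndarray:
    """Centred log-ratio transform of strictly positive densities on a common uniform grid (one row per observation)."""
    log_f = np.log(f)
    return log_f - log_f.mean(axis=1, keepdims=True)

def helmert_basis(m: int) -> np.ndarray:
    """(m - 1) x m matrix whose orthonormal rows span the zero-sum hyperplane of R^m."""
    basis = np.zeros((m - 1, m))
    for i in range(1, m):
        basis[i - 1, :i] = 1.0 / i    # equal weights on the first i grid points
        basis[i - 1, i] = -1.0        # contrast with grid point i, so the row sums to zero
        basis[i - 1] *= np.sqrt(i / (i + 1.0))  # rescale the row to unit norm
    return basis

def ilr(f: np.ndarray) -> np.ndarray:
    """Coordinates of clr(f) in the Helmert basis: Euclidean vectors of length m - 1."""
    return clr(f) @ helmert_basis(f.shape[1]).T

# Illustrative data: two proportional Gaussian shapes and one shifted shape on 50 grid points.
t = np.linspace(-3.0, 3.0, 50)
f = np.vstack([np.exp(-t**2 / 2), 2.0 * np.exp(-t**2 / 2), np.exp(-(t - 1.0) ** 2 / 2)])
coords = ilr(f)  # shape (3, 49); rows 0 and 1 coincide because clr ignores a positive scale factor
```

Because the basis is orthonormal, distances between clr vectors are preserved in the new coordinates. On a uniform grid the discretised Bayes-space inner product differs from the Euclidean one on clr vectors only by the constant grid step, which rescales both scatter matrices alike and therefore does not change the ICS eigenvectors.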

Abstract

Invariant coordinate selection (ICS) is a dimension reduction method used as a preliminary step for clustering and outlier detection. It has been applied mainly to multivariate data. This work introduces a coordinate-free definition of ICS in an abstract Euclidean space and extends the method to complex data. Functional and distributional data are preprocessed into a finite-dimensional subspace. For example, in the framework of Bayes Hilbert spaces, distributional data are smoothed into compositional spline functions through the Maximum Penalised Likelihood method. We describe an outlier detection procedure for complex data and study the impact of some preprocessing parameters on the results. Through simulations, we compare our approach with other outlier detection methods and obtain promising results in scenarios with a low proportion of outliers. Finally, ICS makes it possible to detect abnormal climate events in a sample of daily maximum temperature distributions recorded across the provinces of Northern Vietnam between 1987 and 2016.
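Once the data are expressed in Euclidean coordinates, the multivariate ICS outlier detection step can be sketched as follows. This assumes the common scatter pair (Cov, Cov4), which may differ from the pairs used in the paper. It solves the generalised eigenproblem $\mathrm{Cov4}\, b = \rho\, \mathrm{Cov}\, b$ through a Cholesky reduction and scores each observation by the squared norm of its first $k$ invariant coordinates. The function names and the choice $k = 1$ are illustrative assumptions, and the procedure for selecting $k$ and the cut-off is not shown here.

```python
import numpy as np

def cov4(x: np.ndarray) -> np.ndarray:
    """Fourth-moment scatter matrix: sum of d_i^2 (x_i - xbar)(x_i - xbar)^T divided by n (p + 2)."""
    n, p = x.shape
    xc = x - x.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(x, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", xc, s_inv, xc)  # squared Mahalanobis distances
    return (xc * d2[:, None]).T @ xc / (n * (p + 2))

def ics(x: np.ndarray):
    """ICS of the pair (Cov, Cov4): eigenvalues in decreasing order, eigenvectors B and invariant coordinates Z."""
    xc = x - x.mean(axis=0)
    l = np.linalg.cholesky(np.cov(x, rowvar=False))  # Cov = L L^T
    l_inv = np.linalg.inv(l)
    evals, u = np.linalg.eigh(l_inv @ cov4(x) @ l_inv.T)  # symmetric form of Cov4 b = rho Cov b
    order = np.argsort(evals)[::-1]
    b = l_inv.T @ u[:, order]  # eigenvectors normalised so that B^T Cov B = I
    return evals[order], b, xc @ b

def ics_distance(z: np.ndarray, k: int) -> np.ndarray:
    """Squared norm of the first k invariant coordinates; large values point to potential outliers."""
    return (z[:, :k] ** 2).sum(axis=1)

# Toy data: 195 Gaussian observations plus 5 shifted ones (a low proportion of outliers).
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(size=(195, 3)), 8.0 + 0.5 * rng.normal(size=(5, 3))])
evals, b, z = ics(x)
dist = ics_distance(z, k=1)
flagged = np.argsort(dist)[-5:]  # indices of the five largest ICS distances
```

With a small cluster of outliers, the direction that separates them has high kurtosis, so it appears among the first invariant components. This is why only the components with the largest eigenvalues are retained in this setting.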

Paper Structure

This paper contains 26 sections, 5 theorems, 28 equations, 16 figures.

Key Result

Proposition 1

Let $\varphi : (E, \langle \cdot, \cdot \rangle_E) \rightarrow (F, \langle \cdot, \cdot \rangle_F)$ be an isometry between two Euclidean spaces of dimension $p$, $\mathcal{E} \subseteq L^1 (\Omega, E)$ an affine-invariant set of integrable $E$-valued random objects, $S_1^{\mathcal{E}}$ and $S_2^{\mathcal{E}}$ … For any $E$-valued random object $X^{\mathcal{E}} \in \mathcal{E}$, any basis $H^{\mathcal{E}} = (h \ldots)$ …
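The statement of Proposition 1 is truncated above, so the following sketch illustrates only the coordinate-level fact that underlies it, not the proposition itself. An isometry acts on coordinates in orthonormal bases as an orthogonal matrix, and a change of basis multiplies coordinate vectors by an invertible matrix. In both cases, the ICS eigenvalues of the pair (Cov, Cov4) remain unchanged. The helpers `scatter_pair` and `ics_eigenvalues` are hypothetical names, and (Cov, Cov4) is an assumed example pair.

```python
import numpy as np

def scatter_pair(x: np.ndarray):
    """Cov and Cov4 of the rows of x (coordinates of the observations in some basis)."""
    n, p = x.shape
    xc = x - x.mean(axis=0)
    s1 = np.cov(x, rowvar=False)
    d2 = np.einsum("ij,jk,ik->i", xc, np.linalg.inv(s1), xc)  # squared Mahalanobis distances
    s2 = (xc * d2[:, None]).T @ xc / (n * (p + 2))
    return s1, s2

def ics_eigenvalues(x: np.ndarray) -> np.ndarray:
    """Generalised eigenvalues of Cov4 relative to Cov, in decreasing order."""
    s1, s2 = scatter_pair(x)
    return np.sort(np.linalg.eigvals(np.linalg.solve(s1, s2)).real)[::-1]

rng = np.random.default_rng(2)
x = rng.standard_t(df=5, size=(300, 4))        # coordinates in one orthonormal basis
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # orthogonal matrix: coordinates after an isometry
a = rng.normal(size=(4, 4)) + 4.0 * np.eye(4)  # generic invertible matrix: another, non-orthonormal basis
# The three calls below return the same eigenvalues up to rounding error.
ev_x, ev_q, ev_a = ics_eigenvalues(x), ics_eigenvalues(x @ q.T), ics_eigenvalues(x @ a.T)
```

The invariance holds because $\mathrm{Cov}^{-1}\mathrm{Cov4}$ is transformed by a similarity, $A^{-\top}\,\mathrm{Cov}^{-1}\mathrm{Cov4}\,A^{\top}$, while the Mahalanobis distances do not change.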

Figures (16)

  • Figure 1: Map of Vietnam showing the 63 provinces, with the three regions under study colour-coded. The 28 provinces included in the toy example are labelled.
  • Figure 2:
  • Figure 3:
  • Figure 5: Outlier detection by ICS across smoothing parameters for the Vietnam toy example. Top: knots at quantiles; bottom: equally spaced knots. $y$-axis: observation indices; $x$-axis: smoothing parameter $\lambda$. Columns correspond to the number of knots (0 to 35). Outliers are shown in dark shades and colour-coded by region.
  • Figure 6: Frequency of outlier detection by ICS across 340 scenarios with varying smoothing parameters, for each observation in the Vietnam toy example.
  • ...and 11 more figures
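Figures 5 and 6 summarise how often each observation is flagged when preprocessing choices vary (knot placement, number of knots, $\lambda$). A minimal way to compute such a frequency is to run the detector once per scenario and average the boolean flags. In the sketch below, `detect` is a hypothetical stand-in for the full smoothing plus ICS pipeline, and the grid values are illustrative; they do not correspond to the paper's 340 scenarios.

```python
import itertools
from typing import Callable, Sequence

import numpy as np

def detection_frequency(
    n_obs: int,
    knot_placements: Sequence[str],
    knot_numbers: Sequence[int],
    lambdas: Sequence[float],
    detect: Callable[[str, int, float], np.ndarray],
) -> np.ndarray:
    """Fraction of preprocessing scenarios in which each observation is flagged as an outlier."""
    counts = np.zeros(n_obs)
    n_scenarios = 0
    for placement, n_knots, lam in itertools.product(knot_placements, knot_numbers, lambdas):
        flags = np.asarray(detect(placement, n_knots, lam), dtype=bool)  # one flag per observation
        counts += flags
        n_scenarios += 1
    return counts / n_scenarios

def toy_detect(placement: str, n_knots: int, lam: float) -> np.ndarray:
    """Hypothetical detector: observation 0 is always flagged, observation 1 only for strong smoothing."""
    flags = np.zeros(4, dtype=bool)
    flags[0] = True
    flags[1] = lam > 1.0
    return flags

freq = detection_frequency(4, ["quantile", "equal"], [5, 10], [0.1, 1.0, 10.0, 100.0], toy_detect)
```

Observations flagged in nearly every scenario are robust to preprocessing, whereas those flagged only for some parameter values deserve closer inspection.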

Theorems & Definitions (18)

  • Definition 1: Coordinate-free ICS
  • Remark: Multivariate case
  • Proposition 1
  • Definition 2: Weighted covariance operators
  • Example 1
  • Example 2
  • Corollary 1
  • Corollary 2
  • Remark: Empirical ICS and estimation
  • Definition 3: Scatter operators
  • ...and 8 more
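The exact form of Definition 2 is not reproduced here. The sketch below assumes the form usual in the ICS literature, $\mathrm{Cov}_w[X] = \mathbb{E}\big[w(d^2)\,(X - \mathbb{E}X)(X - \mathbb{E}X)^\top\big]$, where $d^2$ is the squared Mahalanobis distance with respect to $\mathrm{Cov}[X]$. It is computed in coordinates with plug-in (divisor $n$) estimates: $w \equiv 1$ recovers the covariance and $w(d^2) = d^2/(p+2)$ gives Cov4. The helper names are hypothetical.

```python
from typing import Callable

import numpy as np

def weighted_cov(x: np.ndarray, w: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Empirical Cov_w: mean of w(d_i^2) (x_i - xbar)(x_i - xbar)^T, where d_i is the Mahalanobis distance w.r.t. Cov."""
    xc = x - x.mean(axis=0)
    s = xc.T @ xc / len(x)                                   # plug-in covariance (divisor n)
    d2 = np.einsum("ij,ij->i", xc @ np.linalg.inv(s), xc)    # squared Mahalanobis distances
    return (xc * w(d2)[:, None]).T @ xc / len(x)

def cov(x: np.ndarray) -> np.ndarray:
    """Weight w = 1: the (plug-in) covariance matrix."""
    return weighted_cov(x, np.ones_like)

def cov4(x: np.ndarray) -> np.ndarray:
    """Weight w(d^2) = d^2 / (p + 2): the fourth-moment scatter matrix."""
    p = x.shape[1]
    return weighted_cov(x, lambda d2: d2 / (p + 2))
```

Because $d^2$ is unchanged by affine maps $x \mapsto Ax + c$, every such $\mathrm{Cov}_w$ transforms as $A\,\mathrm{Cov}_w A^\top$. This affine equivariance is what allows a pair of them to serve as scatter operators in ICS.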