
RKUM: An R Package for Robust Kernel Unsupervised Methods

Md Ashad Alam

TL;DR

The paper addresses robustness in kernel-based unsupervised learning by proposing robust estimators of the kernel covariance and cross-covariance operators, obtained via M-estimation with generalized loss functions. It develops robust kernel CCA by replacing the standard operators with their robust counterparts and derives influence functions that quantify sensitivity to outliers. The RKUM package implements these methods, including the Huber and Hampel losses, using kernel iteratively reweighted least squares. This enables reliable analysis of high-dimensional, multi-view data and supports outlier detection and robustness evaluation. The approach preserves the ability to detect true dependencies while mitigating the effects of contamination, with practical uses in multimodal data integration and exploratory analysis of noisy data.
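The M-estimation idea behind these robust operators can be illustrated on the simplest case, a robust kernel mean element. The sketch below is written in Python/NumPy rather than R and is not RKUM's implementation: it assumes a Gaussian kernel and defines hypothetical helpers (`gaussian_gram`, `huber_weight`, `hampel_weight`, `robust_kernel_mean_weights`). Writing the estimate as $\sum_i w_i \Phi(X_i)$, each iteration recomputes the RKHS distances $\lVert \Phi(X_i) - \sum_j w_j \Phi(X_j) \rVert$ from the Gram matrix alone and reweights each point by $\psi(t)/t$ of the chosen loss, which is the iteratively reweighted least squares scheme.

```python
import numpy as np

def gaussian_gram(x, sigma=1.0):
    """Gaussian Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum(x**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def huber_weight(t, c):
    """psi(t)/t for Huber's loss: 1 for t <= c, c/t beyond."""
    t = np.maximum(t, 1e-12)
    return np.where(t <= c, 1.0, c / t)

def hampel_weight(t, a, b, c):
    """psi(t)/t for Hampel's three-part redescending loss (0 < a < b < c)."""
    t = np.maximum(t, 1e-12)
    w = np.ones_like(t)
    w = np.where((t > a) & (t <= b), a / t, w)
    w = np.where((t > b) & (t <= c), a * (c - t) / ((c - b) * t), w)
    return np.where(t > c, 0.0, w)

def robust_kernel_mean_weights(K, weight_fn, n_iter=100, tol=1e-8):
    """Iteratively reweighted estimate f = sum_i w_i Phi(x_i); returns w."""
    n = K.shape[0]
    w = np.full(n, 1.0 / n)  # start at the ordinary empirical kernel mean
    for _ in range(n_iter):
        # RKHS distance: ||Phi(x_i) - f||^2 = K_ii - 2 (K w)_i + w' K w
        d = np.sqrt(np.maximum(np.diag(K) - 2.0 * K @ w + w @ K @ w, 0.0))
        u = weight_fn(d)
        if u.sum() <= 0:  # every point rejected (possible with Hampel): stop
            break
        w_new = u / u.sum()
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w

# Usage: 50 Gaussian points plus one gross outlier at (8, 8)
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(size=(50, 2)), [[8.0, 8.0]]])
K = gaussian_gram(x, sigma=1.0)
d0 = np.sqrt(np.maximum(np.diag(K) - 2.0 * K.mean(axis=1) + K.mean(), 0.0))
c = np.median(d0)  # Huber threshold: median initial distance (illustrative)
w = robust_kernel_mean_weights(K, lambda t: huber_weight(t, c))
```

Taking the Huber threshold as the median initial distance is an illustrative choice, not necessarily the one RKUM uses. Because the point at (8, 8) lies far from the cloud in feature space, it receives the smallest weight, so it pulls the robust mean less than it pulls the ordinary empirical mean, where every weight is 1/n.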

Abstract

RKUM is an R package that implements robust kernel-based unsupervised methods. It provides functions for estimating the robust kernel covariance operator (CO) and the robust kernel cross-covariance operator (CCO) using generalized loss functions in place of the conventional quadratic loss. These operators form the foundation of robust kernel learning and enable reliable analysis of contaminated or noisy data. The package implements robust kernel canonical correlation analysis (robust kernel CCA), as well as the influence function (IF) for both standard and multiple kernel CCA. The IF quantifies how sensitive an estimator is to individual observations and helps detect influential or outlying observations in two-view and multi-view datasets. Experiments on synthesized two-view and multi-view data show that the IF of standard kernel CCA effectively identifies outliers, while the robust kernel methods implemented in RKUM are less sensitive to contamination. Overall, RKUM provides an efficient and extensible platform for robust kernel-based analysis of high-dimensional data.
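For reference, the standard (non-robust) kernel CCA that the robust variant modifies can be sketched from centered Gram matrices. This Python sketch assumes a Gaussian kernel and a regularized formulation in which the first kernel CC is the largest singular value of $R_X R_Y$, with $R = K(K + n\varepsilon I)^{-1}$ built from the centered Gram matrix $K$. The names `centered_gram` and `kernel_cc` and the value of $\varepsilon$ are illustrative assumptions, not RKUM functions.

```python
import numpy as np

def centered_gram(v, sigma=1.0):
    """Gaussian Gram matrix of a sample, double-centered as H K H."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    sq = np.sum(v**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * v @ v.T, 0.0)
    K = np.exp(-d2 / (2.0 * sigma**2))
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return H @ K @ H

def kernel_cc(Kx, Ky, eps=0.05):
    """First regularized kernel CC: top singular value of Rx @ Ry,
    where R = K (K + n*eps*I)^{-1} (eps is an illustrative choice)."""
    n = Kx.shape[0]
    I = np.eye(n)
    # K and (K + cI) commute, so solve((K + cI), K) equals K (K + cI)^{-1}
    Rx = np.linalg.solve(Kx + n * eps * I, Kx)
    Ry = np.linalg.solve(Ky + n * eps * I, Ky)
    return np.linalg.svd(Rx @ Ry, compute_uv=False)[0]

# Usage: a nonlinear dependence versus an independent pair
rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y_dep = x**2 + 0.1 * rng.normal(size=n)  # zero linear correlation in population
y_ind = rng.normal(size=n)
Kx = centered_gram(x)
rho_dep = kernel_cc(Kx, centered_gram(y_dep))
rho_ind = kernel_cc(Kx, centered_gram(y_ind))
```

Here `y_dep` has zero population linear correlation with `x`, yet its kernel CC is expected to exceed that of the independent pair. The robust kernel CCA described above would replace the empirical operators behind `Kx` and `Ky` with their robust M-estimates; that step is not shown here.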

Paper Structure

This paper contains 11 sections, 1 theorem, 34 equations, 2 figures.

Key Result

Theorem 4.1

Given a pair of random variables $(X, Y)$ with joint distribution $F_{XY}$, let $\rho_j$ be the $j$-th kernel canonical correlation (CC) and let $f_{jX}(X)$ and $f_{jY}(Y)$ be the corresponding kernel canonical variates (CVs). The influence functions of the kernel CC and the kernel CVs at $Z^\prime = (X^\prime, Y^\prime)$ are
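Theorem 4.1 gives the influence functions in closed form. As a simpler, swapped-in alternative, the influence of each observation on the first kernel CC can be approximated by the jackknife (leave-one-out) estimate $(n-1)(\hat\rho - \hat\rho_{(-i)})$. The Python sketch below uses that finite-sample approximation, not the paper's formula; `centered_gram`, `kernel_cc`, `empirical_influence_cc`, the Gaussian kernel and the regularization are assumptions for illustration.

```python
import numpy as np

def centered_gram(v, sigma=1.0):
    """Gaussian Gram matrix of a sample, double-centered as H K H."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    sq = np.sum(v**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * v @ v.T, 0.0)
    K = np.exp(-d2 / (2.0 * sigma**2))
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kernel_cc(Kx, Ky, eps=0.05):
    """First regularized kernel CC: top singular value of Rx @ Ry."""
    n = Kx.shape[0]
    I = np.eye(n)
    Rx = np.linalg.solve(Kx + n * eps * I, Kx)
    Ry = np.linalg.solve(Ky + n * eps * I, Ky)
    return np.linalg.svd(Rx @ Ry, compute_uv=False)[0]

def empirical_influence_cc(x, y, sigma=1.0, eps=0.05):
    """Jackknife approximation of each observation's influence on the
    first kernel CC: (n - 1) * (rho_full - rho_without_i)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    rho_full = kernel_cc(centered_gram(x, sigma), centered_gram(y, sigma), eps)
    eif = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop observation i from both views
        rho_i = kernel_cc(centered_gram(x[keep], sigma),
                          centered_gram(y[keep], sigma), eps)
        eif[i] = (n - 1) * (rho_full - rho_i)
    return rho_full, eif

# Usage: a dependent two-view sample with one contaminated Y value
rng = np.random.default_rng(2)
n = 60
x = rng.normal(size=n)
y = np.sin(x) + 0.1 * rng.normal(size=n)
y[0] = 5.0  # contaminate one observation in the Y view
rho_full, eif = empirical_influence_cc(x, y)
```

Observations with unusually large absolute values in `eif` are candidates for closer inspection. The jackknife needs n + 1 kernel CCA fits, so it is practical only for small samples, which is one reason a closed-form IF such as Theorem 4.1 is useful.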

Figures (2)

  • Figure 1: Comparison of influence value profiles across multiple kernel CCA methods for ideal and contaminated datasets using the RKUM package.
  • Figure 2: Line plots of influence values obtained from multiple kernel CCA methods applied to ideal and contaminated datasets using the RKUM package.

Theorems & Definitions (1)

  • Theorem 4.1