DROP: Distributionally Robust Optimization for Multi-task Learning in Graphical Models

Canruo Shen, Xintong Ji, Qiong Li, Wenzhi Yang, Xiaoping Shi

Abstract

Gaussian Graphical Models (GGMs) are widely used to infer conditional dependence structures in high-dimensional data. However, standard precision matrix estimators are highly sensitive to data contamination, such as extreme outliers and heavy-tailed noise. In this paper, we propose DROP (Distributionally Robust Optimization), a robust estimation method formulated within a multi-task nodewise regression framework. The proposed estimator enforces structural sparsity while resisting the influence of corrupted observations. Theoretically, we establish error bounds for the DROP estimator under general contamination. Through extensive high-dimensional simulations, we demonstrate that DROP consistently controls the rate of false positive edges and outperforms conventional non-robust estimators when data deviate from standard Gaussian assumptions. Furthermore, in a functional MRI (fMRI) application, DROP maintains a stable graph structure and preserves network modularity even when subjected to severe data perturbations, whereas competing methods yield excessively dense networks. To facilitate reproducible research, the DROP R package will be made publicly available on GitHub.
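DROP is formulated within a nodewise regression framework. The sketch below shows the classical, non-robust nodewise lasso (neighborhood selection) that such a framework starts from, for orientation only. Each variable is regressed on all the others with an ℓ1 penalty, and an edge is declared when either of the two regressions selects it (the "OR" rule). This is not the DROP estimator. The function names, the coordinate-descent solver, the penalty level `lam`, and the OR rule are illustrative assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 with no intercept,
    so the columns of X and y are assumed to be roughly centered.
    """
    n, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) * ||X_j||^2 for each column
    r = np.array(y, dtype=float)       # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]            # partial residual without coordinate j
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]            # restore full residual with the new b[j]
    return b


def nodewise_lasso(X, lam):
    """Regress each column of X on all others; return coefficients and adjacency.

    B[i, j] is the lasso coefficient of node j in the regression for node i.
    An edge (i, j) is kept if either regression selects it (OR rule).
    """
    n, p = X.shape
    B = np.zeros((p, p))
    for i in range(p):
        others = [j for j in range(p) if j != i]
        B[i, others] = lasso_cd(X[:, others], X[:, i], lam)
    support = B != 0
    adj = support | support.T
    np.fill_diagonal(adj, False)
    return B, adj
```

For example, drawing a few thousand samples from a tridiagonal (chain) precision matrix and calling `nodewise_lasso(X, lam=0.1)` should recover the chain edges. Because the fit uses the squared loss, it is a standard, non-robust estimator of the kind the abstract describes as sensitive to outliers and heavy-tailed noise.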

Paper Structure (35 sections, 4 theorems, 57 equations, 9 figures, 1 table, 1 algorithm)

Key Result

Proposition 1

Fix $\kappa \in (1,\infty)$ and let $\kappa^* = \kappa/(\kappa-1)$ denote its Hölder conjugate, so that $1/\kappa + 1/\kappa^* = 1$. Consider a Gaussian graphical model with precision matrix $K \succ 0$, and let $\{\boldsymbol{\beta}^{(i)}(K)\}_{i=1}^p$ denote the corresponding neighborhood selection coefficients. Assume that the distributional un… where $\mathrm{MSE}_n(\boldsymbol{\beta}^{(i)}(K)) = \frac{1}{n} \sum_{k=1}^{n} \left( x_{k,i} - \cdots \right)^2$ …
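Two ingredients of Proposition 1 can be made concrete. First, $\kappa^* = \kappa/(\kappa-1)$ is the Hölder conjugate of $\kappa$, so $|\mathbf{u}^\top\boldsymbol{\beta}| \le \|\mathbf{u}\|_\kappa \|\boldsymbol{\beta}\|_{\kappa^*}$. Second, in a GGM the population neighborhood coefficients satisfy $\beta^{(i)}_j(K) = -K_{ij}/K_{ii}$ for $j \ne i$, and the corresponding residual variance is $1/K_{ii}$. The sketch below computes both and evaluates an empirical nodewise MSE. Because the definition of $\mathrm{MSE}_n$ is truncated above, the form $\frac{1}{n}\sum_k (x_{k,i} - \mathbf{x}_{k,-i}^\top\boldsymbol{\beta}^{(i)})^2$ used here is an assumption, as are all function names.

```python
import numpy as np


def holder_conjugate(kappa):
    """Return kappa* satisfying 1/kappa + 1/kappa* = 1, for kappa in (1, inf)."""
    if not 1.0 < kappa < np.inf:
        raise ValueError("kappa must lie in (1, inf)")
    return kappa / (kappa - 1.0)


def neighborhood_coefficients(K, i):
    """Population nodewise coefficients of a GGM: beta_j = -K[i, j] / K[i, i] for j != i."""
    others = np.delete(np.arange(K.shape[0]), i)
    return -K[i, others] / K[i, i]


def nodewise_mse(X, i, beta_i):
    """Empirical MSE_n for node i under an ASSUMED form of the truncated definition:

    (1/n) * sum_k (x[k, i] - x[k, -i] @ beta_i) ** 2
    """
    others = np.delete(np.arange(X.shape[1]), i)
    resid = X[:, i] - X[:, others] @ beta_i
    return float(np.mean(resid ** 2))
```

With the population coefficients plugged in, `nodewise_mse` should be close to $1/K_{ii}$ for large $n$, since that is the conditional variance of $x_i$ given the remaining variables.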

Figures (9)

  • Figure 1: A motivating example illustrating the impact of data contamination on a sparse graph with $p=10$ nodes. (A) The true conditional dependence structure. (B) and (D): Graphs estimated by the proposed DROP method under heavy-tailed noise and leverage points, respectively. (C) and (E): Graphs estimated by the standard GLASSO method under the same contamination scenarios, showing an increase in false positive edges.
  • Figure 2: Illustration of the five graph structures with $p = 100$ nodes. (A) Band graph; (B) Hub graph; (C) Cluster graph; (D) Random graph; (E) Scale-free graph.
  • Figure 3: F1 Score comparison of the band graph with $p = 20$ nodes across contamination scenarios.
  • Figure 4: F1 score comparison across five graph structures. Left: $p=100$ ($n=1000$). Right: $p=250$ ($n=500$).
  • Figure 5: Workflow of fMRI data extraction from raw images to region-wise BOLD time series based on the Power-264 parcellation (created in https://BioRender.com)
  • Figures 6–9 (captions not listed)
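Figures 3 and 4 compare methods by F1 score on edge recovery. A minimal sketch of that metric is shown below, together with a hypothetical band-graph precision matrix in the spirit of panel (A) of Figure 2. The bandwidth, the off-diagonal value, the diagonal shift used to enforce positive definiteness, and the convention of counting each undirected edge once are assumptions. The paper's simulation settings are not given here.

```python
import numpy as np


def band_precision(p, bandwidth=1, value=0.3):
    """HYPOTHETICAL band-graph precision matrix: K[i, j] = value if 0 < |i - j| <= bandwidth."""
    K = np.eye(p)
    for k in range(1, bandwidth + 1):
        K += value * (np.eye(p, k=k) + np.eye(p, k=-k))
    # Shift the diagonal if needed so that K is positive definite (min eigenvalue 0.1).
    min_eig = np.linalg.eigvalsh(K).min()
    if min_eig <= 0:
        K += (0.1 - min_eig) * np.eye(p)
    return K


def edge_f1(true_adj, est_adj):
    """F1 score for edge recovery, counting each undirected edge once.

    Returns 0.0 when there are no true positives.
    """
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    iu = np.triu_indices_from(true_adj, k=1)  # strict upper triangle only
    t, e = true_adj[iu], est_adj[iu]
    tp = np.sum(t & e)
    fp = np.sum(~t & e)
    fn = np.sum(t & ~e)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))
```

For a band graph with $p = 10$ and bandwidth 1 there are 9 true edges; an estimate that recovers all of them plus one spurious edge has precision 0.9, recall 1, and F1 $= 18/19 \approx 0.947$. Because F1 penalizes false positives through precision, it reflects the excess of spurious edges that Figure 1 attributes to GLASSO under contamination.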

Theorems & Definitions (12)

  • Proposition 1: Distributionally Robust Scalarized Multi-Task GGM
  • Remark 2
  • Theorem 3
  • Remark 4
  • Remark 5
  • Remark 6
  • Remark 7
  • Remark 8
  • Remark 9
  • Lemma 10
  • 2 further items (not listed)