Fair CCA for Fair Representation Learning: An ADNI Study

Bojian Hou, Zhanliang Wang, Zhuoping Zhou, Boning Tong, Zexuan Wang, Jingxuan Bao, Duy Duong-Tran, Qi Long, Li Shen

TL;DR

This work proposes FR-CCA, a fair representation learning method for cross-modal data that preserves alignment between modalities while removing information about a sensitive attribute to promote downstream fairness. The key idea is to enforce zero (centered) covariance between the projected features and the protected attribute via a nullspace-based construction. This gives a tractable relaxation of independence that reduces bias without sacrificing cross-modal correlations. Empirical results on synthetic data and ADNI MRI/AV1451 PET show that FR-CCA achieves superior fairness (DPG, EOG, GSG) with competitive or improved classification performance, and interpretability analyses link the learned representations to meaningful brain regions. The approach offers a practical, efficient route to fair multimodal analysis in high-stakes clinical settings, supported by public code and evaluation on both synthetic and real data.
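The zero-covariance idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); it only shows one way a nullspace construction can force the centered covariance between projected features and a sensitive attribute to vanish. The function names `nullspace_basis` and `fair_cca_sketch`, the ridge term `reg`, and the whitening-plus-SVD CCA solver are all assumptions made for illustration.

```python
import numpy as np


def nullspace_basis(c, tol=1e-10):
    """Orthonormal basis of {u : c^T u = 0} for a single vector c."""
    c = np.asarray(c, dtype=float).reshape(1, -1)
    _, svals, vt = np.linalg.svd(c, full_matrices=True)
    rank = int((svals > tol).sum())
    return vt[rank:].T  # shape (p, p - rank)


def inv_sqrt(m):
    """Inverse square root of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(m)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T


def fair_cca_sketch(X, Y, s, r=2, reg=1e-3):
    """Hypothetical sketch: CCA restricted to directions whose projections
    have zero centered covariance with the sensitive attribute s."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    sc = s - s.mean()

    # (Xc u)^T sc = 0  <=>  u is orthogonal to Xc^T sc, so u lives in a nullspace.
    Nx = nullspace_basis(Xc.T @ sc)
    Ny = nullspace_basis(Yc.T @ sc)
    Xr, Yr = Xc @ Nx, Yc @ Ny  # data expressed in nullspace coordinates

    # Ordinary ridge-regularized CCA in the reduced coordinates.
    Sxx = Xr.T @ Xr / n + reg * np.eye(Xr.shape[1])
    Syy = Yr.T @ Yr / n + reg * np.eye(Yr.shape[1])
    Sxy = Xr.T @ Yr / n
    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    A, rho, Bt = np.linalg.svd(Wx @ Sxy @ Wy)

    # Map the solutions back to the original feature spaces.
    U = Nx @ Wx @ A[:, :r]
    V = Ny @ Wy @ Bt[:r].T
    return U, V, rho[:r]
```

Because every column of `U` lies in the nullspace of `Xc.T @ sc`, the projected features `Xc @ U` are uncorrelated with `s` by construction, and the same holds for `V`. Zero covariance is only a relaxation of independence, as the summary above notes, so nonlinear dependence on `s` can remain.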

Abstract

Canonical correlation analysis (CCA) is a technique for finding correlations between different data modalities and learning low-dimensional representations. As fairness becomes crucial in machine learning, fair CCA has gained attention. However, previous approaches often overlook the impact on downstream classification tasks, limiting applicability. We propose a novel fair CCA method for fair representation learning, ensuring the projected features are independent of sensitive attributes, thus enhancing fairness without compromising accuracy. We validate our method on synthetic data and real-world data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), demonstrating its ability to maintain high correlation analysis performance while improving fairness in classification tasks. Our work enables fair machine learning in neuroimaging studies where unbiased analysis is essential. Code is available at https://github.com/ZhanliangAaronWang/FR-CCA-ADNI.

Paper Structure

This paper contains 17 sections, 20 equations, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: Illustration of FR-CCA with the sensitive attribute sex as an example (female and male). (a)–(c) demonstrate the general framework of CCA, while (d)–(g) compare the projected results under various strategies. Note that the correlation between two corresponding samples is inversely associated with the angle formed by their projected vectors. FR-CCA aims to equalize the average angles among different groups. After projection ((h) and (k)), FR-CCA can lead to fair classification ((j) and (m)), where the classification results are not affected by the sex information.
  • Figure 2: Percentage change in fairness and correlation of our method compared to baseline methods. The top row (a–c) reports results on synthetic data, and the bottom row (d–f) on ADNI data. The x-axis is the projection dimension $r$. Blue columns ($\Delta_{\mathrm{fair},r}$) are plotted against the left y-axis on a $\log_{10}$ scale, while orange columns ($\Delta_{\mathrm{corr},r}$) use the right y-axis on a linear scale. Numbers above the columns report the percentage changes on their respective scales. (A high percentage change in the fairness metric indicates that our method is effective at improving fairness, and a low percentage change in the correlation metric indicates little or no loss in correlation.)
  • Figure 3: Comparison of fairness metrics (DPG, EOG, GSG) and accuracy on four modalities (X and Y for synthetic data, MRI and AV1451 for ADNI data). We use boxplots to illustrate the results for three fairness metrics as they exhibit significant variation across different methods. We use boxplots to display the accuracy, as there is minimal variation across different methods. The green triangle represents the mean value over five runs in the boxplots, and the dark diamonds refer to outliers. Our FR-CCA outperforms all the baseline models on all three fairness metrics (smaller is better) while also achieving competitive accuracy (larger is better).
  • Figure 4: Brain heat maps of feature importance for ADNI MRI and ADNI AV1451, based on the coefficients of our FR-CCA model. Each modality is shown in three slices to cover the brain regions broadly. The heat maps in the first row show all the brain regions of interest (ROIs), where darker colors represent greater importance for risk prediction. The second row highlights the top ten significant brain regions in each modality, which are annotated in the legends in the right column.

Theorems & Definitions (1)

  • Definition 1