Harmonized Tabular-Image Fusion via Gradient-Aligned Alternating Learning

Longfei Huang, Yang Yang

Abstract

Multimodal tabular-image fusion is an emerging task that has received increasing attention in various domains. However, existing methods may be hindered by gradient conflicts between modalities, misleading the optimization of the unimodal learner. In this paper, we propose a novel Gradient-Aligned Alternating Learning (GAAL) paradigm to address this issue by aligning modality gradients. Specifically, GAAL adopts alternating unimodal learning and a shared classifier to decouple the multimodal gradient and facilitate interaction. Furthermore, we design uncertainty-based cross-modal gradient surgery to selectively align cross-modal gradients, thereby steering the shared parameters to benefit all modalities. As a result, GAAL can provide effective unimodal assistance and help boost the overall fusion performance. Empirical experiments on widely used datasets reveal the superiority of our method through comparison with various state-of-the-art (SoTA) tabular-image fusion baselines and baselines for handling missing tabular data at test time. The source code is available at https://github.com/njustkmg/ICME26-GAAL.
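The gradient conflict the abstract refers to can be detected by checking the cosine similarity between the two modalities' gradients; a common way to resolve such a conflict is to project out the conflicting component (a PCGrad-style operation). The sketch below is a minimal illustration of this idea on toy vectors, not the paper's exact uncertainty-based surgery; `g_img` and `g_tab` are hypothetical gradient values.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two gradient vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def project_out_conflict(g, g_ref):
    """If g conflicts with g_ref (negative dot product), remove the
    component of g along g_ref, PCGrad-style, so they no longer conflict."""
    dot = g @ g_ref
    if dot < 0:
        g = g - (dot / (g_ref @ g_ref)) * g_ref
    return g

# Hypothetical toy gradients for the image and tabular learners.
g_img = np.array([1.0, 0.0])
g_tab = np.array([-1.0, 1.0])

print(cosine(g_img, g_tab))                 # negative → conflict
g_aligned = project_out_conflict(g_img, g_tab)
print(cosine(g_aligned, g_tab))             # non-negative after projection
```

After projection the aligned gradient is orthogonal to the reference, so an update along it no longer degrades the other modality's objective to first order.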

Paper Structure

This paper contains 17 sections, 10 equations, 5 figures, 2 tables, and 1 algorithm.

Figures (5)

  • Figure 1: We visualize the gradient conflicts and evaluate the performance of existing multimodal solutions on the DVM dataset. (a) Multimodal and image gradients often show negative cosine similarity in naive joint learning, indicating severe gradient conflicts. (b) Existing multimodal solutions underperform in tabular-image fusion, failing to fully exploit unimodal image potential.
  • Figure 2: The framework of the GAAL method. (a) Gradient-Aligned Alternating Learning, in which only one modality's learner is updated at each step. Uncertainty-based cross-modal gradient surgery uses gradients from cross-modal hard samples to guide the optimization of the shared classifier for the current modality. (b) Uncertainty-based Gradient Guidance, which samples hard examples from the other modality to provide cross-modal gradient guidance.
  • Figure 3: Sensitivity to hyper-parameters $\lambda^I$ and $\lambda^T$ on the DVM and SUNAttribute datasets.
  • Figure 4: Convergence results of GAAL on the DVM and SUNAttribute datasets.
  • Figure 5: Impact of constraint margin $\epsilon$.
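Figure 2(a) describes an alternating scheme in which only one modality's learner is updated at each step, while the shared classifier is updated by both modalities in turn. The toy sketch below illustrates that control flow on a simple quadratic objective; the per-modality targets, the loss, and the learning rate are all illustrative assumptions, not the paper's actual model or training setup.

```python
import numpy as np

# Toy setup: each modality has its own learner weights, plus a shared
# classifier parameter vector. Targets are hypothetical illustration values.
targets = {"image": np.array([1.0, 2.0]), "tabular": np.array([-1.0, 0.5])}
learners = {"image": np.zeros(2), "tabular": np.zeros(2)}
shared = np.zeros(2)

def grads(modality):
    # Quadratic loss ||learner + shared - target||^2 stands in for the
    # real unimodal objective; returns gradients w.r.t. the learner and
    # the shared classifier (identical here by the chain rule).
    err = learners[modality] + shared - targets[modality]
    return 2 * err, 2 * err

lr = 0.2
for step in range(60):
    m = ["image", "tabular"][step % 2]   # alternate modalities each step
    g_uni, g_shared = grads(m)
    learners[m] -= lr * g_uni            # only this modality's learner updates
    shared -= lr * g_shared              # shared classifier sees both in turn

for m in targets:
    print(m, np.round(learners[m] + shared, 3))
```

Because each step computes gradients for a single modality, the multimodal gradient is decoupled by construction; in GAAL the shared-classifier update would additionally be steered by the uncertainty-based cross-modal gradient guidance of Figure 2(b), which this sketch omits.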