Multimodal Representation Learning using Adaptive Graph Construction

Weichen Huang

TL;DR

This work proposes AutoBIND, a novel contrastive learning framework that learns representations from an arbitrary number of modalities through graph optimization. AutoBIND outperforms previous methods on Alzheimer's disease detection, highlighting the generalizability of the approach.

Abstract

Multimodal contrastive learning trains neural networks by leveraging data from heterogeneous sources such as images and text. Yet many current multimodal learning architectures cannot generalize to an arbitrary number of modalities and must be hand-constructed. We propose AutoBIND, a novel contrastive learning framework that learns representations from an arbitrary number of modalities through graph optimization. We evaluate AutoBIND on Alzheimer's disease detection because the task has real-world medical applicability and involves a broad range of data modalities. We show that AutoBIND outperforms previous methods on this task, highlighting the generalizability of the approach.

Paper Structure (9 sections, 1 equation, 2 figures, 1 table, 1 algorithm)

Introduction
Problem Setup
Proposed Methods
Graph Construction
Graph Update
Experiments
Dataset
Results
Discussion

Figures (2)

Figure 1: Overview of the AutoBIND Process: Illustration depicting the various stages and steps involved in the AutoBIND framework. The process encompasses multimodal embedding and graph construction, resulting in enhanced performance across different datasets.
Figure 2: Performance of AutoBIND MST vs. AutoBIND FCG vs. unimodal models.
