FusionAdapter for Few-Shot Relation Learning in Multimodal Knowledge Graphs
Ran Liu, Yuan Fang, Xiaoli Li
TL;DR
FusionAdapter tackles few-shot relation learning in multimodal knowledge graphs (MMKGs) by introducing per-modality adapters and a diversity-preserving fusion mechanism. Its lightweight adapters retain modality-specific information while integrating text and image signals, enabling rapid adaptation to unseen relations within a meta-learning framework. Experiments on two MMKG benchmarks show consistent improvements over both unimodal and multimodal baselines, attributed to the diversity loss and the parameter-efficient adapters. The work advances practical multimodal few-shot reasoning in knowledge graphs, with gains in generalization and robustness.
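The TL;DR credits part of the gains to a diversity loss, but its exact form is not given here. As an illustration only, the sketch below assumes one plausible choice: the mean pairwise cosine similarity between the modality-specific embeddings of a single entity, which a model would minimize so that the text and image views stay distinct. The names `cosine` and `diversity_loss` are hypothetical, and this is not the authors' formulation.

```python
import math
from itertools import combinations
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def diversity_loss(modality_embs: Dict[str, List[float]]) -> float:
    """Hypothetical diversity penalty (not the paper's exact loss).

    Averages cosine similarity over every pair of modality embeddings
    for one entity; lower values mean the modalities are more diverse.
    """
    pairs = list(combinations(modality_embs.values(), 2))
    if not pairs:  # a single modality has nothing to be diverse from
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

Identical modality embeddings give a penalty of about 1.0, and orthogonal ones give 0.0. In training, a term like this would typically be added to the task loss with a weighting coefficient; that weighting is also an assumption here.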
Abstract
Multimodal Knowledge Graphs (MMKGs) incorporate various modalities, including text and images, to enhance entity and relation representations. Notably, different modalities of the same entity often carry complementary and diverse information. However, existing MMKG methods primarily align modalities into a shared space, which tends to overlook the distinct contributions of individual modalities and limits their performance, particularly in low-resource settings. To address this challenge, we propose FusionAdapter for few-shot relation learning (FSRL) in MMKGs. FusionAdapter introduces (1) an adapter module that efficiently adapts each modality to unseen relations and (2) a fusion strategy that integrates multimodal entity representations while preserving diverse modality-specific characteristics. By effectively adapting and fusing information from diverse modalities, FusionAdapter improves generalization to novel relations with minimal supervision. Extensive experiments on two benchmark MMKG datasets demonstrate that FusionAdapter outperforms state-of-the-art methods.
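The abstract names two components, per-modality adapters and a fusion step, without specifying them. The sketch below assumes a standard residual bottleneck adapter (down-projection, ReLU, up-projection, plus a skip connection) for each modality and a simple weighted average for fusion. The class `BottleneckAdapter`, the function `fuse`, and the fixed fusion weights are hypothetical illustrations, not FusionAdapter's actual design. Plain Python lists stand in for tensors to keep the example dependency-free.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Matrix-vector product: one output entry per row of m."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


@dataclass
class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter: x + W_up · ReLU(W_down · x)."""
    w_down: Matrix  # shape (r, d), with r much smaller than d
    w_up: Matrix    # shape (d, r)

    def __call__(self, x: Vector) -> Vector:
        h = relu(matvec(self.w_down, x))       # project down to r dims
        delta = matvec(self.w_up, h)           # project back up to d dims
        return [xi + di for xi, di in zip(x, delta)]  # residual connection


def fuse(embs: Dict[str, Vector], weights: Dict[str, float]) -> Vector:
    """Weighted average of per-modality embeddings (assumed fusion rule)."""
    total = sum(weights[m] for m in embs)
    dim = len(next(iter(embs.values())))
    return [sum(weights[m] * embs[m][i] for m in embs) / total for i in range(dim)]


# Toy example with d = 2 and r = 1.
text_adapter = BottleneckAdapter(w_down=[[1.0, 0.0]], w_up=[[0.5], [0.0]])
# A zero up-projection makes this adapter an identity map.
image_adapter = BottleneckAdapter(w_down=[[0.0, 0.0]], w_up=[[0.0], [0.0]])

adapted = {
    "text": text_adapter([2.0, 3.0]),    # → [3.0, 3.0]
    "image": image_adapter([1.0, 1.0]),  # → [1.0, 1.0]
}
fused = fuse(adapted, {"text": 1.0, "image": 1.0})  # → [2.0, 2.0]
```

In a setup like this, only the small adapter matrices (and possibly the fusion weights) would be updated for a new relation, which is what makes adapter-based adaptation parameter-efficient. How FusionAdapter actually learns its fusion and couples the adapters with meta-learning is not described in this section.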
