PGMEL: Policy Gradient-based Generative Adversarial Network for Multimodal Entity Linking

KM Pooja, Cheng Long, Aixin Sun

TL;DR

PGMEL addresses multimodal entity linking by framing it as a metric-learning task within a policy-gradient GAN, where the generator creates challenging negatives and the discriminator learns robust multimodal embeddings. It integrates a gated multimodal unit to fuse text and image information and uses a triplet-style objective for the discriminator while training the generator via policy gradient with rewards from the discriminator. Evaluations on Wiki-MEL, Richpedia-MEL, and WikiDiverse show state-of-the-art performance and clear benefits from hard negative sampling and multimodal fusion, with extensive ablations validating the components. The approach promises more robust MEL in resource-constrained settings and offers a path toward leveraging additional modalities and KG structure in the future.
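As a rough illustration of the gated fusion step, the sketch below follows the commonly used gated multimodal unit formulation: each modality is projected through a tanh layer, and a sigmoid gate computed from the concatenated inputs mixes the two projections. The dimensions, weight names (`W_t`, `W_v`, `W_z`), and random initialisation are assumptions made for this example; the exact parameterisation used in PGMEL is not given in this summary.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gated_fusion(x_text, x_img, W_t, W_v, W_z):
    """Fuse a text vector and an image vector with a learned gate.

    h_t = tanh(W_t x_text), h_v = tanh(W_v x_img),
    z = sigmoid(W_z [x_text; x_img]), h = z * h_t + (1 - z) * h_v
    """
    h_t = [math.tanh(a) for a in matvec(W_t, x_text)]
    h_v = [math.tanh(a) for a in matvec(W_v, x_img)]
    z = [sigmoid(a) for a in matvec(W_z, x_text + x_img)]
    return [zi * ht + (1.0 - zi) * hv for zi, ht, hv in zip(z, h_t, h_v)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy sizes: 4-d text features, 3-d image features, 2-d output.
rng = random.Random(0)
d_text, d_img, d_out = 4, 3, 2
W_t = random_matrix(d_out, d_text, rng)
W_v = random_matrix(d_out, d_img, rng)
W_z = random_matrix(d_out, d_text + d_img, rng)

x_text = [0.2, -0.1, 0.7, 0.4]
x_img = [0.9, 0.0, -0.3]
fused = gated_fusion(x_text, x_img, W_t, W_v, W_z)

# With an all-zero gate matrix the gate is 0.5 everywhere,
# so the output is the plain average of the two projections.
W_zero = [[0.0] * (d_text + d_img) for _ in range(d_out)]
averaged = gated_fusion(x_text, x_img, W_t, W_v, W_zero)
```

Because the gate is computed per output dimension, the model can rely on the image for some features and on the text for others, instead of applying a single global weight to each modality.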

Abstract

The task of entity linking, which involves associating mentions with their respective entities in a knowledge graph, has received significant attention due to its numerous potential applications. Recently, various multimodal entity linking (MEL) techniques have been proposed, which aim to learn comprehensive embeddings by leveraging both text and vision modalities. The selection of high-quality negative samples can potentially play a crucial role in metric/representation learning. However, to the best of our knowledge, this possibility remains unexplored in the existing MEL literature. To fill this gap, we address the multimodal entity linking problem in a generative adversarial setting where the generator is responsible for generating high-quality negative samples, and the discriminator is responsible for the metric learning task. Since the generator produces samples, which is a discrete process, we optimize it using policy gradient techniques and propose a policy gradient-based generative adversarial network for multimodal entity linking (PGMEL). Experimental results on the Wiki-MEL, Richpedia-MEL and WikiDiverse datasets demonstrate that PGMEL learns meaningful representations by selecting challenging negative samples and outperforms state-of-the-art methods.
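The interaction between the generator and the discriminator can be illustrated with a small toy sketch. It assumes a softmax policy over a fixed set of candidate negatives, a margin-based triplet loss as the discriminator objective, and a plain REINFORCE update in which the reward for a sampled negative is the triplet loss it induces. The distances, margin, learning rate, and function names are all hypothetical, and the discriminator is held fixed here, whereas in an adversarial setup it would be updated alternately with the generator.

```python
import math
import random


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def triplet_loss(d_pos, d_neg, margin=1.0):
    """Hinge loss that is positive when the negative is not far enough away."""
    return max(0.0, margin + d_pos - d_neg)


def reinforce_step(logits, rewards, rng, lr=0.5):
    """Sample one negative from the policy and apply a REINFORCE update.

    For a softmax policy, d log p_k / d logit_j = 1[j == k] - p_j.
    """
    probs = softmax(logits)
    k = rng.choices(range(len(logits)), weights=probs)[0]
    reward = rewards[k]
    new_logits = [
        logit + lr * reward * ((1.0 if j == k else 0.0) - p)
        for j, (logit, p) in enumerate(zip(logits, probs))
    ]
    return new_logits, k


# Toy setup (assumed values): distance from the mention to its true entity,
# and to three candidate negative entities under a frozen discriminator.
d_pos = 0.5
d_negs = [2.0, 0.3, 1.5]  # candidate 1 is the hard negative

# Reward for the generator: how much loss each negative causes.
rewards = [triplet_loss(d_pos, d_neg) for d_neg in d_negs]

rng = random.Random(0)
logits = [0.0, 0.0, 0.0]  # uniform initial policy
for _ in range(200):
    logits, _ = reinforce_step(logits, rewards, rng)

probs = softmax(logits)
```

In this toy setting only the hard negative yields a non-zero triplet loss, so the policy gradually shifts its probability mass toward it. The score-function estimator is what makes this possible: picking an entity index is not differentiable, but the log-probability of the pick is.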

Paper Structure

This paper contains 28 sections, 16 equations, 7 figures, 3 tables, 2 algorithms.

Figures (7)

  • Figure 1: Multimodal entity linking example: the mention Williams is correctly linked to the entity Robin Williams.
  • Figure 2: Architecture and Training of PGMEL.
  • Figure 3: Score function: PGMEL uses the same score function (with varying parameters) during training in the generator and discriminator as well as during inference.
  • Figure 4: Accuracy with the number of epochs for PGMEL and PGMEL-pretrain.
  • Figure 5: Performance evaluation with varying training set sizes.
  • ...and 2 more figures