
How to Understand Named Entities: Using Common Sense for News Captioning

Ning Xu, Yanhui Wang, Tingting Zhang, Hongshuo Tian, Mohan Kankanhalli, An-An Liu

TL;DR

Commonsense knowledge is exploited to understand named entities for news captioning by correlating the news content with common sense in the wild. This helps an agent distinguish semantically similar named entities and describe named entities using words outside of the training corpora.

Abstract

News captioning aims to describe an image given its accompanying news article body as input. It relies heavily on a set of detected named entities, including real-world people, organizations, and places. This paper exploits commonsense knowledge to understand named entities for news captioning. By "understand", we mean correlating the news content with common sense in the wild, which helps an agent to 1) distinguish semantically similar named entities and 2) describe named entities using words outside of the training corpora. Our approach consists of three modules. (a) The Filter Module clarifies the common sense concerning a named entity from two aspects, "what does it mean?" and "what is it related to?", which divide the common sense into explanatory knowledge and relevant knowledge, respectively. (b) The Distinguish Module aggregates explanatory knowledge from three aspects (node-degree, dependency, and distinguish) to distinguish semantically similar named entities. (c) The Enrich Module attaches relevant knowledge to named entities to enrich the entity description with commonsense information (e.g., identity and social position). Finally, the probability distributions from the Distinguish and Enrich Modules are integrated to generate the news captions. Extensive experiments on two challenging datasets (GoodNews and NYTimes) demonstrate the superiority of our method. Ablation studies and visualization further validate its effectiveness in understanding named entities.
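The Filter Module's split of retrieved common sense into explanatory knowledge ("what does it mean?") and relevant knowledge ("what is it related to?") can be illustrated with a minimal sketch. It assumes the ConceptNet results are already available as `(head, relation, tail)` triples. The set `EXPLANATORY_RELATIONS`, the function `split_knowledge`, and the toy triples are all hypothetical: the exact grouping of ConceptNet relation types into the two high-level categories is an assumption here, and the paper's own mapping may differ.

```python
from collections import defaultdict

# Hypothetical grouping of ConceptNet relation types: these answer
# "what does it mean?". Every other relation is treated as relevant
# knowledge ("what is it related to?"). This mapping is an assumption.
EXPLANATORY_RELATIONS = {"IsA", "DefinedAs", "Synonym", "FormOf"}


def split_knowledge(triples):
    """Split (head, relation, tail) triples into two per-entity sub-graphs.

    Returns (explanatory, relevant), each mapping an entity to a list of
    (relation, tail) edges.
    """
    explanatory = defaultdict(list)
    relevant = defaultdict(list)
    for head, relation, tail in triples:
        target = explanatory if relation in EXPLANATORY_RELATIONS else relevant
        target[head].append((relation, tail))
    return dict(explanatory), dict(relevant)


# Toy triples standing in for ConceptNet query results (made up for illustration).
triples = [
    ("Bill Gates", "IsA", "businessman"),
    ("Bill Gates", "RelatedTo", "Microsoft"),
    ("Nicolas Sarkozy", "IsA", "politician"),
    ("Nicolas Sarkozy", "RelatedTo", "France"),
]

explanatory, relevant = split_knowledge(triples)
print(explanatory)
print(relevant)
```

In this toy example, the explanatory edges ("businessman" vs. "politician") are what would help tell the two entities apart. The relevant edges (e.g., "Microsoft") are the kind of context an Enrich-style step could attach to a caption.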

Paper Structure (22 sections, 23 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Comparison of a general image caption and a news image caption. The latter can produce more expressive descriptions that mention specific people, organizations, and places (i.e., named entities).
  • Figure 2: Comparison of current methods and our method. Current methods generate news captions relying only on images and news articles. By comparison, we additionally use external common sense and divide it into explanatory and relevant knowledge. The former is used to distinguish semantically similar named entities, while the latter provides more relevant semantics for the named entity description.
  • Figure 3: Overview of the proposed method. We design three communicative modules that exploit commonsense knowledge for named entity understanding: (a) the Filter Module queries each named entity of the news article in ConceptNet and obtains commonsense knowledge, which is divided into explanatory knowledge and relevant knowledge for the subsequent modules. (b) The Distinguish Module enhances the entity representations by aggregating the explanatory knowledge from three aspects (node-degree, dependency, and distinguish), which helps distinguish semantically similar named entities (e.g., "Mr. Gates" and "Mr. Sarkozy"). (c) The Enrich Module models the commonsense-entity interaction based on the relevant knowledge to attach reliable commonsense concepts that enrich the entity description (e.g., "Bill Gates, chairman of Microsoft"). Finally, the probability distributions from the Distinguish and Enrich Modules are integrated to generate the news caption.
  • Figure 4: The Filter Module first queries each named entity in ConceptNet to extract commonsense knowledge, which is then divided into two sub-graphs according to high-level relation types.
  • Figure 5: Procedure of the Distinguish Module. It aggregates the explanatory sub-graph from the node-degree, dependency, and distinguish aspects to enhance the entity representation. Green denotes the named entity, and orange denotes the named entity's explanatory concepts (best viewed in color).
  • ...and 2 more figures
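Figure 5 describes aggregating the explanatory sub-graph from three aspects. The sketch below covers only the node-degree aspect and makes several assumptions. Each explanatory concept already has an embedding. Concept weights come from a softmax over node degrees, so better-connected concepts contribute more. The weighted sum is added residually to the entity vector. The function `degree_weighted_aggregate` is hypothetical, and the paper's actual weighting and fusion may differ.

```python
import math


def degree_weighted_aggregate(entity_vec, concept_vecs, concept_degrees):
    """Enhance an entity vector with a degree-weighted sum of its concept vectors.

    Assumption: weights are a softmax over raw node degrees, and the result is
    a residual sum (entity + aggregated knowledge).
    """
    if not concept_vecs:
        # No explanatory concepts: return the entity representation unchanged.
        return list(entity_vec)
    # Softmax over degrees; subtract the max for numerical stability.
    max_deg = max(concept_degrees)
    exps = [math.exp(d - max_deg) for d in concept_degrees]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(entity_vec)
    # Weighted sum of concept embeddings, dimension by dimension.
    aggregated = [
        sum(w * vec[i] for w, vec in zip(weights, concept_vecs)) for i in range(dim)
    ]
    # Residual combination of the entity and its aggregated explanatory knowledge.
    return [e + a for e, a in zip(entity_vec, aggregated)]


entity = [1.0, 0.0]
concepts = [[0.0, 1.0], [0.0, -1.0]]
print(degree_weighted_aggregate(entity, concepts, [2, 2]))  # → [1.0, 0.0]
```

With equal degrees, the two opposing concepts receive equal weight and cancel out, leaving the entity vector unchanged. Unequal degrees would tilt the result toward the better-connected concept.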