Rule-driven News Captioning

Ning Xu, Tingting Zhang, Hongshuo Tian, An-An Liu

TL;DR

This work tackles the challenge of generating news captions that comply with journalistic rules by introducing a rule-driven framework. It constructs a news-aware semantic rule that links the image’s primary action to affiliated named entities extracted from the article, and injects this rule into a large pre-trained model (BART) using prefix-tuning across the last three encoder layers. The method leverages NER, a fine-tuned CLIP for entity-visual alignment, and situation-recognition-based semantic roles to produce captions that accurately describe who did what, where, and with which entities. Empirical results on GoodNews and NYTimes800k show competitive CIDEr scores and improved named-entity recall, along with thorough ablations and qualitative analyses validating the rule-guided generation. The approach offers practical benefits for reliable, policy-compliant news captioning in applications like search indexing and AI-assisted newsrooms.
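The entity-filtering step can be pictured as a similarity threshold over precomputed embeddings. This is a minimal sketch, not the authors' code. It assumes that the image and each extracted named entity have already been embedded by the fine-tuned CLIP model. The helper names (`cosine`, `filter_entities`), the 0.25 threshold, the "Senate" entity, and the toy 3-dimensional vectors are all hypothetical stand-ins.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_entities(
    image_emb: Sequence[float],
    entity_embs: Dict[Tuple[str, str], Sequence[float]],
    threshold: float = 0.25,  # hypothetical cut-off
) -> List[Tuple[str, str, float]]:
    """Keep (name, NER label, score) for entities whose image-text similarity
    reaches the threshold, highest score first."""
    kept = []
    for (name, label), emb in entity_embs.items():
        score = cosine(image_emb, emb)
        if score >= threshold:
            kept.append((name, label, score))
    kept.sort(key=lambda item: item[2], reverse=True)
    return kept


# Hypothetical toy embeddings standing in for fine-tuned CLIP outputs.
IMAGE_EMB = [1.0, 0.0, 0.0]
ENTITY_EMBS = {
    ("Ms. Micucci", "PER"): [0.9, 0.1, 0.0],
    ("Ms. Lindhome", "PER"): [0.8, 0.2, 0.0],
    ("Senate", "ORG"): [0.0, 0.0, 1.0],  # unrelated to the image, filtered out
}
for name, label, score in filter_entities(IMAGE_EMB, ENTITY_EMBS):
    print(f"{name} ({label}): {score:.3f}")
```

A fixed threshold is only one option. Keeping the top-k entities per image is an equally plausible variant, and the summary does not say which rule the paper applies.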

Abstract

The news captioning task aims to generate sentences that describe the named entities or concrete events in an image, given its accompanying news article. Existing methods have achieved remarkable results by relying on large-scale pre-trained models, which primarily focus on the correlations between the input news content and the output predictions. However, news captioning requires adherence to some fundamental rules of news reporting, such as accurately describing the individuals and actions associated with the event. In this paper, we propose a rule-driven news captioning method, which can generate image descriptions that follow a designated rule signal. Specifically, we first design a news-aware semantic rule for the descriptions. This rule incorporates the primary action depicted in the image (e.g., "performing") and the roles played by the named entities involved in the action (e.g., "Agent" and "Place"). Second, we inject this semantic rule into the large-scale pre-trained model BART using a prefix-tuning strategy, in which multiple encoder layers are embedded with the news-aware semantic rule. Finally, we can effectively guide BART to generate news sentences that comply with the designated rule. Extensive experiments on two widely used datasets (i.e., GoodNews and NYTimes800k) demonstrate the effectiveness of our method.
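One way to picture the news-aware semantic rule is as a primary action plus a mapping from semantic roles to named entities, serialized into text before it is encoded. The `NewsRule` class and the `action | Role: entities` serialization below are hypothetical illustrations; the summary does not specify how the rule is represented internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NewsRule:
    """Hypothetical container: a primary action plus the named entities
    filling each semantic role (e.g. "Agent", "Place")."""
    action: str
    roles: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, role: str, entity: str) -> None:
        # Append an entity to a role, skipping duplicates.
        slot = self.roles.setdefault(role, [])
        if entity not in slot:
            slot.append(entity)

    def linearize(self) -> str:
        # Flatten into one string a text encoder could tokenize.
        # The "action | Role: e1, e2" format is an assumption for illustration.
        parts = [self.action]
        for role, entities in self.roles.items():
            if entities:
                parts.append(f"{role}: {', '.join(entities)}")
        return " | ".join(parts)


rule = NewsRule(action="performing")
rule.add("Agent", "Ms. Micucci")
rule.add("Agent", "Ms. Lindhome")
rule.add("Place", "theater")
print(rule.linearize())
# performing | Agent: Ms. Micucci, Ms. Lindhome | Place: theater
```

Keeping roles in insertion order makes the serialized rule deterministic, which matters if the same rule must always map to the same prefix.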

Paper Structure

This paper contains 17 sections, 15 equations, 5 figures, 3 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Comparison between an existing news captioning method and our method. We design a news-aware semantic rule that includes the primary action depicted in the image as well as the roles played by the named entities involved in the action; this rule guides the model to generate news sentences that adhere to the rules of news reporting.
  • Figure 2: Overview of the proposed method. (a) Named Entity Extraction. We use a Named Entity Recognition (NER) model to extract named entities and their categories from the news article. Then, a fine-tuned CLIP model computes image-text similarity scores, which are used to filter the named entities. (b) News Rule Construction. We design the news-aware semantic rule, which includes the primary action depicted in the image (e.g., "performing") and the named entities involved in the action (e.g., "Ms. Micucci", "Ms. Lindhome", and "theater") along with their corresponding roles (e.g., "Agent" and "Stage"). (c) Caption Generation. We integrate the news-aware semantic rule, the news article, and the image into BART, a large-scale pre-trained model, using the prefix-tuning strategy to generate the news caption.
  • Figure 3: The detailed process of the caption generation. In order to ensure that BART effectively follows the specified rule signal, we integrate the news-aware semantic rule into the last three encoder layers of BART, using the prefix-tuning strategy.
  • Figure 4: The proportion of three types of named entities, i.e., PER (person's name), ORG (organization), and LOC (location), in the training corpus of GoodNews and NYTimes800k.
  • Figure 5: Qualitative results of our method. The constructed news-aware semantic rule is shown for each example. Non-Rule refers to the caption generated without the news-aware semantic rule. Tell [TranMX20] uses a byte-pair-encoding transformer to generate captions. GT is the ground-truth caption. Correct sentences are marked in green, while incorrect ones are marked in red.