NewsCaption: Named-Entity aware Captioning for Out-of-Context Media
Anurag Singh, Shivangi Aneja
TL;DR
This work tackles misinformation by generating targeted out-of-context captions for images, conditioned on textual context tokens. It introduces an end-to-end architecture that fuses named-entity recognition, a relational graph, and a Transformer-based captioning module, using byte-pair encoding (BPE) to handle out-of-vocabulary tokens and multimodal features from CLIP and DETR. The approach shows that conditioning on textual input improves caption quality and controllability, achieving improvements over baselines on the COSMOS dataset, supported by qualitative analyses and human evaluation. The findings offer a practical benchmark and a plug-in component for strengthening out-of-context detection and misinformation defenses, while acknowledging ethical considerations and limitations in generalizability.
Abstract
With the increasing influence of social media, online misinformation has grown into a societal issue. Our work is motivated by the threat posed by cheapfakes, where an unaltered image is described using a news caption in a new but false context. The main challenge in detecting such out-of-context multimedia is the unavailability of large-scale datasets. Several detection methods employ randomly selected captions to generate out-of-context training inputs. However, these randomly matched captions are not truly representative of out-of-context scenarios because of inconsistencies between the image description and the matched caption. We aim to address these limitations by introducing a novel task of out-of-context caption generation. In this work, we propose a new method that generates a realistic out-of-context caption given visual and textual context. We also demonstrate that the semantics of the generated captions can be controlled using the textual context. Finally, we evaluate our method against several baselines; it improves over the image captioning baseline by 6.2% BLEU-4, 2.96% CIDEr, 11.5% ROUGE, and 7.3% METEOR.
