Decoding Memes: A Comparative Study of Machine Learning Models for Template Identification
Levente Murgás, Marcell Nagy, Kate Barnes, Roland Molontay
TL;DR
This work tackles automatic meme template identification, a problem made difficult by templateless memes and evolving templates. It introduces a rigorous evaluation framework and compares a broad set of methods, ranging from baseline visual features and CNN-based embeddings to perceptual hashing and sparse/unsupervised approaches, on two large datasets (Imgflip and social media). A two-headed DenseNet performs best at recognizing known templates on labeled data, while a radii-based nearest-neighbors approach with pHash embeddings offers robust real-world performance across templated and templateless memes. Perceptual hashing delivers high precision but low recall, and RNN-FM, though accurate, is computationally intensive. The results underscore the importance of handling templateless memes for scalable meme analysis, and the accompanying data and code enable replication and further research.
Abstract
Image-with-text memes combine text with imagery to achieve comedy, but in today's world, they also play a pivotal role in online communication, influencing politics, marketing, and social norms. A "meme template" is a preexisting layout or format that is used to create memes. It typically includes specific visual elements, characters, or scenes with blank spaces or captions that can be customized, allowing users to easily create their own versions of popular meme templates by adding personal or contextually relevant content. Despite extensive research on meme virality, the task of automatically identifying meme templates remains a challenge. This paper presents a comprehensive comparison and evaluation of meme template identification methods, including both established approaches from the literature and novel techniques. We introduce a rigorous evaluation framework that not only assesses the ability of various methods to correctly identify meme templates but also tests their capacity to reject non-memes without false assignments. Our study involves extensive data collection from sites that provide meme annotations (Imgflip) and various social media platforms (Reddit, X, and Facebook) to ensure a diverse and representative dataset. The compared methods span supervised and unsupervised approaches, such as convolutional neural networks, distance-based classification, and density-based clustering, and we highlight the strengths and limitations of each. Our analysis helps researchers and practitioners choose suitable methods and points to future research directions in this evolving field.
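
To make the idea of distance-based classification with rejection concrete, the following is a minimal sketch, not the implementation evaluated in this paper. It assumes that each image has already been reduced to an integer-encoded perceptual hash and that similarity is measured by Hamming distance. The function names, the template labels, the toy 16-bit hashes, and the radius value are all hypothetical choices made for illustration.

```python
from collections import Counter


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two integer-encoded hashes."""
    return bin(a ^ b).count("1")


def identify_template(query_hash, labeled_hashes, radius):
    """Majority vote among labeled hashes within `radius` bits of the query.

    Returns None when no labeled hash falls inside the radius, which is how
    this sketch declines to assign a template to a templateless image.
    """
    votes = Counter(
        label
        for known_hash, label in labeled_hashes
        if hamming(query_hash, known_hash) <= radius
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy reference set: 16-bit hashes with hypothetical template labels.
labeled = [
    (0b1111000011110000, "template_a"),
    (0b1111000011110001, "template_a"),
    (0b0000111100001111, "template_b"),
]

# Differs from the two "template_a" hashes by 2 bits and 1 bit respectively.
near_match = identify_template(0b1111000011110011, labeled, radius=3)

# At least 8 bits away from every reference hash, so it is rejected.
no_match = identify_template(0b1010101010101010, labeled, radius=3)
```

In this sketch the radius controls the trade-off between assigning a template and rejecting the image: a smaller radius rejects more images, while a larger one accepts more distant matches.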
