Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition

Bingbing Wang, Bin Liang, Chun-Mei Feng, Wangmeng Zuo, Zhixin Bai, Shijue Huang, Kam-Fai Wong, Xi Zeng, Ruifeng Xu

TL;DR

This paper introduces StickerTAG, the first multi-tag sticker dataset, comprising a collected tag set of 461 tags and 13,571 sticker-tag pairs, designed to provide a deeper understanding of stickers.

Abstract

In real-world conversations, the diversity and ambiguity of stickers often lead to varied interpretations depending on context, which calls for comprehensive sticker understanding and support for multi-tagging. To address this challenge, we introduce StickerTAG, the first multi-tag sticker dataset, comprising a collected tag set of 461 tags and 13,571 sticker-tag pairs, designed to provide a deeper understanding of stickers. Recognizing multiple tags for a sticker is particularly challenging because sticker tags are usually fine-grained and attribute-aware. Hence, we propose an Attentive Attribute-oriented Prompt Learning method, i.e., Att$^2$PL, which captures informative sticker features in a fine-grained manner to better differentiate tags. Specifically, we first apply an Attribute-oriented Description Generation (ADG) module to obtain descriptions of stickers from four attributes. Then, a Local Re-attention (LoR) module is designed to perceive the importance of local information. Finally, we use prompt learning to guide the recognition process and adopt confidence penalty optimization to penalize overconfident output distributions. Extensive experiments show that our method achieves encouraging results on all commonly used metrics.
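
The abstract does not spell out the confidence penalty term, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes independent per-tag sigmoid probabilities, a binary cross-entropy loss, and an entropy bonus weighted by a hypothetical coefficient `beta`; all function names are illustrative.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli distribution with probability p."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one tag with predicted probability p and label y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def confidence_penalized_loss(probs, targets, beta=0.1):
    """Mean per-tag BCE minus beta times the mean per-tag entropy.

    Subtracting the entropy rewards less peaked predictions, which is one
    common way to discourage overconfident outputs (assumed form).
    """
    n = len(probs)
    loss = sum(bce(p, y) for p, y in zip(probs, targets)) / n
    entropy = sum(binary_entropy(p) for p in probs) / n
    return loss - beta * entropy


if __name__ == "__main__":
    targets = [1, 0, 1]          # multi-hot tag labels for one sticker
    probs = [0.9, 0.1, 0.9]      # hypothetical per-tag probabilities
    print(confidence_penalized_loss(probs, targets, beta=0.0))
    print(confidence_penalized_loss(probs, targets, beta=0.1))
```

With `beta=0` the function reduces to plain binary cross-entropy; a positive `beta` lowers the loss more for predictions near 0.5 than for predictions near 0 or 1, which is the regularizing effect the abstract alludes to.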

Paper Structure (22 sections, 4 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Examples of stickers along with multiple tags.
  • Figure 2: (a) Word cloud distribution of the sticker tags. Larger text size indicates a higher frequency of occurrence. (b) Number of samples per tag, highlighted by an orange trend line.
  • Figure 3: Illustration of the proposed Att$^2$PL method comprising (1) Attribute-oriented Description Generation, (2) Local Re-attention Module, (3) Prompt-based Classification, and (4) Confidence Penalty Optimization (blue lines).
  • Figure 4: Overview of attribute-oriented description generation.
  • Figure 5: Examples of stickers with ground truth tags and the predicted tags inferred by our Att$^2$PL framework.