
Multi-task Prompt Words Learning for Social Media Content Generation

Haochen Xue, Chong Zhang, Chengzhi Liu, Fangyu Wu, Xiaobo Jin

TL;DR

The paper tackles the challenge of controllable social media content generation by introducing Multi-task Prompt Word Learning (MPWL), a framework that fuses multimodal image and text features to produce topic, sentiment, scene, and keyword prompts for guiding GPT-4-based tweet generation. It combines crude image description with a structured MPWL pipeline and a template-driven prompt to achieve high-quality, coherent tweets aligned with visuals, aided by GroundingDINO-based image cropping. The authors validate MPWL through comparative, ablation, and generalizability experiments, showing superior performance over manual prompts and other prompting methods in most metrics, and demonstrate adaptability to sentiment and scene tasks across multiple datasets. Overall, the work provides a practical, scalable approach to augmenting AI-generated social media content with precise, task-specific prompts and objective evaluation via GPT-based scoring, potentially enhancing automation and quality in real-world social media workflows.

Abstract

The rapid development of the Internet has profoundly changed human life. People increasingly express themselves and interact with others on social media platforms. However, although artificial intelligence technology has been widely used in many aspects of life, its application to social media content creation remains largely unexplored. To address this gap, we propose a new prompt word generation framework based on multi-modal information fusion, which combines multiple tasks, including topic classification, sentiment analysis, scene recognition, and keyword extraction, to generate more comprehensive prompt words. We then use a template containing a set of prompt words to guide ChatGPT to generate high-quality tweets. Furthermore, because the field of content generation lacks effective and objective evaluation criteria, we use ChatGPT to evaluate the generated results, making large-scale evaluation of content generation algorithms possible. Extensive evaluation of generated content demonstrates that our prompt word generation framework produces higher-quality content than manual methods and other prompting techniques, while topic classification, sentiment analysis, and scene recognition significantly enhance content clarity and its consistency with the image.

Paper Structure (20 sections, 3 equations, 5 figures, 4 tables)

This paper contains 20 sections, 3 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: The pipeline of our algorithm: 1) text and image features are extracted with BERT and ViT, respectively, and then fused; 2) based on the fused features, multi-task learning generates the prompt words; 3) the tweet text is generated by ChatGPT from a template filled with the prompt words; 4) the synthesized square image and the tweet text are combined into the final tweet containing both images and text.
  • Figure 2: Prompt word template for multi-prompt word learning
  • Figure 3: Network architecture for topic analysis task
  • Figure 4: Network architecture of scene recognition task
  • Figure 5: The Designed Template for Scoring Generated Tweets
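
Step 3 of the Figure 1 pipeline (filling a template with the learned prompt words before passing it to ChatGPT) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `PromptWords` class, the `build_prompt` function, and the wording of `TEMPLATE` are all hypothetical, and the actual template shown in Figure 2 may differ. The example values stand in for the outputs of the topic, sentiment, scene, and keyword tasks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PromptWords:
    """Hypothetical container for the four prompt-word task outputs."""
    topic: str
    sentiment: str
    scene: str
    keywords: List[str]


# Assumed template wording for illustration only; not the template from Figure 2.
TEMPLATE = (
    "Write a tweet about the topic '{topic}' with a {sentiment} sentiment. "
    "The accompanying image shows a {scene} scene. "
    "Include these keywords: {keywords}."
)


def build_prompt(words: PromptWords, template: str = TEMPLATE) -> str:
    """Fill the template slots with the prompt words and return the prompt text."""
    return template.format(
        topic=words.topic,
        sentiment=words.sentiment,
        scene=words.scene,
        keywords=", ".join(words.keywords),
    )


# Example values are made up; in the pipeline they would come from the
# multi-task model operating on the fused image and text features.
words = PromptWords("travel", "positive", "beach", ["sunset", "vacation"])
prompt = build_prompt(words)
print(prompt)
```

Keeping the template as a plain string with named slots means the prompt wording can be changed without touching the code that produces the prompt words.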