AdParaphrase v2.0: Generating Attractive Ad Texts Using a Preference-Annotated Paraphrase Dataset
Soichiro Murakami, Peinan Zhang, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
TL;DR
AdParaphrase v2.0 is a large, preference-annotated paraphrase dataset for studying what makes ad texts attractive. It contains 16,460 paraphrase pairs, each annotated by ten evaluators. The authors combine LLM- and crowdworker-generated paraphrases, automated and manual paraphrase identification, and human attractiveness preference judgments to uncover linguistic features that drive user engagement, including factors not observed in the v1.0 dataset. They further explore attractive ad text generation (ATG) via instruction tuning and preference tuning (DPO), showing that findings from the linguistic analysis can boost attractiveness, and report a strong link between human preferences and ad performance, including a high correlation with predicted CTR ($r=0.946$). The study also shows that reference-free metrics are viable for assessing attractiveness and offers guidance for practical ATG deployment, including online A/B testing results and considerations for generalizability and multilingual extension. Overall, the dataset enables more reliable analysis and modeling of ad text attractiveness, with direct implications for ATG methods and advertising campaigns.
Abstract
Identifying factors that make ad text attractive is essential for advertising success. This study proposes AdParaphrase v2.0, a dataset for ad text paraphrasing that contains human preference data, to enable analysis of these linguistic factors and to support the development of methods for generating attractive ad texts. Compared with v1.0, this dataset is 20 times larger, comprising 16,460 ad text paraphrase pairs, each annotated with preference data from ten evaluators, thereby enabling a more comprehensive and reliable analysis. Through our experiments, we identified multiple linguistic features of engaging ad texts that were not observed in v1.0 and explored various methods for generating attractive ad texts. Furthermore, our analysis demonstrated the relationship between human preference and ad performance, and highlighted the potential of reference-free metrics based on large language models for evaluating ad text attractiveness. The dataset is publicly available at: https://github.com/CyberAgentAILab/AdParaphrase-v2.0.
