
Anime Popularity Prediction Before Huge Investments: a Multimodal Approach Using Deep Learning

Jesús Armenta-Segura, Grigori Sidorov

TL;DR

One of the most comprehensive free datasets for predicting anime popularity using only features accessible before huge investments is introduced, relying solely on freely available internet data and adhering to rigorous standards based on real-life experiences.

Abstract

In the Japanese anime industry, predicting whether an upcoming product will be popular is crucial. This paper presents a dataset and methods for predicting anime popularity using a multimodal text-image dataset constructed exclusively from freely available internet sources. The dataset was built following rigorous standards based on real-life investment experiences. A deep neural network architecture leveraging GPT-2 and ResNet-50 to embed the data was employed to investigate the correlation between the multimodal text-image input and a popularity score, discovering relevant strengths and weaknesses in the dataset. To measure the accuracy of the model, mean squared error (MSE) was used, obtaining a best result of 0.011 when considering all inputs and the full version of the deep neural network, compared to the benchmark MSE of 0.412 obtained with traditional TF-IDF and PIL-to-tensor vectorizations. This is the first proposal to address this task with multimodal datasets, revealing the substantial benefit of incorporating image information, even when a relatively small model (ResNet-50) was used to embed it.
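The abstract does not spell out the fusion mechanism, but a common way to combine a GPT-2 text embedding with a ResNet-50 image embedding is late fusion: concatenate the two vectors and regress the popularity score with a small MLP trained on MSE. The sketch below illustrates that idea only; the layer sizes, the 768/2048 embedding dimensions (GPT-2 base hidden size and ResNet-50 pooled features), and the function names are assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed embedding sizes: GPT-2 base hidden state (768) for the text,
# ResNet-50 global-average-pooled features (2048) for the images.
TEXT_DIM, IMG_DIM, HIDDEN = 768, 2048, 256


def fuse_and_score(text_emb, img_emb, W1, b1, W2, b2):
    """Late fusion: concatenate modality embeddings, then regress a score in (0, 1)."""
    x = np.concatenate([text_emb, img_emb], axis=-1)  # (batch, 768 + 2048)
    h = np.maximum(0.0, x @ W1 + b1)                  # ReLU hidden layer
    logit = h @ W2 + b2                               # (batch, 1)
    return 1.0 / (1.0 + np.exp(-logit))               # sigmoid keeps scores in (0, 1)


# Toy batch standing in for precomputed GPT-2 / ResNet-50 embeddings.
batch = 4
text_emb = rng.standard_normal((batch, TEXT_DIM))
img_emb = rng.standard_normal((batch, IMG_DIM))

# Randomly initialized fusion-head weights (illustrative only).
W1 = rng.standard_normal((TEXT_DIM + IMG_DIM, HIDDEN)) * 0.01
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, 1)) * 0.01
b2 = np.zeros(1)

pred = fuse_and_score(text_emb, img_emb, W1, b1, W2, b2)
target = rng.uniform(size=(batch, 1))            # normalized popularity scores
mse = float(np.mean((pred - target) ** 2))       # the paper's evaluation metric
```

Squashing the output through a sigmoid assumes the popularity score is normalized to [0, 1], which is consistent with the sub-1 MSE values reported in the abstract.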

Paper Structure

This paper contains 15 sections, 5 equations, 6 figures, 11 tables.

Figures (6)

  • Figure 1: Visual representations of the anime character Faye Valentine, from Cowboy Bebop, during early stages of development. a) Her portrait in MyAnimeList. b) Her character sketch designs [3] with several poses, angles and facial expressions. More detail than this is required as further reference for animators [24], although it is not accessible during early stages of development.
  • Figure 2: Mean Synopsis Wordcount (left) and Mean Character Frequency (right) across the dataset. In both panels the X-axis represents the floor score; the Y-axis is the mean synopsis word count per score (left) and the mean number of main characters per score (right).
  • Figure 3: Mean Synopsis Wordcount in the training (left) and test (right) set. X-axis represents the floor score. Y-axis is the mean words on synopsis per score.
  • Figure 4: Mean Character Wordcount in the training (left) and test (right) set. X-axis represents the floor score. Y-axis is the mean characters per score.
  • Figure 5: Full Three-input Deep Neural Network
  • ...and 1 more figure