Well Googled is Half Done: Multimodal Forecasting of New Fashion Product Sales with Image-based Google Trends

Geri Skenderi, Christian Joppi, Matteo Denitto, Marco Cristani

TL;DR

This paper investigates the effectiveness of systematically probing exogenous knowledge in the form of Google Trends time series and combining it with multi‐modal information related to a brand‐new fashion item, in order to effectively forecast its sales despite the lack of past data.

Abstract

New fashion product sales forecasting is a challenging problem that involves many business dynamics and cannot be solved by classical forecasting approaches. In this paper, we investigate the effectiveness of systematically probing exogenous knowledge in the form of Google Trends time series and combining it with multi-modal information related to a brand-new fashion item, in order to forecast its sales despite the lack of past data. In particular, we propose a neural network-based approach, where an encoder learns a representation of the exogenous time series, while the decoder forecasts the sales based on the Google Trends encoding and the available visual and metadata information. Our model works in a non-autoregressive manner, avoiding the compounding effect of large first-step errors. As a second contribution, we present VISUELLE, a publicly available dataset for the task of new fashion product sales forecasting, containing multimodal information for 5577 real, new products sold between 2016 and 2019 by Nunalie, an Italian fast-fashion company. The dataset is equipped with images of products, metadata, related sales, and associated Google Trends. We use VISUELLE to compare our approach against state-of-the-art alternatives and several baselines, showing that our neural network-based approach is the most accurate in terms of both percentage and absolute error. It is worth noting that the addition of exogenous knowledge boosts the forecasting accuracy by 1.5% in terms of Weighted Absolute Percentage Error (WAPE), revealing the importance of exploiting informative external information. The code and dataset are both available at https://github.com/HumaticsLAB/GTM-Transformer.
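The abstract reports accuracy in terms of WAPE without defining it. As a minimal sketch, assuming the common definition (total absolute error divided by total actual sales, expressed as a percentage), the metric could be computed as follows. The function name `wape` and the sample numbers are illustrative and are not taken from the paper or its repository.

```python
def wape(actual, forecast):
    """Weighted Absolute Percentage Error, in percent.

    Assumes the common definition: sum(|actual - forecast|) / sum(|actual|) * 100.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total_abs_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("WAPE is undefined when total actual sales are zero")
    return 100.0 * total_abs_error / total_actual


# Made-up weekly sales for one product (illustrative only).
actual_sales = [10, 20, 30, 40]
forecast_sales = [12, 18, 33, 35]
print(wape(actual_sales, forecast_sales))  # 12.0
```

Because the errors are normalized by total sales rather than per observation, weeks with zero or very low sales do not blow up the score, which is one reason this kind of metric tends to suit sparse new-product sales data.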

Paper Structure

This paper contains 26 sections, 9 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Sample images representing various product categories within the VISUELLE dataset.
  • Figure 2: Cardinalities of the dataset for clothing categories (a), color (b) and fabric (c).
  • Figure 3: 25th-percentile density plots of the SS18 and SS19 seasons.
  • Figure 4: Examples of Google Trends time-series spanning multiple years.
  • Figure 5: GTM-Transformer architecture. The encoder processes the exogenous Google Trends series and learns a representative embedding thanks to the self-attention mechanism. The decoder takes as input a multimodal embedding created from the Feature Fusion Network and then relies on a cross-attention mechanism to understand the implications of the Google Trend series on the multimodal embedding for the forecasting task. The output of the transformer model is then passed through a fully connected layer, to generate the sales forecast.
  • ...and 3 more figures
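The data flow described in the Figure 5 caption can be illustrated with a small NumPy sketch: a multimodal embedding acts as the query, attends over an encoding of the Google Trends series, and a fully connected layer maps the result to the whole forecast at once. This is not the authors' implementation. The dimensions (`d_model`, `trend_steps`, `horizon`), the random stand-ins for the encoder and Feature Fusion Network outputs, and the single-head attention without learned projections are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query, memory):
    """Single-head scaled dot-product attention (no learned projections).

    query:  (n_queries, d_model), here the multimodal embedding
    memory: (n_steps, d_model), here the encoded Google Trends series
    """
    d_model = query.shape[-1]
    scores = query @ memory.T / np.sqrt(d_model)  # (n_queries, n_steps)
    weights = softmax(scores, axis=-1)            # each row sums to 1
    return weights @ memory, weights


rng = np.random.default_rng(0)
d_model, trend_steps, horizon = 8, 52, 12  # hypothetical sizes

# Random stand-ins for the encoder output and the Feature Fusion Network output.
trend_encoding = rng.normal(size=(trend_steps, d_model))
multimodal_embedding = rng.normal(size=(1, d_model))

context, weights = cross_attention(multimodal_embedding, trend_encoding)

# Fully connected output layer: every forecast step is produced in one pass,
# so no predicted value is fed back in as input (non-autoregressive).
w_out = rng.normal(size=(d_model, horizon))
b_out = np.zeros(horizon)
forecast = context @ w_out + b_out

print(forecast.shape)  # (1, 12)
```

Emitting all forecast steps in a single pass is what the abstract means by non-autoregressive: an early error cannot propagate into later steps, because later steps never consume earlier predictions.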