MDiFF: Exploiting Multimodal Score-based Diffusion Models for New Fashion Product Performance Forecasting
Andrea Avogaro, Luigi Capogrosso, Franco Fummi, Marco Cristani
TL;DR
The paper addresses New Fashion Product Performance Forecasting (NFPPF) for unreleased items, where historical sales data are unavailable and domain shifts hinder traditional forecasts. It proposes MDiFF, a two-stage pipeline that first uses a multimodal score-based diffusion model conditioned on product imagery and release timing to generate multiple future sales signals, followed by a lightweight MLP that refines these signals into a final forecast. Evaluated on the VISUELLE dataset, MDiFF achieves state-of-the-art performance without relying on Google Trends or textual descriptions, demonstrating robustness to out-of-distribution items. The approach has practical implications for reducing overproduction and environmental impact in fast fashion by improving forecast accuracy for new products through diffusion-based uncertainty modeling and efficient refinement.
Abstract
The fast fashion industry causes significant environmental impact through overproduction and unsold inventory. Accurately predicting sales volumes for unreleased products could significantly improve efficiency and resource utilization. However, predicting the performance of entirely new items is challenging because historical data are lacking and trends change rapidly, and existing deterministic models often struggle with domain shifts when they encounter items outside the training data distribution. Recently proposed diffusion models address this issue with a continuous-time diffusion process, which allows us to simulate how new items are adopted and reduces the impact of the domain shift that affects deterministic models. In this paper, we therefore propose MDiFF: a novel two-step multimodal diffusion-based pipeline for New Fashion Product Performance Forecasting (NFPPF). First, we use a score-based diffusion model to predict multiple future sales trajectories for different garments. Then, we refine these multiple predictions with a lightweight Multi-layer Perceptron (MLP) to obtain the final forecast. MDiFF leverages the strengths of both architectures, yielding state-of-the-art accuracy and efficiency in forecasting for the fast-fashion industry. The code can be found at https://github.com/intelligolabs/MDiFF.
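To make the two-step structure concrete, the following is a minimal sketch in plain Python, not the authors' implementation (see the repository above for that). It assumes a variance-exploding SDE with a geometric noise schedule and replaces the trained multimodal score network with the exact score of a toy Gaussian whose mean `cond_mean` stands in for the conditioning on product image and release time. The MLP is randomly initialised and untrained, so its output only illustrates the data flow. All names (`toy_score`, `sample_candidates`, `refine`, `make_mlp`) and all constants are hypothetical.

```python
import math
import random

HORIZON = 12       # forecast length in time steps (assumed value)
NUM_SAMPLES = 8    # candidate sales curves per product (assumed value)
SIGMA_MIN, SIGMA_MAX = 0.01, 5.0   # VE-SDE noise range (assumed values)
DATA_VAR = 0.01    # variance of the toy "true" sales distribution (assumed)


def sigma(t):
    """Geometric noise schedule of a variance-exploding SDE, t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def toy_score(x, t, cond_mean):
    """Exact score of N(cond_mean, DATA_VAR) after VE noising up to time t.

    Hypothetical stand-in for a trained, conditioned score network.
    """
    var = DATA_VAR + sigma(t) ** 2 - SIGMA_MIN ** 2
    return [-(xi - mi) / var for xi, mi in zip(x, cond_mean)]


def sample_candidates(cond_mean, rng, num_steps=200):
    """Stage 1: draw candidate curves by Euler-Maruyama on the reverse-time SDE."""
    dt = 1.0 / num_steps
    log_ratio = math.log(SIGMA_MAX / SIGMA_MIN)
    candidates = []
    for _ in range(NUM_SAMPLES):
        # Start from the broad Gaussian prior at t = 1.
        x = [rng.gauss(0.0, SIGMA_MAX) for _ in range(HORIZON)]
        for i in range(num_steps):
            t = 1.0 - i * dt
            g2 = 2.0 * sigma(t) ** 2 * log_ratio  # squared diffusion coefficient
            score = toy_score(x, t, cond_mean)
            # Reverse step: drift along the score, then inject fresh noise.
            x = [xi + g2 * si * dt + math.sqrt(g2 * dt) * rng.gauss(0.0, 1.0)
                 for xi, si in zip(x, score)]
        candidates.append(x)
    return candidates


def make_mlp(n_in, n_hidden, n_out, rng):
    """Randomly initialised (untrained) one-hidden-layer MLP."""
    w1 = [[rng.gauss(0.0, n_in ** -0.5) for _ in range(n_in)]
          for _ in range(n_hidden)]
    w2 = [[rng.gauss(0.0, n_hidden ** -0.5) for _ in range(n_hidden)]
          for _ in range(n_out)]
    return w1, [0.0] * n_hidden, w2, [0.0] * n_out


def refine(candidates, mlp):
    """Stage 2: map the flattened candidate curves to a single forecast."""
    w1, b1, w2, b2 = mlp
    flat = [v for curve in candidates for v in curve]
    hidden = [max(0.0, sum(w * v for w, v in zip(row, flat)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]


rng = random.Random(0)
cond_mean = [1.0 + 0.1 * k for k in range(HORIZON)]  # hypothetical conditioning
candidates = sample_candidates(cond_mean, rng)
mlp = make_mlp(NUM_SAMPLES * HORIZON, 16, HORIZON, rng)
forecast = refine(candidates, mlp)
```

In this toy setting the candidates scatter around `cond_mean`, which mirrors the idea of drawing several plausible sales signals before a cheap network condenses them. In a real system, the score function would be learned from data and the MLP would be trained against observed sales.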
