RecSys Challenge 2023: From data preparation to prediction, a simple, efficient, robust and scalable solution
Maxime Manderlier, Fabian Lecron
TL;DR
The paper tackles predicting app installs from ad impressions in ShareChat and Moj, using a dataset with categorical, binary, and numerical features. It proposes a compact neural network with embedding-based handling of categorical data, regression-based missing-value imputation, and min-max normalization, exploring both single-output and two-output architectures. The approach achieves a best challenge Log-Loss of 6.622686 and demonstrates that a small, robust model scales with data while remaining production-friendly. It also analyzes the trade-offs between single- and multi-output configurations and emphasizes practical deployment considerations for online advertising contexts.
Abstract
The RecSys Challenge 2023, presented by ShareChat, consists of predicting whether a user will install an application on their smartphone after seeing advertising impressions in the ShareChat & Moj apps. This paper presents the solution of 'Team UMONS' to this challenge. It gives accurate results (our best score is 6.622686) with a relatively small model that can be easily implemented in different production configurations. Our solution scales well as the dataset size increases and can be used with datasets containing missing values.
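As a rough illustration of two ingredients named above, the following minimal sketch shows min-max normalization of a numerical feature and a standard binary log-loss. It is not the team's implementation: the function names (`min_max_fit`, `min_max_transform`, `log_loss`) are hypothetical, missing values are represented as `None` and simply passed through for a separate imputation step, and the challenge's own score may be scaled differently from the plain mean log-loss computed here.

```python
import math

def min_max_fit(column):
    """Return (min, max) of the observed (non-missing) values."""
    observed = [v for v in column if v is not None]
    return min(observed), max(observed)

def min_max_transform(column, lo, hi):
    """Scale values to [0, 1]; missing values stay None for later imputation."""
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [None if v is None else (v - lo) / span for v in column]

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# Illustrative values only (assumption: not taken from the challenge data).
feature = [1.0, None, 5.0, 3.0]
lo, hi = min_max_fit(feature)
scaled = min_max_transform(feature, lo, hi)
```

In practice, the minimum and maximum would be fitted on the training split only and reused for validation and test data, so that the scaling does not leak information across splits.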
