You Are What You Bought: Generating Customer Personas for E-commerce Applications

Yimin Shi, Yang Fei, Shiqi Zhang, Haixun Wang, Xiaokui Xiao

TL;DR

The paper tackles the opacity and integration challenges of implicit customer representations in e-commerce by introducing explicit customer personas and a scalable pipeline GPLR to generate them. GPLR uses LLMs to label a small prototype user set and then leverages random-walk-based affinity inference to estimate persona memberships for the remaining users, significantly reducing LLM usage. It further introduces RevAff to efficiently approximate user-persona affinities on large graphs, enabling rapid computation on datasets with hundreds of thousands to millions of nodes. By incorporating persona nodes into a tripartite graph, the authors demonstrate persona-enhanced graph convolutional recommendations that outperform state-of-the-art baselines by up to around 12% in key metrics and show robust improvements in customer segmentation. The work provides a practical framework for integrating human-readable knowledge and external reasoning into recommender systems with strong empirical validation on multiple real-world datasets.

Abstract

In e-commerce, user representations are essential for various applications. Existing methods often use deep learning techniques to convert customer behaviors into implicit embeddings. However, these embeddings are difficult to understand and integrate with external knowledge, limiting the effectiveness of applications such as customer segmentation, search navigation, and product recommendations. To address this, our paper introduces the concept of the customer persona. Condensed from a customer's numerous purchasing histories, a customer persona provides a multi-faceted and human-readable characterization of specific purchase behaviors and preferences, such as Busy Parents or Bargain Hunters. This work then focuses on representing each customer by multiple personas from a predefined set, achieving readable and informative explicit user representations. To this end, we propose an effective and efficient solution GPLR. To ensure effectiveness, GPLR leverages pre-trained LLMs to infer personas for customers. To reduce overhead, GPLR applies LLM-based labeling to only a fraction of users and utilizes a random walk technique to predict personas for the remaining customers. We further propose RevAff, which provides an absolute error guarantee of $\epsilon$ while improving the time complexity of the exact solution by a factor of at least $O(\frac{\epsilon\cdot|E|N}{|E|+N\log N})$, where $N$ represents the number of customers and products, and $E$ represents the interactions between them. We evaluate the performance of our persona-based representation in terms of accuracy and robustness for recommendation and customer segmentation tasks using three real-world e-commerce datasets. Most notably, we find that integrating customer persona representations improves the state-of-the-art graph convolution-based recommendation model by up to 12% in terms of NDCG@K and F1-Score@K.
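
The idea of labeling only a fraction of users and propagating personas to the rest by random walks can be illustrated with a minimal sketch. This is not the paper's exact affinity definition and not RevAff; it is a generic random walk over a user-product bipartite graph, written under assumptions: the function name `persona_affinity`, its parameters, and the fixed number of hops are all hypothetical, and the prototype labels stand in for whatever an LLM would assign.

```python
import numpy as np

def persona_affinity(interactions, prototype_idx, prototype_labels, hops=2):
    """Score every user against every persona via random walks (illustrative only).

    interactions:     (n_users, n_items) 0/1 purchase matrix.
    prototype_idx:    indices of the users that already carry persona labels.
    prototype_labels: (n_prototypes, n_personas) 0/1 matrix (assumed LLM output).
    hops:             number of user -> item -> user steps (an assumed parameter).
    """
    A = np.asarray(interactions, dtype=float)
    user_deg = A.sum(axis=1, keepdims=True)    # purchases per user
    item_deg = A.sum(axis=0, keepdims=True).T  # buyers per item

    # Row-normalised transition matrices; rows with zero degree stay all-zero.
    P_ui = np.divide(A, user_deg, out=np.zeros_like(A), where=user_deg > 0)
    P_iu = np.divide(A.T, item_deg, out=np.zeros_like(A.T), where=item_deg > 0)

    # One user -> item -> user step of the walk.
    P_uu = P_ui @ P_iu

    # Distribution over end users after `hops` such steps, for every start user.
    walk = np.eye(A.shape[0])
    for _ in range(hops):
        walk = walk @ P_uu

    # Probability mass that lands on prototype users, split by their personas.
    labels = np.asarray(prototype_labels, dtype=float)
    return walk[:, prototype_idx] @ labels
```

Each row of the returned matrix can be thresholded or ranked to pick personas for an unlabeled user. The dense matrix products here cost far more than an approximate method would on a large graph, which is the kind of overhead the abstract says RevAff is designed to reduce.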

Paper Structure

This paper contains 27 sections, 6 theorems, 7 equations, 7 figures, 10 tables, 2 algorithms.

Key Result

Theorem 1

The time complexity for computing the exact $\mathbf{\Psi}$ by Eq. eq:aff is $O(|E|\cdot|U| + (\hat{\ell}-1)|U|^3 + |R|\cdot|U|^2)$.

Figures (7)

  • Figure 1: LGCN3 with different sample rates on OnlineRetail.
  • Figure 2.a: Case study on initial persona set generation (take MBA as an example) - Step 1.
  • Figure 2.b: Case study on initial persona set generation (take MBA as an example) - Step 2.
  • Figure 2.c: Case study on initial persona set generation (take MBA as an example) - Step 3.
  • Figure 3.a: Case study on user persona generation (take user 12358 in MBA as an example) - Instruction.
  • ...and 2 more figures

Theorems & Definitions (7)

  • Theorem 1
  • Definition 1: $\epsilon$-approximate user-persona affinity
  • Theorem 2
  • Theorem 3
  • Theorem 1
  • Theorem 3
  • Theorem 4