
POSIT: Promotion of Semantic Item Tail via Adversarial Learning

Qiuling Xu, Pannaga Shivaswamy, Xiangyu Zhang

TL;DR

The paper tackles popularity bias in recommender systems by promoting semantically meaningful long-tail items using POSIT, an adversarial learning framework with a small Lipschitz-constrained adversary. It converts user-level Recall@k into item-level advantages via Item Recall@k and guides the adversary to assign smooth, semantically coherent weights that amplify disadvantaged item groups. Integrating these weights into a base recommender (EASE) yields improved item coverage and often better ranking metrics; tail performance improves as well, as evidenced by higher Item Recall@k and lower Gini indices across MovieLens, Netflix Prize, and Million Song. The approach demonstrates that targeted, semantically aware tail promotion can enhance diversity without sacrificing utility, offering a practical method for long-tail coverage in large catalogs.

Abstract

In many recommendation settings, a handful of popular items (e.g., movies, television shows, news) can dominate the recommendations shown to many users. However, in a large catalog of items, users are likely interested in more than what is popular. The dominance of popular items may mean that users never see items that they would probably enjoy. In this paper, we propose a technique to overcome this problem using adversarial machine learning. We define a metric that translates a user-level utility metric into an advantage or disadvantage for each item. We then use that metric in an adversarial learning framework to systematically promote disadvantaged items. Distinctively, our method integrates a small-capacity model to produce semantically meaningful weights, leading to an algorithm that identifies and promotes semantically similar items within the learning process. In the empirical study, we evaluate the proposed technique on three publicly available datasets against seven competitive baselines. The results show that our proposed method not only improves coverage but also, surprisingly, improves overall performance.
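
The item-level metric and the Gini index mentioned above can be illustrated with a minimal NumPy sketch. It rests on assumptions: the exact definition of Item Recall@k is not given in this summary, so the code takes one plausible reading (for each item, the fraction of users with that item in their held-out set for whom it is ranked in the top k). The function names `item_recall_at_k` and `gini`, and the dense `scores` and `held_out` matrices, are hypothetical and chosen for illustration only.

```python
import numpy as np

def item_recall_at_k(scores, held_out, k):
    """Assumed reading of Item Recall@k: among users whose held-out set
    contains item i, the fraction for whom i is ranked in the top k."""
    relevant_mask = held_out.astype(bool)
    # Indices of the k highest-scoring items per user (unordered within the top k).
    top_k = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    in_top_k = np.zeros_like(relevant_mask)
    np.put_along_axis(in_top_k, top_k, True, axis=1)
    hits = (in_top_k & relevant_mask).sum(axis=0)
    relevant = relevant_mask.sum(axis=0)
    # Items with no held-out interactions have undefined recall (NaN).
    return np.where(relevant > 0, hits / np.maximum(relevant, 1), np.nan)

def gini(x):
    """Gini index of a non-negative vector (0 = perfectly even)."""
    x = np.sort(np.asarray(x, dtype=float))
    n, total = x.size, x.sum()
    if total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float((2 * ranks - n - 1) @ x / (n * total))

# Toy example: 3 users, 4 items, k = 2.
scores = np.array([[0.9, 0.8, 0.1, 0.2],
                   [0.7, 0.1, 0.6, 0.2],
                   [0.9, 0.2, 0.1, 0.8]])
held_out = np.array([[1, 0, 1, 0],
                     [1, 0, 1, 0],
                     [0, 0, 0, 1]])
recall = item_recall_at_k(scores, held_out, k=2)
```

Under this reading, an item with low recall relative to the rest of the catalog is "disadvantaged", and a lower Gini index over per-item values indicates that performance is spread more evenly across items.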

Paper Structure

This paper contains 16 sections, 15 equations, 9 figures, 6 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Panel (A) presents the adversarial learning workflow. In step ①, item advantage is quantified using an item-wise metric adapted from Recall (Eq. \ref{eq:advantage_score_with_popularity}). Items are colored blue (disadvantaged) and orange (advantaged) for illustration. In steps ② and ③, an adversarial model, constrained by a small Lipschitz constant, assigns a continuum of weights to items. Driven by adversarial optimization, large weights are assigned to disadvantaged items and small weights to advantaged ones. Because this model produces smooth weight landscapes, maximizing the loss leads the assigned weights to focus on disadvantaged clusters while filtering out outliers. This process leads to the formation of semantic tails, a term that refers to clusters of disadvantaged but semantically related items. Step ④ adjusts the weights of these items and iterates the process. As weights are reassigned, the semantic tails change dynamically. We repeat these steps and continuously track the semantic tails via the adversarial model. Panel (B) presents a visualization of semantic tails on MovieLens, based on Principal Component Analysis (PCA). Each dot represents an item, with the color indicating its associated weight. Rare items appear near the origin in this representation. Semantic tails can be observed close to the center, with weights gradually decreasing outward. Further comparisons and details can be found in Section \ref{sec:visualization}.
  • Figure 2: Illustration of outlier filtering during re-weighting on the MovieLens dataset. This diagram shows the relationship between the weights proposed by the adversarial model and the corresponding advantage scores, with each dot representing a specific movie. The X-axis shows the advantage score, and the Y-axis shows the weight. For ease of explanation, we divide the movies into three groups. Note that the group boundaries are drawn manually and are for illustration only. The blue and orange groups represent disadvantaged and advantaged movies, respectively, which are promoted or demoted according to their advantage score. In contrast, the green group highlights potential outliers that, despite being disadvantaged, are not strongly promoted because of the semantic constraints on the adversary, which prevents the focus from shifting to dissimilar items.
  • Figure 3: Item Recall@100 for movies in different categories, comparing methods on MovieLens. Performance is averaged within each category, and categories are sorted from worst to best. Low performance in specific categories, such as movies from before the 1910s and after the 2010s, is due to the limited number of data points in the dataset.
  • Figure 4: Comparison of model architectures. "Norm" indicates an $\ell_2$ standardization applied before the nonlinear activation. "MLP" indicates a multi-layer perceptron. We use the hyperbolic tangent (tanh) activation for intermediate layers and a sigmoid for the final output.
  • Figure 5: Comparison of model capacity. The labels show the number of hidden units in the 2-layer fully connected neural network used as the adversary.
  • ...and 4 more figures
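
The captions for Figures 1, 4, and 5 describe the adversary as a small 2-layer fully connected network with tanh hidden units and a sigmoid output, constrained to have a small Lipschitz constant. The following is a minimal sketch of the forward pass only, under stated assumptions: how the paper actually enforces the Lipschitz constraint is not given here, so the code substitutes spectral-norm clipping of the weight matrices, and it omits the "Norm" step and the adversarial training loop. The class name `SmallAdversary`, the helper `clip_spectral_norm`, and the parameter `max_norm` are hypothetical.

```python
import numpy as np

def clip_spectral_norm(W, max_norm):
    """Rescale W so that its largest singular value is at most max_norm."""
    sigma = np.linalg.norm(W, ord=2)  # spectral norm of a 2-D matrix
    return W if sigma <= max_norm else W * (max_norm / sigma)

class SmallAdversary:
    """Two-layer MLP mapping item embeddings to weights in (0, 1).
    Assumption: the Lipschitz constraint is imposed via spectral-norm clipping."""

    def __init__(self, dim, hidden, max_norm=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = clip_spectral_norm(rng.normal(size=(dim, hidden)), max_norm)
        self.W2 = clip_spectral_norm(rng.normal(size=(hidden, 1)), max_norm)

    def weights(self, item_emb):
        hidden = np.tanh(item_emb @ self.W1)       # tanh is 1-Lipschitz
        logits = (hidden @ self.W2)[:, 0]
        return 1.0 / (1.0 + np.exp(-logits))       # sigmoid is 1/4-Lipschitz

    def lipschitz_bound(self):
        """Upper bound: product of layer spectral norms and activation constants."""
        return 0.25 * np.linalg.norm(self.W1, ord=2) * np.linalg.norm(self.W2, ord=2)

# Nearby item embeddings receive nearby weights.
rng = np.random.default_rng(1)
adversary = SmallAdversary(dim=8, hidden=4)
x = rng.normal(size=(5, 8))
y = x + 0.01 * rng.normal(size=(5, 8))
w_x, w_y = adversary.weights(x), adversary.weights(y)
```

The design point this illustrates is the one the captions make: because the weight function cannot change quickly across embedding space, an isolated disadvantaged item surrounded by advantaged neighbors cannot receive a large weight on its own, whereas a cluster of disadvantaged items can.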