Multi-Margin Cosine Loss: Proposal and Application in Recommender Systems

Makbule Gulcin Ozsoy

TL;DR

The proposed Multi-Margin Cosine Loss (MMCL) addresses challenges of contrastive learning by introducing multiple margins and varying weights for negative samples, and efficiently utilizes not only the hardest negatives but also other non-trivial negatives.

Abstract

Recommender systems guide users through vast amounts of information by suggesting items based on their predicted preferences. Collaborative filtering-based deep learning techniques have regained popularity due to their straightforward nature, relying only on user-item interactions. Typically, these systems consist of three main components: an interaction module, a loss function, and a negative sampling strategy. Initially, researchers focused on enhancing performance by developing complex interaction modules. However, there has been a recent shift toward refining loss functions and negative sampling strategies. This shift has led to an increased interest in contrastive learning, which pulls similar pairs closer while pushing dissimilar ones apart. Contrastive learning can bring challenges such as high memory demands and under-utilization of some negative samples. The proposed Multi-Margin Cosine Loss (MMCL) addresses these challenges by introducing multiple margins and varying weights for negative samples. It efficiently utilizes not only the hardest negatives but also other non-trivial negatives, and it offers a simpler yet effective loss function that outperforms more complex methods, especially when resources are limited. Experiments on two well-known datasets demonstrated that MMCL achieved up to a 20% performance improvement over a baseline loss function when fewer negative samples are used.

Paper Structure (13 sections, 7 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Three major components of a deep-learning-based CF model: (i) interaction encoder, (ii) loss function and (iii) negative sampling. A user interacts with an item $p$ (e.g., purchases a product). From the remaining negative items $n$, four negatives, $n_1$-$n_4$, are sampled. The sampled negative items, the positive item and the user representation are fed into the interaction encoding and loss function modules for computation.
  • Figure 2: (a) Margin $m$ defines a radius, which makes the model pay attention to harder negatives. In Contrastive Loss, Eq. \ref{eq:contrastiveLoss}, the radius is relative to the user representation ($m1$). In Triplet Loss, Eq. \ref{eq:tripletLoss}, it is relative to the positive item ($m2$). (b) The proposed Multi-Margin Cosine Loss (MMCL), Eq. \ref{eq:multiMarginLoss}, filters not only the hardest negatives but also other non-trivial negatives using multiple margins, $m1,m2,m3$, and assigns a different weight to each level, $w1,w2,w3$.
  • Figure 3: Performance of MF model under recent loss functions. The proposed MMCL outperforms baseline loss functions when the number of negative samples is 100 or fewer, suggesting it is more efficient in resource-constrained environments.
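
The captions for Figures 1 and 2 describe sampling a few negatives per positive interaction and penalizing them at several margin levels with different weights. Since the loss equation itself is not reproduced in this section, the following is only a minimal sketch under an assumed form: a positive term $1 - \cos(u, p)$ plus, averaged over the sampled negatives, $\sum_k w_k \max(0, \cos(u, n) - m_k)$. The function names, the uniform sampler, and the margin and weight values are illustrative assumptions, not the paper's definitions.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sample_negatives(all_items, interacted, k, rng=random):
    """Uniformly sample k item ids the user has not interacted with.

    Uniform sampling is an assumption made for this sketch.
    """
    candidates = [i for i in all_items if i not in interacted]
    return rng.sample(candidates, k)


def multi_margin_cosine_loss(user, pos, negs, margins, weights):
    """Assumed multi-margin form (not taken from the paper's equation):

    (1 - cos(u, p)) + mean over n of sum_k w_k * max(0, cos(u, n) - m_k)
    """
    if len(margins) != len(weights):
        raise ValueError("margins and weights must have the same length")
    pos_term = 1.0 - cosine(user, pos)
    neg_term = 0.0
    for n in negs:
        sim = cosine(user, n)
        # A negative contributes at every margin level it exceeds, so harder
        # negatives are penalized at more levels than merely non-trivial ones.
        neg_term += sum(w * max(0.0, sim - m) for m, w in zip(margins, weights))
    return pos_term + neg_term / len(negs)


if __name__ == "__main__":
    user = [1.0, 0.0]
    pos = [1.0, 0.0]
    negs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # cosines: 1, 0, -1
    # Hypothetical margins and weights, chosen only for illustration.
    print(multi_margin_cosine_loss(user, pos, negs, (0.9, 0.5, 0.1), (1.0, 0.5, 0.25)))
```

With a single margin and a weight of 1, this sketch reduces to a one-margin cosine hinge on the negatives, which is the sense in which the multi-margin version extends it: negatives below the hardest level still contribute, but with smaller weights.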