Observation-Free Attacks on Online Learning to Rank

Sameep Chattopadhyay, Nikhil Karamchandani, Sharayu Moharir

TL;DR

The paper studies the adversarial vulnerability of online learning to rank (OLTR) to observation-free attacks, in which a coalition manipulates rewards without access to the learner's feedback. It introduces two attacks built on a three-phase framework, CascadeOFA for CascadeUCB1 and PBMOFA for PBM-UCB, which promote target items with only $O(\log T)$ reward manipulations while inducing $\Omega(T)$ regret. The authors provide theoretical guarantees for both attacks and validate them with MovieLens experiments, which show effective top-K promotion of the target items with manipulations in fewer than 3% of rounds. The work highlights potential risks in OLTR systems and motivates the design of OLTR algorithms that are robust to observation-free coordination.

Abstract

Online learning to rank (OLTR) plays a critical role in information retrieval and machine learning systems, with a wide range of applications in search engines and content recommenders. However, despite the extensive adoption of OLTR algorithms, their susceptibility to coordinated adversarial attacks remains poorly understood. In this work, we present a novel framework for attacking several widely used OLTR algorithms. Our framework is designed to promote a set of target items so that they appear in the list of top-K recommendations for $T - o(T)$ rounds, while simultaneously inducing linear regret in the learning algorithm. We propose two novel attack strategies: CascadeOFA for CascadeUCB1 and PBMOFA for PBM-UCB. We provide theoretical guarantees showing that both strategies require only $O(\log T)$ manipulations to succeed. Additionally, we supplement our theoretical analysis with empirical results on real-world data.
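
For readers unfamiliar with the learner being attacked, the following is a minimal sketch of a CascadeUCB1-style ranker under simulated cascade clicks. It is not the paper's attack and is not taken from the paper: the function name `cascade_ucb1`, the simplified initialization, and the confidence-radius constant 1.5 are assumptions based on the standard presentation of CascadeUCB1. It only illustrates the unattacked behavior that the abstract's goal (target items in the top-K for $T - o(T)$ rounds) would override.

```python
import math
import random


def cascade_ucb1(attraction, K, T, seed=0):
    """Simulate a CascadeUCB1-style learner; return how often each item was shown.

    attraction: assumed true attraction probabilities (hypothetical inputs).
    """
    rng = random.Random(seed)
    L = len(attraction)
    pulls = [0] * L    # number of observations per item
    means = [0.0] * L  # empirical attraction estimates
    shown = [0] * L    # rounds in which each item appeared in the top-K list

    # Simplified initialization (an assumption): observe every item once.
    for e in range(L):
        pulls[e] = 1
        means[e] = 1.0 if rng.random() < attraction[e] else 0.0

    for t in range(1, T + 1):
        # UCB index: empirical mean plus a confidence radius.
        ucb = [means[e] + math.sqrt(1.5 * math.log(t) / pulls[e]) for e in range(L)]
        ranked = sorted(range(L), key=lambda e: ucb[e], reverse=True)[:K]
        for e in ranked:
            shown[e] += 1

        # Cascade click model: the user scans top-down and clicks the first
        # attractive item; items below the click are not observed.
        click_pos = None
        for pos, e in enumerate(ranked):
            if rng.random() < attraction[e]:
                click_pos = pos
                break
        last_observed = click_pos if click_pos is not None else K - 1

        for pos in range(last_observed + 1):
            e = ranked[pos]
            reward = 1.0 if pos == click_pos else 0.0
            means[e] = (means[e] * pulls[e] + reward) / (pulls[e] + 1)
            pulls[e] += 1

    return shown


if __name__ == "__main__":
    counts = cascade_ucb1([0.9, 0.8, 0.1, 0.1, 0.1], K=2, T=5000)
    print(counts)
```

Without an attack, the two most attractive items in this toy instance dominate the recommendation counts; the attacks described in the abstract aim to make low-attraction target items occupy those slots instead.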

Paper Structure

This paper contains 33 sections, 10 theorems, 32 equations, 5 figures, 3 tables, 4 algorithms.

Key Result

Theorem 3.1

If a collective adversary attacks CascadeUCB1 using the CascadeOFA strategy outlined in Algorithm alg:observation_free_cascade, then with $O(\log T)$ reward manipulations, it can ensure that each item $a \in \tilde{\Gamma}$ is recommended for at least $T - O(\log T)$ rounds, with pr...

Figures (5)

  • Figure 1: Comparison of the number of recommendations for target items with and without attack.
  • Figure 2: Comparison of regret for attack strategies under different click feedback models.
  • Figure 3: The number of recommendations for the target items with and without CascadeOFA.
  • Figure : CascadeOFA
  • Figure : PBMOFA

Theorems & Definitions (22)

  • Theorem 3.1
  • Theorem 3.2
  • Remark 4.1
  • Remark A.1
  • Lemma B.1
  • Lemma B.2
  • Lemma B.3
  • Proof of Theorem thm:cascadeatk
  • Lemma B.4
  • Lemma B.5
  • ...and 12 more