Sequential Manipulation Against Rank Aggregation: Theory and Algorithm

Ke Ma, Qianqian Xu, Jinshan Zeng, Wei Liu, Xiaochun Cao, Yingfei Sun, Qingming Huang

TL;DR

The paper studies sequential online manipulation of rank aggregation from pairwise comparisons, modeling the interaction between a malicious data source and a ranker as a distributionally robust game. It establishes the existence of distributionally robust Nash equilibria and shows that common sampling methods like Bernoulli and reservoir sampling can be vulnerable to adaptive adversaries. It then develops manipulation policies under complete knowledge with asymptotic optimality and robust estimators under incomplete knowledge via Wasserstein DRO, accompanied by efficient optimization techniques. Empirical results on simulated data, crowdsourcing, and election datasets demonstrate that the proposed sequential manipulation can steer rank-aggregation outcomes toward an attacker’s designated ranking. These findings highlight a significant security risk in online data collection for ranking systems and motivate defense-oriented research in robust data acquisition and detection of adversarial streams.

Abstract

Rank aggregation with pairwise comparisons is widely encountered in sociology, politics, economics, psychology, sports, etc. Given the enormous social impact and the consequent incentives, a potential adversary has a strong motivation to manipulate the ranking list. However, existing methods assume an ideal attack opportunity and excessive adversarial capability, which makes them impractical. To fully explore the potential risks, we leverage an online attack on the vulnerable data collection process. Since this process is independent of rank aggregation and lacks effective protection mechanisms, we disrupt it by fabricating pairwise comparisons without knowledge of the future data or the true distribution. From the game-theoretic perspective, the confrontation between the online manipulator and the ranker who controls the original data source is formulated as a distributionally robust game that deals with the uncertainty of knowledge. We then demonstrate that the equilibrium in this game is potentially favorable to the adversary by analyzing the vulnerability of sampling algorithms such as the Bernoulli and reservoir methods. Based on this theoretical analysis, different sequential manipulation policies are proposed under a Bayesian decision framework and a large class of parametric pairwise comparison models. For attackers with complete knowledge, we establish the asymptotic optimality of the proposed policies. To increase the success rate of sequential manipulation with incomplete knowledge, a distributionally robust estimator, which replaces the maximum likelihood estimation in a saddle point problem, provides a conservative data generation solution. Finally, empirical evidence shows that the proposed method manipulates the results of rank aggregation methods in a sequential manner.
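For readers unfamiliar with the two sampling schemes named in the abstract, the following is a minimal sketch of their standard, non-adversarial forms applied to a stream of pairwise comparisons. It is not the paper's attack or analysis; the function names, the `(winner, loser)` tuple encoding, and the synthetic stream are assumptions made for illustration only.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def bernoulli_sample(stream: Iterable[T], p: float, rng: random.Random) -> List[T]:
    """Keep each stream element independently with probability p."""
    return [x for x in stream if rng.random() < p]


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Algorithm R: maintain a uniform sample of k elements from a stream
    whose length is not known in advance."""
    reservoir: List[T] = []
    for n, x in enumerate(stream, start=1):
        if len(reservoir) < k:
            # Fill the reservoir with the first k elements.
            reservoir.append(x)
        else:
            # The n-th element replaces a random slot with probability k / n.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = x
    return reservoir


# Hypothetical stream of (winner, loser) comparisons over 4 candidates.
rng = random.Random(0)
raw = [(rng.randrange(4), rng.randrange(4)) for _ in range(1000)]
stream = [(i, j) for i, j in raw if i != j]

kept_bernoulli = bernoulli_sample(stream, 0.1, rng)
kept_reservoir = reservoir_sample(stream, 50, rng)
```

Both samplers decide what to keep using only their own randomness and the elements seen so far. An adaptive source that chooses each new comparison after observing earlier outcomes is the setting in which the abstract says these schemes can be vulnerable.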

Paper Structure (19 sections, 22 theorems, 290 equations, 5 figures, 2 tables, 5 algorithms)

Key Result

Theorem 1

There exists a DRNE if the following conditions hold for any $r=1,\dots,R$.

Figures (5)

  • Figure 1: Overview of the offline and online adversarial settings. (a) In the offline confrontation scenario, the adversary observes the whole comparison graph on Oct. 17 and obtains an attack strategy that requires flipping comparisons that occurred on Sep. 25 and Oct. 2. However, no one can return to the past and change what has happened. Moreover, bypassing the defense mechanisms of the rank aggregation to modify the completed comparison graph is a challenging task. (b) Different from the offline attack methods, we consider sequential manipulation strategies that have no knowledge of future observed pairwise comparisons. The proposed online attack method inserts malicious comparisons into the data stream before the comparison graph is constructed.
  • Figure 2: Comparative results of different sequential manipulation methods against HodgeRank and RankCentrality on simulated data. The box plots illustrate the results of $50$ trials with different data sequences, each of which makes HodgeRank and RankCentrality generate $\boldsymbol{\pi}_0 =(10,9,8,7,6,5,4,3,2,1)$. The target list of the adversary is $\boldsymbol{\pi}'=(8,9,10,7,5,6,4,3,2,1)$. The proposed method provides stable manipulation through sequential actions: all of its metrics equal $1$, with rare outliers. Meanwhile, the three competitors fail to manipulate HodgeRank and RankCentrality with sequential actions. The 'Greedy' perturbation focuses only on the top-$1$ candidate but cannot guarantee the designation of a winner. The 'Straightforward' strategy is inferior to the proposed method when the number of actions is the same.
  • Figure 3: Change of evaluation metrics on simulated data for different sequential manipulation methods against HodgeRank and RankCentrality. The horizontal axis shows the turns of the game. As the interaction proceeds, the proposed method is able to generate malicious pairwise comparisons with incomplete knowledge and manipulate the victim, so that its aggregated results become consistent with the attacker's target.
  • Figure 4: The victims' aggregation results under different manipulation methods on simulated data. The original ranking list is $\boldsymbol{\pi}_0=[10,9,8,7,6,5,4,3,2,1]$ and the target one is $\boldsymbol{\pi}'=[8,9,10,7,5,6,4,3,2,1]$ (refer to Target in the figure). Darker dots represent candidates with larger IDs, and lighter dots those with smaller IDs. If a candidate is not in the same position in both ranking lists, there is an intersection, and the width of the line represents the degree of inconsistent influence on the aggregation results. We mark the inconsistent dots with red circles. When there are multiple intersections and the key locations (top-$3$) are marked by red circles, the attacker has failed to achieve his/her goal. (a) The victim is HodgeRank. The proposed method accomplishes the manipulation of the complete ranking list (no intersection or red circle). (b) The victim is RankCentrality.
  • Figure 5: Data distribution generated by different methods on simulated data. The vertical axis lists the number of rounds in the adversarial games and the horizontal axis displays all possible pairwise comparisons. For the same victim, all results are based on the same observed data. The original ranking list is $\boldsymbol{\pi}_0=[10,9,8,7,6,5,4,3,2,1]$ and the target one is $\boldsymbol{\pi}'=[8,9,10,7,5,6,4,3,2,1]$.
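To give a sense of how far the target list in these captions is from the original one, the sketch below computes the Kendall tau correlation between $\boldsymbol{\pi}_0$ and $\boldsymbol{\pi}'$. Kendall tau is used here only as an assumed, illustrative measure of rank agreement; the captions do not say which evaluation metrics the paper reports, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(a: Sequence[int], b: Sequence[int]) -> float:
    """Kendall tau correlation between two rankings of the same items
    (no ties): (concordant - discordant) / total pairs."""
    pos_a = {c: i for i, c in enumerate(a)}
    pos_b = {c: i for i, c in enumerate(b)}
    concordant = discordant = 0
    for x, y in combinations(a, 2):
        # The pair is concordant when both rankings order x and y the same way.
        same_order = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0
        if same_order:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


pi_0 = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]       # original ranking from the captions
pi_target = [8, 9, 10, 7, 5, 6, 4, 3, 2, 1]  # attacker's target from the captions

tau = kendall_tau(pi_0, pi_target)
print(f"Kendall tau between original and target: {tau:.4f}")
```

Of the 45 candidate pairs, only four change order between the two lists (the three pairs among candidates 8, 9, 10 and the pair 5, 6), so the target differs from the original in a few key positions, including the top-$1$ candidate, rather than being an arbitrary reshuffle.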

Theorems & Definitions (39)

  • Definition 1: Distributionally Robust Nash Equilibrium
  • Theorem 1
  • Definition 2: Static stream
  • Definition 3: Dynamic stream
  • Definition 4: $\epsilon$-approximation
  • Definition 5: $(\epsilon,\delta)$-representativeness
  • Theorem 2
  • Theorem 3
  • Definition 6
  • Proposition 1
  • ...and 29 more