Querying with Conflicts of Interest

Nischal Aryal, Arash Termehchy, Marianne Winslett

TL;DR

This paper proposes a novel formal framework for querying in settings where a conflict of interest between the user and the data source gives the data source an incentive to return intentionally biased answers. It also proposes efficient algorithms to detect whether users can extract relevant information from such biased data sources.

Abstract

Conflicts of interest often arise between data sources and their users regarding how the data source should interpret the users' information needs. For example, an online product search might be biased towards presenting certain products higher in its list of results to improve its revenue, which may not follow the user's desired ranking expressed in their query. The research community has proposed schemes that data systems can implement to ensure unbiased results. However, data systems and services usually have little or no incentive to implement these measures, since these biases often increase their profits. In this paper, we propose a novel formal framework for querying in settings where the data source has incentives to return intentionally biased answers due to the conflict of interest between the user and the data source. We propose efficient algorithms to detect whether it is possible for users to extract relevant information from biased data sources, and efficient methods to detect biased information in the results of a query. We also propose algorithms to reformulate input queries to increase the amount of relevant information in the results returned by biased data sources. Using experiments on real-world datasets, we show that our algorithms are efficient and return relevant information over large data.

Paper Structure (60 sections, 16 theorems, 19 equations, 3 figures, 2 tables, 4 algorithms)

Key Result

theorem 1

The interaction $(\mathcal{R}, \tau, U^r,U^s)$ is influential if and only if there are set-equivalent intents $\tau \neq \tau'$ and interpretations $\beta \neq \beta'$ where the following conditions hold for both user and data source utility functions $U^t \in \{U^r, U^s\}$: or

Figures (3)

  • Figure 1: Impact of number of attributes on execution time of Detecting Influential (DI) and Maximally Influential (MI)
  • Figure 2: Impact of $z$ on time to detect credible answers
  • Figure 3: Impact of bucketization on running time and user utility across Amazon, PriceRunner, Flights, and COMPAS.

Theorems & Definitions (26)

  • definition 1
  • theorem 1
  • definition 2
  • proposition 1
  • theorem 2
  • corollary 1
  • definition 3
  • definition 4
  • proposition 2
  • lemma 1
  • ...and 16 more