The Query/Hit Model for Sequential Hypothesis Testing

Mahshad Shariatnasab, Stefano Rini, Farhad Shirani, S. Sitharama Iyengar

TL;DR

The paper develops a sequential hypothesis-testing framework under a Query/Hit (Q/H) model, where a remote observer performs targeted queries to a source accessed indirectly through a responder. It defines adaptive and non-adaptive querying strategies, derives error-exponent performance bounds, and proposes the Dynamic Scout-Sentinel Algorithm (DSSA) that uses a mutual-information neural estimator to maximize information gain per query. The approach is evaluated on synthetic data and real human–bot interaction traces, showing improvements in detection speed and accuracy while respecting privacy constraints. This work advances efficient, privacy-aware sequential inference under restricted data access with practical implications for security, privacy-preserving analytics, and distributed decision-making.

Abstract

This work introduces the Query/Hit (Q/H) learning model. The setup consists of two agents. One agent, Alice, has access to a streaming source, while the other, Bob, does not have direct access to the source. Communication occurs through sequential Q/H pairs: Bob sends a sequence of source symbols (queries), and Alice responds with the waiting time until each query appears in the source stream (hits). This model is motivated by scenarios with communication, computation, and privacy constraints that limit real-time access to the source. The error exponent for sequential hypothesis testing under the Q/H model is characterized, and a querying strategy, the Dynamic Scout-Sentinel Algorithm (DSSA), is proposed. The strategy employs a mutual information neural estimator to compute the error exponent associated with each query and to select the query with the highest efficiency. Extensive empirical evaluations on both synthetic and real-world datasets -- including mouse movement trajectories, typesetting patterns, and touch-based user interactions -- are provided to evaluate the performance of the proposed strategy in comparison with baselines, in terms of probability of error, query choice, and time-to-detection.
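
To make the Q/H exchange concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes an i.i.d. source over a finite alphabet, that each query is a single symbol, and that the stream keeps advancing from the position of the previous hit. The function names (`make_source`, `answer_queries`, `hit_log_likelihood_ratio`) are hypothetical. Under the i.i.d. assumption, the waiting time for symbol x is geometric with success probability P(x), which gives the log-likelihood ratio of a single hit between two candidate source distributions.

```python
import math
import random
from typing import Iterable, Iterator, List, Sequence


def make_source(probs: Sequence[float], rng: random.Random) -> Iterator[int]:
    """Alice's stream: i.i.d. symbols 0..len(probs)-1 (an assumption of this sketch)."""
    symbols = list(range(len(probs)))
    while True:
        yield rng.choices(symbols, weights=probs)[0]


def answer_queries(source: Iterable[int], queries: Sequence[int]) -> List[int]:
    """For each queried symbol, return the waiting time (hit) until it next appears.

    The stream is consumed sequentially: each query starts right after the
    previous hit.
    """
    it = iter(source)
    hits = []
    for q in queries:
        wait = 0
        for symbol in it:
            wait += 1
            if symbol == q:
                hits.append(wait)
                break
        else:
            raise ValueError("stream ended before the query appeared")
    return hits


def hit_log_likelihood_ratio(query: int, hit: int,
                             p0: Sequence[float], p1: Sequence[float]) -> float:
    """log P1(hit) / P0(hit) when the hit is Geometric(p[query]) under each hypothesis."""
    a, b = p1[query], p0[query]
    return math.log(a / b) + (hit - 1) * math.log((1 - a) / (1 - b))


if __name__ == "__main__":
    rng = random.Random(0)
    p0 = [0.5, 0.3, 0.2]   # hypothetical distribution under the null
    p1 = [0.2, 0.3, 0.5]   # hypothetical distribution under the alternative
    queries = [2] * 20     # Bob repeatedly queries symbol 2 (non-adaptive)
    hits = answer_queries(make_source(p1, rng), queries)
    llr = sum(hit_log_likelihood_ratio(q, t, p0, p1) for q, t in zip(queries, hits))
    print("hits:", hits)
    print("accumulated log-likelihood ratio:", llr)
```

In this sketch Bob never sees the stream itself, only the hits. A sequential test would compare the accumulated log-likelihood ratio against thresholds after each hit, and an adaptive strategy would choose the next query based on the hits observed so far.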

Paper Structure

This paper contains 15 sections, 4 theorems, 53 equations, 5 figures.

Key Result

Lemma 1

Let $T_1, T_2, \ldots, T_k$ be the observed responses from a sequence of queries based on the data $Z_1, Z_2, \ldots, Z_n$, where $Z_i$ are i.i.d. according to a probability distribution $Q$. Consider the decision problem corresponding to the hypotheses $Q = P_X$ (for the underlying data $Z$) versus Let $\alpha^* = \mathbb{P}^k(A_k(\lambda)^c | Q = P_X)$ and $\beta^* = \mathbb{P}^k(A_k(\lambda) |

Figures (5)

  • Figure 1: This will change
  • Figure 2: Average Time to Stop vs $p$ for different values of $m$
  • Figure 3: Adaptive scenario
  • Figure 4: Caption
  • Figure 5: Caption

Theorems & Definitions (17)

  • Definition 1: Query Function and Query Response
  • Remark 1
  • Definition 2: Identification Strategy
  • Remark 2
  • Definition 3: Type-I Error Probability
  • Definition 4: Type-II Error Probability
  • Definition 5: Objective Function
  • Remark 3
  • Definition 6: Non-Adaptive $Q$
  • Definition 7: Adaptive $Q$
  • ...and 7 more