The Query/Hit Model for Sequential Hypothesis Testing
Mahshad Shariatnasab, Stefano Rini, Farhad Shirani, S. Sitharama Iyengar
TL;DR
The paper develops a sequential hypothesis-testing framework under a Query/Hit (Q/H) model, where a remote observer performs targeted queries to a source accessed indirectly through a responder. It defines adaptive and non-adaptive querying strategies, derives error-exponent performance bounds, and proposes the Dynamic Scout-Sentinel Algorithm (DSSA) that uses a mutual-information neural estimator to maximize information gain per query. The approach is evaluated on synthetic data and real human–bot interaction traces, showing improvements in detection speed and accuracy while respecting privacy constraints. This work advances efficient, privacy-aware sequential inference under restricted data access with practical implications for security, privacy-preserving analytics, and distributed decision-making.
Abstract
This work introduces the Query/Hit (Q/H) learning model. The setup consists of two agents. One agent, Alice, has access to a streaming source, while the other, Bob, does not have direct access to the source. Communication occurs through sequential Q/H pairs: Bob sends a sequence of source symbols (queries), and Alice responds with the waiting time until each query appears in the source stream (hits). This model is motivated by scenarios with communication, computation, and privacy constraints that limit real-time access to the source. The error exponent for sequential hypothesis testing under the Q/H model is characterized, and a querying strategy, the Dynamic Scout-Sentinel Algorithm (DSSA), is proposed. The strategy employs a mutual information neural estimator to compute the error exponent associated with each query and to select the query with the highest efficiency. Extensive empirical evaluations on both synthetic and real-world datasets, including mouse movement trajectories, typesetting patterns, and touch-based user interactions, compare the proposed strategy with baselines in terms of probability of error, query choice, and time-to-detection.
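The Q/H exchange described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each query is a single source symbol and that the waiting time for a query is counted from the symbol immediately after the previous hit; the function name `answer_queries` and the example stream are hypothetical.

```python
from typing import Iterable, Iterator, List, Sequence


def answer_queries(stream: Iterable[str], queries: Sequence[str]) -> List[int]:
    """Alice's side of the exchange: return one hit (waiting time) per query.

    Assumptions (not taken from the paper): each query is a single symbol,
    and the wait for a query starts right after the previous hit.
    """
    source: Iterator[str] = iter(stream)  # Alice consumes the stream once, in order
    hits: List[int] = []
    for query in queries:
        wait = 0
        for symbol in source:
            wait += 1
            if symbol == query:
                hits.append(wait)  # number of symbols read until the query appeared
                break
        else:
            # The stream was exhausted before the queried symbol showed up.
            raise ValueError(f"stream ended before query {query!r} appeared")
    return hits


# Bob only ever sees the hits, never the stream itself.
hits = answer_queries("aabcbbca", ["b", "c", "a"])
print(hits)  # [3, 1, 4]
```

In this sketch Bob learns about the source only through the waiting times: symbols that are frequent under the true hypothesis tend to produce short hits, which is the kind of evidence a sequential test can accumulate.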
