On Randomized Algorithms in Online Strategic Classification
Chase Hutton, Adam Melrod, Han Shao
TL;DR
The paper studies online strategic classification, in which agents manipulate their features to obtain favorable predictions, and analyzes randomized algorithms in the realizable and agnostic settings. In the realizable setting, it extends the deterministic lower bound $\Omega(\mathrm{Ldim}(\mathcal{H}) \Delta)$ to all learners when $T > \mathrm{Ldim}(\mathcal{H}) \Delta^2$, so randomization does not always improve realizable mistake bounds, and it gives a randomized learner that improves the known deterministic upper bound. In the agnostic setting, it designs a proper learner with regret $O(\sqrt{T \log |\mathcal{H}|} + |\mathcal{H}| \log(T|\mathcal{H}|))$ and proves a lower bound for all proper learners that matches up to logarithmic factors, which implies that improper learning is needed for further improvements. Key techniques include manipulation-graph modeling, Littlestone-dimension analysis, and an FTRL-based approach with log-barrier regularization and shifted loss estimators. The results delineate when randomization helps in online strategic settings and establish fundamental limits for both finite and infinite hypothesis classes, with implications for deploying classifiers against strategic agents. Overall, the work clarifies the trade-offs between realizable and agnostic performance and between proper and improper learning in environments with strategic agents.
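As an illustration of the manipulation-graph modeling mentioned above, the following is a minimal sketch and not the paper's formal definition. It assumes a common form of the model: each agent holds a node of a directed graph, wants a positive prediction, and may report its own node or any out-neighbor. The names `best_response`, `strategic_mistake`, `graph`, and `h`, and the 0/1 label encoding, are hypothetical choices made for this example.

```python
from typing import Callable, Dict, Hashable, List

Node = Hashable
Classifier = Callable[[Node], int]  # returns 1 (positive) or 0 (negative)


def best_response(x: Node, neighbors: Dict[Node, List[Node]], h: Classifier) -> Node:
    """Feature the agent reports under the assumed model.

    The agent stays at x if x is already classified positive; otherwise it
    moves to some out-neighbor classified positive, if one exists.
    """
    if h(x) == 1:
        return x
    for v in neighbors.get(x, []):
        if h(v) == 1:
            return v
    return x  # no manipulation helps, so the agent does not move


def strategic_mistake(x: Node, y: int, neighbors: Dict[Node, List[Node]], h: Classifier) -> bool:
    """True if the prediction on the manipulated feature differs from the true label y."""
    return h(best_response(x, neighbors, h)) != y


# Hypothetical toy instance: node "a" can manipulate to "b"; "c" is isolated.
graph = {"a": ["b"], "b": [], "c": []}
h = lambda v: 1 if v == "b" else 0  # classifier that accepts only "b"

print(best_response("a", graph, h))         # b
print(strategic_mistake("a", 0, graph, h))  # True: a negative agent games the classifier
print(strategic_mistake("c", 0, graph, h))  # False: the agent cannot reach a positive node
```

In this toy instance the learner errs on the agent at `a` only because of the manipulation, which is the kind of mistake the bounds in this setting must account for.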
Abstract
Online strategic classification studies settings in which agents strategically modify their features to obtain favorable predictions. For example, given a classifier that determines loan approval based on credit scores, applicants may open or close credit cards and bank accounts to obtain a positive prediction. The learning goal is to achieve low mistake or regret bounds despite such strategic behavior. While randomized algorithms have the potential to offer advantages to the learner in strategic settings, they have been largely underexplored. In the realizable setting, no lower bound is known for randomized algorithms, and existing lower bound constructions for deterministic learners can be circumvented by randomization. In the agnostic setting, the best known regret upper bound is $O(T^{3/4}\log^{1/4}(T|\mathcal{H}|))$, which is far from the standard online learning rate of $O(\sqrt{T\log|\mathcal{H}|})$. In this work, we provide refined bounds for online strategic classification in both settings. In the realizable setting, we extend, for $T > \mathrm{Ldim}(\mathcal{H}) \Delta^2$, the existing lower bound $\Omega(\mathrm{Ldim}(\mathcal{H}) \Delta)$ for deterministic learners to all learners. This yields the first lower bound that applies to randomized learners. We also provide the first randomized learner that improves the known (deterministic) upper bound of $O(\mathrm{Ldim}(\mathcal{H}) \cdot \Delta\log \Delta)$. In the agnostic setting, we give a proper learner using convex optimization techniques to improve the regret upper bound to $O(\sqrt{T \log |\mathcal{H}|} + |\mathcal{H}| \log(T|\mathcal{H}|))$. We show a matching lower bound up to logarithmic factors for all proper learning rules, demonstrating the optimality of our learner among proper learners. As such, improper learning is necessary to further improve regret guarantees.
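To give a feel for how the agnostic rates above compare, the following sketch evaluates the three expressions numerically. It is illustrative only: all hidden constants are set to 1, logarithms are natural, and the prior bound is read as $T^{3/4}\log^{1/4}(T|\mathcal{H}|)$, which is an assumption about the intended grouping. The function names are hypothetical.

```python
import math


def prior_agnostic_bound(T: int, H: int) -> float:
    """T^(3/4) * log^(1/4)(T * |H|), with the constant set to 1 (assumption)."""
    return T ** 0.75 * math.log(T * H) ** 0.25


def proper_learner_bound(T: int, H: int) -> float:
    """sqrt(T log|H|) + |H| log(T |H|), with constants set to 1 (assumption)."""
    return math.sqrt(T * math.log(H)) + H * math.log(T * H)


def standard_online_rate(T: int, H: int) -> float:
    """sqrt(T log|H|), the standard (non-strategic) online learning rate."""
    return math.sqrt(T * math.log(H))


# Hypothetical parameter choices: T rounds, H = |H| hypotheses.
for T, H in [(10**6, 10), (10**6, 1000), (100, 1000)]:
    print(T, H,
          round(prior_agnostic_bound(T, H)),
          round(proper_learner_bound(T, H)),
          round(standard_online_rate(T, H)))
```

Under these assumptions the new bound is far below the prior one when $|\mathcal{H}|$ is small relative to $\sqrt{T}$, while the additive $|\mathcal{H}| \log(T|\mathcal{H}|)$ term dominates when $|\mathcal{H}|$ is large relative to $T$. This additive term is the gap to the standard rate that, per the lower bound for proper learners, only improper learning could remove.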
