
Efficient Logistic Regression with Mixture of Sigmoids

Federico Di Gennaro, Saptarshi Chakraborty, Nikita Zhivotovskiy

Abstract

This paper studies the Exponential Weights (EW) algorithm with an isotropic Gaussian prior for online logistic regression. We show that the near-optimal worst-case regret bound $O(d\log(Bn))$ for EW, established by Kakade and Ng (2005) against the best linear predictor of norm at most $B$, can be achieved with total worst-case computational complexity $O(B^3 n^5)$. This substantially improves on the $O(B^{18}n^{37})$ complexity of prior work achieving the same guarantee (Foster et al., 2018). Beyond efficiency, we analyze the large-$B$ regime under linear separability: after rescaling by $B$, the EW posterior converges as $B\to\infty$ to a standard Gaussian truncated to the version cone. Accordingly, the predictor converges to a solid-angle vote over separating directions and, on every fixed-margin slice of this cone, the mode of the corresponding truncated Gaussian is aligned with the hard-margin SVM direction. Using this geometry, we derive non-asymptotic regret bounds showing that once $B$ exceeds a margin-dependent threshold, the regret becomes independent of $B$ and grows only logarithmically with the inverse margin. Overall, our results show that EW can be both computationally tractable and geometrically adaptive in online classification.
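The EW predictor described above outputs, at each round, a mixture of sigmoids: the sigmoid prediction averaged over the posterior obtained by exponentially reweighting the Gaussian prior by cumulative log-loss. A minimal sketch of that predictor, approximating the posterior average by self-normalized importance sampling from the $\mathcal{N}(0, B^2 I_d)$ prior (a simple stand-in for a sampler; the function name and sampling scheme are our assumptions, not the paper's implementation):

```python
import numpy as np

def ew_predict(X_past, y_past, x_new, B=1.0, n_samples=20000, seed=0):
    """EW 'mixture of sigmoids' prediction P(y=+1 | x_new),
    approximated by importance sampling from the N(0, B^2 I_d) prior.
    Illustrative sketch only, not the paper's sampler."""
    rng = np.random.default_rng(seed)
    d = x_new.shape[0]
    theta = rng.normal(scale=B, size=(n_samples, d))       # prior draws
    if len(y_past):
        # log-loss of each draw on each past round: log(1 + e^{-y <theta, x>})
        margins = y_past[None, :] * (theta @ np.asarray(X_past).T)
        log_w = -np.logaddexp(0.0, -margins).sum(axis=1)   # minus cumulative loss
    else:
        log_w = np.zeros(n_samples)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                                           # EW posterior weights
    sigma = 1.0 / (1.0 + np.exp(-theta @ x_new))           # sigmoid per draw
    return float(w @ sigma)                                # posterior-averaged prediction
```

Before any data arrives the weights are uniform and, by symmetry of the prior, the prediction is close to $1/2$; after observing positively labeled points the mixture shifts toward the separating directions.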


Paper Structure

This paper contains 33 sections, 33 theorems, 189 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

[Reformulation of Kakade and Ng (2005)] The EW (exponential weights aggregation) estimator with an isotropic Gaussian prior $\pi_0(\theta)\sim\mathcal{N}(0,B^2I_d)$ satisfies a regret guarantee of order $O(d\log(Bn))$.
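For reference, the regret in this guarantee is the standard cumulative log-loss regret against the best linear predictor of norm at most $B$; a sketch in our own notation (the symbols $\hat p_t$, $\ell$, $x_t$, $y_t$ are our assumptions, not taken from the source):

```latex
\mathrm{Reg}_n \;=\; \sum_{t=1}^{n} \ell\bigl(\hat p_t, y_t\bigr)
\;-\; \min_{\|\theta\|\le B} \sum_{t=1}^{n} \ell\bigl(\sigma(\langle \theta, x_t\rangle), y_t\bigr)
\;\le\; O\bigl(d\log(Bn)\bigr),
```

where $\ell$ is the logistic (log) loss, $\sigma$ the sigmoid, and $\hat p_t$ the EW prediction at round $t$.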

Figures (4)

  • Figure 1: 2-D example of solid-angle voter's positive class probability assignments on a margin slice with fixed $\gamma$.
  • Figure 2: (Top) Median average log-loss across $5$ repeats (in which the sequential order of the data is shuffled), with shaded interquartile range (25th--75th percentile). (Bottom) Acceptance rate and step size of MALA (used for EW), tuned to a target acceptance rate of $0.57$. Both plots use a log scale on the $x$-axis.
  • Figure 3: Average log-loss at round $n=1000$ versus $B$ for AIOLI ($\lambda=1/B^2$) and EW (Gaussian prior scale $B$). Curves show the median over $5$ random permutations of the sequential data, with shaded interquartile ranges (IQR, 25th--75th percentiles).
  • Figure 4: Worst-of-$\chi$ average regret with $\pm$ s.e. bars over $70$ runs on the adversarial data-generating process of Hazan et al. (2014).
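The MALA sampler referenced in Figure 2 targets the EW posterior $\pi(\theta)\propto \exp(-\|\theta\|^2/(2B^2) - \sum_t \ell_t(\theta))$. A minimal fixed-step sketch (function names and the constant step size are our assumptions; the paper's experiments adaptively tune the step toward a $0.57$ acceptance rate):

```python
import numpy as np

def mala_sample(X, y, B=1.0, eps=0.1, n_steps=2000, seed=0):
    """Metropolis-adjusted Langevin sampler for the EW posterior over theta.
    Returns the final state and the empirical acceptance rate."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    def log_post(th):
        m = y * (X @ th)                        # margins y_t <theta, x_t>
        return -th @ th / (2 * B**2) - np.logaddexp(0.0, -m).sum()

    def grad(th):
        m = y * (X @ th)
        s = 1.0 / (1.0 + np.exp(m))             # sigmoid(-m)
        return -th / B**2 + (y * s) @ X         # gradient of log_post

    th, accepts = np.zeros(d), 0
    for _ in range(n_steps):
        mu = th + 0.5 * eps**2 * grad(th)       # Langevin drift
        prop = mu + eps * rng.normal(size=d)
        mu_back = prop + 0.5 * eps**2 * grad(prop)
        # Metropolis-Hastings correction with the asymmetric Gaussian proposal
        log_a = (log_post(prop) - log_post(th)
                 - ((th - mu_back) @ (th - mu_back)
                    - (prop - mu) @ (prop - mu)) / (2 * eps**2))
        if np.log(rng.uniform()) < log_a:
            th, accepts = prop, accepts + 1
    return th, accepts / n_steps
```

In practice one would discard a burn-in prefix and average sigmoid predictions over the retained chain states rather than using a single draw.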

Theorems & Definitions (61)

  • Theorem 1
  • Proof sketch
  • Theorem 2
  • Proof sketch
  • Corollary 2
  • Proposition 2
  • Proof sketch
  • Proposition 2
  • Proof sketch
  • Corollary 2
  • ...and 51 more