
Complexity of Classical Acceleration for $\ell_1$-Regularized PageRank

Kimon Fountoulakis, David Martínez-Rubio

TL;DR

This work analyzes FISTA on a slightly over-regularized objective, shows that under a checkable confinement condition all spurious activations remain inside a boundary set $\mathcal{B}$, and provides graph-structural conditions that imply such confinement.

Abstract

We study the degree-weighted work required to compute $\ell_1$-regularized PageRank using the standard one-gradient-per-iteration accelerated proximal-gradient method (FISTA). For non-accelerated local methods, the best known worst-case work scales as $\widetilde{O}((\alpha\rho)^{-1})$, where $\alpha$ is the teleportation parameter and $\rho$ is the $\ell_1$-regularization parameter. A natural question is whether FISTA can improve the dependence on $\alpha$ from $1/\alpha$ to $1/\sqrt{\alpha}$ while preserving the $1/\rho$ locality scaling. The challenge is that acceleration can break locality by transiently activating nodes that are zero at optimality, thereby increasing the cost of gradient evaluations. We analyze FISTA on a slightly over-regularized objective and show that, under a checkable confinement condition, all spurious activations remain inside a boundary set $\mathcal{B}$. This yields a bound consisting of an accelerated $(\rho\sqrt{\alpha})^{-1}\log(\alpha/\varepsilon)$ term plus a boundary overhead $\sqrt{\operatorname{vol}(\mathcal{B})}/(\rho\alpha^{3/2})$. We provide graph-structural conditions that imply such confinement. Experiments on synthetic and real graphs show the resulting speedup and slowdown regimes under the degree-weighted work model.
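To make the setting concrete, here is a minimal sketch of ISTA and FISTA on $\ell_1$-regularized PageRank with a degree-weighted work counter. It assumes the formulation commonly used in the local graph clustering literature, which this summary does not spell out: $f(x) = \tfrac12 x^\top Q x - \alpha x^\top D^{-1/2}s$ with $Q = \alpha I + \tfrac{1-\alpha}{2}(I - D^{-1/2}AD^{-1/2})$, penalty $\rho\alpha\|D^{1/2}x\|_1$, smoothness $L=1$ and strong convexity $\mu=\alpha$. The helper names, the constant momentum $(1-\sqrt{\alpha})/(1+\sqrt{\alpha})$, and the work proxy (charging $\operatorname{vol}(\operatorname{supp}(y))$ per gradient) are our assumptions, not the paper's exact method. Dense matrices are used only for brevity; a local method would touch just the neighbours of the support.

```python
import numpy as np

def pagerank_l1_problem(A, seed, alpha, rho):
    """Assumed objective f(x) + g(x), with
    f(x) = 0.5 * x^T Q x - alpha * x^T D^{-1/2} s,  g(x) = rho * alpha * ||D^{1/2} x||_1,
    Q = alpha * I + (1 - alpha) / 2 * (I - D^{-1/2} A D^{-1/2})."""
    n = A.shape[0]
    d = A.sum(axis=1).astype(float)                 # node degrees (graph assumed connected)
    dinv_sqrt = 1.0 / np.sqrt(d)
    s = np.zeros(n)
    s[seed] = 1.0                                   # indicator vector of the seed node
    N = dinv_sqrt[:, None] * A * dinv_sqrt[None, :]  # normalized adjacency
    Q = alpha * np.eye(n) + 0.5 * (1.0 - alpha) * (np.eye(n) - N)  # spectrum in [alpha, 1]
    b = alpha * dinv_sqrt * s
    thresh = rho * alpha * np.sqrt(d)               # per-coordinate l1 weights
    return Q, b, thresh, d

def soft_threshold(u, t):
    """Proximal operator of the weighted l1 norm (step size 1)."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def solve(A, seed, alpha, rho, accelerated, iters):
    """ISTA (accelerated=False) or constant-momentum FISTA with step size 1/L, L = 1."""
    Q, b, thresh, d = pagerank_l1_problem(A, seed, alpha, rho)
    x = np.zeros(A.shape[0])
    x_prev = x.copy()
    # Momentum for mu = alpha, L = 1; beta = 0 recovers ISTA.
    beta = (1 - np.sqrt(alpha)) / (1 + np.sqrt(alpha)) if accelerated else 0.0
    work = 0.0
    for _ in range(iters):
        y = x + beta * (x - x_prev)                 # extrapolated point
        work += d[y != 0].sum()                     # proxy cost of Q @ y: vol(supp(y))
        grad = Q @ y - b
        x_prev, x = x, soft_threshold(y - grad, thresh)
    return x, work

# Usage: two triangles joined by the edge (2, 3), seeded at node 0.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
x_ista, work_ista = solve(A, 0, alpha=0.2, rho=1e-3, accelerated=False, iters=2000)
x_fista, work_fista = solve(A, 0, alpha=0.2, rho=1e-3, accelerated=True, iters=2000)
```

Both runs use a fixed iteration budget, so their work totals are not a fair comparison; a faithful experiment would stop each method at the KKT tolerance $\varepsilon$. Because the extrapolated point $y$ can be nonzero on nodes that are zero at the optimum, the FISTA work counter is where spurious activations would show up.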

Paper Structure (33 sections, 11 theorems, 77 equations, 8 figures)

Key Result

Lemma 4.1

Fix $y\in\mathbb{R}^n$. For every $i\in A(y)$, $|u(y)_i-u(x^\star)_i| > \eta \gamma_i\sqrt{d_i}$.
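This summary does not define $u$, $\eta$, $\gamma_i$, or $A(y)$. Under the following assumptions (ours, not taken from the paper), the inequality reduces to one triangle-inequality step: $u(y) = y - \eta\nabla f(y)$ is the forward step; the $\ell_1$ weight on coordinate $i$ is $\rho\alpha\sqrt{d_i}$, so the proximal step thresholds at $\eta\rho\alpha\sqrt{d_i}$; $A(y) = \{i : x^\star_i = 0,\ |u(y)_i| > \eta\rho\alpha\sqrt{d_i}\}$ is the set of spurious activations; and $\gamma_i = \rho\alpha - |\nabla_i f(x^\star)|/\sqrt{d_i} \ge 0$ is the KKT slack at an inactive coordinate.

```latex
\begin{aligned}
|u(y)_i - u(x^\star)_i|
  &\ge |u(y)_i| - |u(x^\star)_i| \\
  &> \eta\rho\alpha\sqrt{d_i} - \eta\,|\nabla_i f(x^\star)|
     && \text{since } x^\star_i = 0 \Rightarrow u(x^\star)_i = -\eta\nabla_i f(x^\star) \\
  &= \eta\sqrt{d_i}\,\bigl(\rho\alpha - |\nabla_i f(x^\star)|/\sqrt{d_i}\bigr)
   = \eta\gamma_i\sqrt{d_i}.
\end{aligned}
```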

Figures (8)

  • Figure 1: Adjacency density. For each boundary size $|\mathcal{B}|$ we visualize the adjacency matrix via a binned edge-density heatmap (bin size $20$), where each pixel shows the fraction of possible edges between a pair of bins (log-scaled; colormap magma with white below $10^{-4}$). Dashed lines mark the core | boundary | exterior block boundaries. The plots show the clique (upper-left block), the boundary circulant band, the nearly dense exterior block, and the sparse cross-region interfaces.
  • Figure 2: Work vs. $\operatorname{vol}(\mathcal{B})$. Work of ISTA and FISTA as a function of $\operatorname{vol}(\mathcal{B})$.
  • Figure 3: Sweeps at fixed $|\mathcal{B}|=600$. (a) The $\rho$-sweep with a dense core (clique), using a fresh randomized graph for each $\rho$. (b) The $\rho$-sweep with a sparse core (connected, $20\%$ of the clique edges), using a fresh randomized graph for each $\rho$. (c) An $\alpha$-sweep at a fixed residual tolerance $\varepsilon=10^{-6}$ on a single instance constructed to satisfy $\xi>0$ and the no-percolation condition at the smallest swept value, with parameters selected by an inexpensive auto-tuning step (see the appendix on the full $|\mathcal{B}|=600$ sweeps). (d) A sweep of the tolerance $\varepsilon$ at fixed $\alpha=0.20$ on the baseline unweighted instance.
  • Figure 4: Real graphs: work vs. $\alpha$. Work to reach tolerance $10^{-8}$ as a function of $\alpha$, with $\rho=10^{-4}$ fixed. Curves show mean over $300$ random seeds; shaded bands are interquartile ranges.
  • Figure 5: Real graphs: work vs. KKT tolerance. Work to reach $\varepsilon$, with $\alpha=0.20$ and $\rho=10^{-4}$ fixed. Curves show mean over $300$ random seeds; shaded bands are interquartile ranges.
  • ...and 3 more figures
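The binned edge density described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, assuming a symmetric 0/1 adjacency matrix with zero diagonal whose node order is core | boundary | exterior; the function name `binned_edge_density` and the convention of excluding self-pairs on diagonal blocks are our assumptions. It returns the matrix that would be passed to a log-scaled heatmap, with entries below $10^{-4}$ set to NaN so that they render white.

```python
import numpy as np

def binned_edge_density(A, bin_size=20, floor=1e-4):
    """Fraction of possible edges between each pair of node bins (hypothetical helper)."""
    n = A.shape[0]
    starts = np.arange(0, n, bin_size)             # first node index of each bin
    sizes = np.diff(np.append(starts, n))           # last bin may be smaller
    # Sum adjacency entries inside every (row-bin, column-bin) block.
    counts = np.add.reduceat(np.add.reduceat(A, starts, axis=0), starts, axis=1)
    # Ordered node pairs per block; diagonal blocks exclude self-pairs.
    possible = np.outer(sizes, sizes) - np.diag(sizes)
    density = np.divide(counts, possible, out=np.zeros(counts.shape), where=possible > 0)
    # Values below the floor become NaN so a plot can show them as white.
    return np.where(density >= floor, density, np.nan)

# Usage: a 20-node clique, 20 isolated nodes, and one cross edge.
A = np.zeros((40, 40), dtype=int)
A[:20, :20] = 1
np.fill_diagonal(A, 0)
A[0, 20] = A[20, 0] = 1
dens = binned_edge_density(A, bin_size=20)
```

Counting ordered pairs in every block keeps diagonal and off-diagonal bins on the same scale, since a symmetric matrix stores each edge twice.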

Theorems & Definitions (14)

  • Definition 3.1
  • Lemma 4.1
  • Lemma 4.2
  • Theorem 4.3
  • Theorem 4.4
  • Remark 4.5
  • Remark 4.6
  • Lemma A.1: Initial gap
  • Corollary A.3: FISTA iterates
  • Lemma A.4: Monotonicity of the $\ell_1$-regularized PageRank path
  • ...and 4 more