Node Regression on Latent Position Random Graphs via Local Averaging

Martin Gjorgjevski, Nicolas Keriven, Simon Barthelmé, Yohann De Castro

TL;DR

This work studies the simplest estimator for node regression on a graph, which averages the label values at all neighboring nodes. It shows that in Latent Position Models this estimator tends to a Nadaraya-Watson estimator in the latent space, and that it has the same rate of convergence.

Abstract

Node regression consists in predicting the value of a graph label at a node, given observations at the other nodes. To gain insight into the performance of various estimators for this task, we perform a theoretical study in a setting where the graph is random. Specifically, we assume that the graph is generated by a Latent Position Model, where each node has a latent position and the probability that two nodes are connected depends on the distance between their latent positions. In this setting, we begin by studying the simplest estimator for graph regression, which averages the label values at all neighboring nodes. We show that in Latent Position Models this estimator tends to a Nadaraya-Watson estimator in the latent space, and that it has the same rate of convergence. One issue with this standard estimator is that it averages over the region formed by all neighbors of a node, which, depending on the graph model, may be too large or too small. An alternative is to first estimate the true distances between the latent positions, then plug these estimated distances into a classical Nadaraya-Watson estimator. This enables averaging over regions either smaller or larger than the typical graph neighborhood. We show that this method can achieve standard nonparametric rates in certain instances even when the graph neighborhood is too large or too small.
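The neighbor-averaging estimator described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is a random geometric graph (the special case of a Latent Position Model where two nodes connect exactly when their latent distance is at most $h_g$), and the label function, the noise level, and the helper names `sample_rgg`, `neighbor_average` and `nadaraya_watson` are all hypothetical choices made for this example.

```python
import numpy as np

def sample_rgg(n, h_g, rng):
    """Sample n latent positions uniformly on [-1, 1]^2 and connect two
    nodes whenever their latent distance is at most h_g (assumed model)."""
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = (D <= h_g).astype(float)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return X, D, A

def neighbor_average(A, y):
    """Average the labels over each node's neighbors.
    Nodes without neighbors get NaN."""
    deg = A.sum(axis=1)
    out = np.full(len(y), np.nan)
    np.divide(A @ y, deg, out=out, where=deg > 0)
    return out

def nadaraya_watson(D, y, h):
    """Leave-one-out Nadaraya-Watson estimate with a box kernel of
    bandwidth h, computed from pairwise latent distances D."""
    W = (D <= h).astype(float)
    np.fill_diagonal(W, 0.0)
    return neighbor_average(W, y)

rng = np.random.default_rng(0)
X, D, A = sample_rgg(n=1000, h_g=0.1, rng=rng)

# Hypothetical smooth regression function and noise level.
f = np.sin(np.pi * X[:, 0]) * np.cos(np.pi * X[:, 1])
y = f + 0.5 * rng.standard_normal(len(f))

f_graph = neighbor_average(A, y)        # uses only the graph
f_nw = nadaraya_watson(D, y, h=0.1)     # uses the latent distances

mask = ~np.isnan(f_graph)
mse = np.mean((f_graph[mask] - f[mask]) ** 2)
print(f"MSE of neighbor averaging: {mse:.4f}")
```

In this special case the two estimates coincide, because the adjacency matrix is itself a box kernel of bandwidth $h_g$ evaluated on the latent distances. Calling `nadaraya_watson` with a different `h` mimics the second idea in the abstract, averaging over a region smaller or larger than the graph neighborhood, except that here the true distances are used where the paper would use estimated ones.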

Paper Structure

This paper contains 32 sections, 18 theorems, 141 equations, 6 figures, 2 algorithms.

Key Result

Proposition 3

Let $\mathop{\mathrm{Bias}}\nolimits\left[\hat{f}_{\mathrm{GNW}}(\bm{\mathit{x}})\right]$ and $\mathop{\mathrm{Var}}\nolimits\left[\hat{f}_{\mathrm{GNW}}(\bm{\mathit{x}})\right]$ denote the standard bias and variance of $\hat{f}_{\mathrm{GNW}}(\bm{\mathit{x}})$. If Assumptions ass:bounded_f and ass:additive_noise hold, then … and …
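
For orientation, the bias and variance named in this proposition combine in the usual pointwise decomposition of the mean squared error. The identity below is the textbook one and is not a reconstruction of the proposition's bounds. The symbol $f$ for the regression function being estimated is notation assumed here, with the bias taken as $\mathbb{E}\left[\hat{f}_{\mathrm{GNW}}(\bm{\mathit{x}})\right] - f(\bm{\mathit{x}})$.

```latex
\mathbb{E}\left[\left(\hat{f}_{\mathrm{GNW}}(\bm{\mathit{x}}) - f(\bm{\mathit{x}})\right)^2\right]
  = \mathop{\mathrm{Bias}}\nolimits\left[\hat{f}_{\mathrm{GNW}}(\bm{\mathit{x}})\right]^2
  + \mathop{\mathrm{Var}}\nolimits\left[\hat{f}_{\mathrm{GNW}}(\bm{\mathit{x}})\right]
```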

Figures (6)

  • Figure 1: Sampling an LPM. Left: 1000 latent positions drawn uniformly on ${[-1,1]}^2$. Right: a random geometric graph generated with $h_g=0.1$. The color represents the labels; brighter colors correspond to higher values.
  • Figure 2: Bias-variance tradeoff curves for NW under perturbation. Sample size $n=500$, label $y_i = \sin(4\pi\bm{\mathit{x}}_i)+\epsilon_i$ with $\epsilon_i\sim \mathcal{N}(0,1.5)$ and $\bm{\mathit{x}}_i\sim \mathop{\mathrm{Unif}}\nolimits[0,1]$.
  • Figure 3: Illustration for an LPM with sample size $n=500$ and length-scale $h_g=0.1$. The spectral-decay panel shows a scatter plot of the eigenvalues of $\bm{\mathrm{K}}$ and $\bm{\mathrm{A}}$ in descending order. Interestingly, the first few eigenvalues of $\bm{\mathrm{A}}$ are very close to the corresponding ordered eigenvalues of $\bm{\mathrm{K}}$. The histogram panel shows a histogram of the eigenvalues of $\bm{\mathrm{A}}$. The top several eigenvalues of $\bm{\mathrm{A}}$ are well separated from the rest, which fall in the semicircular bulk.
  • Figure 4: Scatter plots of $(\bm{\mathrm{X}}_n,\bm{\mathit{u}}_j)$ and $(\bm{\mathrm{X}}_{n},\bm{\mathit{v}}_j)$. The two "separated" panels show scatter plots of the 11th eigenvector of the matrices $\bm{\mathrm{K}}$ and $\bm{\mathrm{A}}$, respectively. This is the index of the last eigenvalue that separates from the bulk. The two "bulk" panels show the same scatter plots for the 12th eigenvector of the matrices $\bm{\mathrm{K}}$ and $\bm{\mathrm{A}}$, respectively. This is the index of the first eigenvalue that belongs to the bulk.
  • Figure 5: Empirical error of $\mathcal{B}_{sp}$ and $\mathcal{B}_{spectral}$ as a function of the length-scale $h_g$ of the LPM.
  • ...and 1 more figure

Theorems & Definitions (19)

  • Proposition 3
  • Theorem 4
  • Lemma 5
  • Lemma 6
  • Lemma 7
  • Lemma 13
  • Lemma 14
  • Theorem 15
  • Theorem 16
  • Theorem 17
  • ...and 9 more