Self-Directed Learning of Convex Labelings on Graphs

Georgy Sokolov, Maximilian Thiessen, Margarita Akhmejanova, Fabio Vitale, Francesco Orabona

TL;DR

This work studies self-directed learning for graph node classification under geodesic convexity. It introduces Good4, a polynomial-time algorithm that learns labelings with two convex clusters with a mistake bound of $M\le 3(h(G)+1)^4\ln n$, where $n$ is the number of nodes and $h(G)$ is the Hadwiger number of the graph $G$. The result extends to near-convex labelings with a bound of $4M^*+3(h(G)+1)^4\ln n$. The paper also gives lower bounds and bounds for specific graph families, illustrating the role of the Hadwiger number in learnability on sparse graphs. In addition, it presents a simple linear-time algorithm for homophilic labelings with a mistake bound of $|\partial\mathcal{C}_y|+1$, along with multiclass extensions whose bound scales as $2^k h(G)^{4k}\ln n$ for fixed $k$. These results link graph sparsity measures to predictability in self-directed settings and leave room for tighter bounds, broader graph classes, and further multiclass extensions.
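To get a feel for the scale of these bounds, the following minimal sketch evaluates the two closed-form expressions from the summary above. The function names and the parameter `m_star` (standing in for the quantity $M^*$, whose definition is not given here) are illustrative assumptions, not names from the paper.

```python
import math


def good4_mistake_bound(hadwiger: int, n: int) -> float:
    """Evaluate 3 * (h(G) + 1)^4 * ln(n), the bound quoted for two convex clusters."""
    if hadwiger < 1 or n < 1:
        raise ValueError("hadwiger and n must be positive integers")
    return 3 * (hadwiger + 1) ** 4 * math.log(n)


def near_convex_mistake_bound(m_star: float, hadwiger: int, n: int) -> float:
    """Evaluate 4 * M^* + 3 * (h(G) + 1)^4 * ln(n), the bound quoted for near-convex labelings.

    m_star is a placeholder for M^*; its meaning is whatever the paper defines.
    """
    if m_star < 0:
        raise ValueError("m_star must be non-negative")
    return 4 * m_star + good4_mistake_bound(hadwiger, n)


if __name__ == "__main__":
    n = 10**6
    # Trees have Hadwiger number at most 2; planar graphs have Hadwiger number at most 4.
    for name, h in [("tree", 2), ("planar", 4)]:
        print(name, round(good4_mistake_bound(h, n), 1))
```

The bound grows only logarithmically in $n$ but with the fourth power of $h(G)+1$, so it is informative mainly when the graph is large and its Hadwiger number is small.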

Abstract

We study the problem of classifying the nodes of a given graph in the self-directed learning setup. This learning setting is a variant of online learning, where rather than an adversary determining the sequence in which nodes are presented, the learner autonomously and adaptively selects them. While self-directed learning of Euclidean halfspaces, linear functions, and general multiclass hypothesis classes was recently considered, no results previously existed specifically for self-directed node classification on graphs. In this paper, we address this problem by developing efficient algorithms for it. More specifically, we focus on the case of (geodesically) convex clusters, i.e., for every two nodes sharing the same label, all nodes on every shortest path between them also share the same label. In particular, we devise an algorithm with runtime polynomial in $n$ that makes only $3(h(G)+1)^4 \ln n$ mistakes on graphs with two convex clusters, where $n$ is the total number of nodes and $h(G)$ is the Hadwiger number, i.e., the size of the largest clique minor of the graph $G$. We also show that our algorithm is robust when the clusters are slightly non-convex, still achieving a mistake bound logarithmic in $n$. Finally, we devise a simple and efficient algorithm for homophilic clusters, where strongly connected nodes tend to belong to the same class.
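The definition of geodesic convexity in the abstract can be checked directly on small graphs. The sketch below is a brute-force test, not the paper's algorithm. It assumes an unweighted, undirected graph given as an adjacency dictionary, and the function names are hypothetical. A node $w$ lies on some shortest path between $u$ and $v$ exactly when $d(u,w)+d(w,v)=d(u,v)$.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Return hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_convex_labeling(adj, labels):
    """Check that every label class is geodesically convex.

    adj: dict mapping each node to an iterable of its neighbours (undirected).
    labels: dict mapping each node to its label.
    """
    dist = {u: bfs_distances(adj, u) for u in adj}
    for u, v in combinations(adj, 2):
        # Only same-label pairs joined by at least one path constrain the labeling.
        if labels[u] != labels[v] or v not in dist[u]:
            continue
        d_uv = dist[u][v]
        for w in adj:
            if w not in dist[u]:
                continue
            # w is on some shortest u-v path iff the distances add up exactly.
            if dist[u][w] + dist[w][v] == d_uv and labels[w] != labels[u]:
                return False
    return True


if __name__ == "__main__":
    cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(is_convex_labeling(cycle4, {0: "a", 1: "a", 2: "b", 3: "b"}))
    print(is_convex_labeling(cycle4, {0: "a", 1: "b", 2: "a", 3: "b"}))
```

On the 4-cycle, labeling two adjacent nodes alike gives a convex labeling, while giving opposite corners the same label does not, because both shortest paths between them pass through nodes of the other label. The check runs all-pairs breadth-first search followed by a cubic scan, so it is suited to illustration rather than large graphs.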

Paper Structure (31 sections, 27 theorems, 16 equations, 4 figures)

Key Result

Proposition 1

Let $\mathcal{H}\subseteq 2^V$ be a hypothesis space with $|V|=n$. Then, it holds that

Figures (4)

  • Figure 1: Example of a graph with a convex $2$-labeling. In such labelings, for any two nodes with the same label, all nodes on every shortest path between them also share that label.
  • Figure 2: Example of a graph with a $K_4$-minor.
  • Figure 3: An example illustrating how Good4 operates. Curves denote shortest paths, and two crossing curves form a good quadruple. Using the good quadruples $\{(a,b_3),(c_2,d_i)\}$ for $i\in\{1,\dots,4\}$, we can infer the labels of the nodes $d_i$.
  • Figure 4: An example illustrating how the FindDistinctLabel algorithm operates.

Theorems & Definitions (28)

  • Proposition 1: ben1997online
  • Proposition 1
  • Proposition 1
  • Definition 2: $U$-good nodes and $\varepsilon_U$
  • Theorem 2: Mistake upper bound
  • Theorem 2
  • Proposition 2
  • Proposition 2
  • Proposition 2
  • Corollary 3
  • ...and 18 more