Approximately Counting and Sampling Hamiltonian Motifs in Sublinear Time

Talya Eden, Reut Levi, Dana Ron, Ronitt Rubinfeld

TL;DR

This work addresses the problem of approximately counting and nearly uniformly sampling Hamiltonian motifs in graphs within sublinear time. It introduces a framework of attempted samplers and a degrees-typical data structure to enable counting and sampling of Hamiltonian motifs in the standard query model, bridging the gap with the augmented model. The main results include a standard-model algorithm for counting any Hamiltonian motif in sublinear time with complexity $O^*( \frac{n}{n_{\mathsf{F}}^{1/k}} + \frac{m^{k/2}}{n_{\mathsf{F}}} )$ and a nearly uniform sampling variant, plus simplified algorithms for $k$-cliques and $k$-stars. The approach matches augmented-model bounds up to necessary overhead, extends to directed motifs, and provides practical tools for network motif analysis in sublinear regimes.
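To get a feel for the complexity bound above, the following minimal sketch evaluates its two terms for concrete parameters. It assumes that the $O^*$ notation hides factors depending on $\epsilon$, $k$ and $\log n$, which are simply dropped here. The function name `hamiltonian_motif_bound` and the sample values are illustrative and do not come from the paper.

```python
def hamiltonian_motif_bound(n: int, m: int, n_f: float, k: int) -> float:
    """Evaluate n / n_F^(1/k) + m^(k/2) / n_F.

    Assumption: factors hidden by O^* (dependence on epsilon, k, log n)
    are ignored, so the result only indicates the order of magnitude.
    """
    if n_f <= 0:
        raise ValueError("n_f (the number of motif copies) must be positive")
    vertex_term = n / n_f ** (1.0 / k)   # n / n_F^{1/k}
    edge_term = m ** (k / 2.0) / n_f     # m^{k/2} / n_F
    return vertex_term + edge_term


if __name__ == "__main__":
    # Hypothetical instance: 10^6 vertices, 10^7 edges, 10^9 copies of a
    # motif on k = 4 vertices (for example, a 4-cycle).
    print(hamiltonian_motif_bound(n=10**6, m=10**7, n_f=1e9, k=4))
```

Both terms shrink as the number of copies $n_{\mathsf{F}}$ grows, which is why the bound is sublinear when the motif is sufficiently frequent.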

Abstract

Counting small subgraphs, referred to as motifs, in large graphs is a fundamental task in graph analysis, extensively studied across various contexts and computational models. In the sublinear-time regime, the relaxed problem of approximate counting has been explored within two prominent query frameworks: the standard model, which permits degree, neighbor, and pair queries, and the strictly more powerful augmented model, which additionally allows for uniform edge sampling. Currently, in the standard model, (optimal) results have been established only for approximately counting edges, stars, and cliques, all of which have a radius of one. This contrasts sharply with the state of affairs in the augmented model, where algorithmic results (some of which are optimal) are known for any input motif, leading to a disparity which we term the "scope gap" between the two models. In this work, we make significant progress in bridging this gap. Our approach draws inspiration from recent advancements in the augmented model and utilizes a framework centered on counting by uniform sampling, thus allowing us to establish new results in the standard model and to simplify previous results. In particular, our first, and main, contribution is a new algorithm in the standard model for approximately counting any Hamiltonian motif in sublinear time. Our second contribution is a variant of our algorithm that enables nearly uniform sampling of these motifs, a capability previously limited in the standard model to edges and cliques. Our third contribution is to introduce even simpler algorithms for stars and cliques by exploiting their radius-one property. As a result, we simplify all previously known algorithms in the standard model for stars (Gonen, Ron, Shavitt (SODA 2010)), triangles (Eden, Levi, Ron, Seshadhri (FOCS 2015)) and cliques (Eden, Ron, Seshadhri (STOC 2018)).
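The two query frameworks named in the abstract can be pictured as oracles over a hidden graph. The sketch below is an illustrative simulation only: the class names, the 0-indexed neighbor convention, and the query counter are assumptions made for this example and are not taken from the paper.

```python
import random
from typing import Dict, List, Optional, Tuple


class StandardModelOracle:
    """Simulated query access: degree, neighbor and pair queries."""

    def __init__(self, adjacency: Dict[int, List[int]]) -> None:
        self._adj = {v: list(nbrs) for v, nbrs in adjacency.items()}
        self._adj_sets = {v: set(nbrs) for v, nbrs in self._adj.items()}
        self.queries = 0  # total number of queries issued so far

    def degree(self, v: int) -> int:
        """Degree query: the number of neighbors of v."""
        self.queries += 1
        return len(self._adj[v])

    def neighbor(self, v: int, i: int) -> Optional[int]:
        """Neighbor query: the i-th neighbor of v (0-indexed), or None."""
        self.queries += 1
        nbrs = self._adj[v]
        return nbrs[i] if 0 <= i < len(nbrs) else None

    def pair(self, u: int, v: int) -> bool:
        """Pair query: whether {u, v} is an edge."""
        self.queries += 1
        return v in self._adj_sets[u]


class AugmentedModelOracle(StandardModelOracle):
    """Standard model plus uniform edge samples."""

    def __init__(self, adjacency: Dict[int, List[int]],
                 seed: Optional[int] = None) -> None:
        super().__init__(adjacency)
        self._rng = random.Random(seed)
        # The simulation materializes the edge list; an algorithm in the
        # model only ever sees the sampled edges, one query each.
        self._edges = sorted({(min(u, v), max(u, v))
                              for u, nbrs in self._adj.items() for v in nbrs})

    def sample_edge(self) -> Tuple[int, int]:
        """Edge-sample query: a uniformly random edge of the graph."""
        self.queries += 1
        return self._rng.choice(self._edges)


if __name__ == "__main__":
    # A 4-cycle on vertices 0..3.
    g = AugmentedModelOracle({0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}, seed=0)
    print(g.degree(0), g.neighbor(0, 1), g.pair(0, 2), g.sample_edge())
    print("queries used:", g.queries)
```

Counting queries in the oracle mirrors how complexity is measured in these models: an algorithm is charged per query rather than for reading the whole graph.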

Paper Structure

This paper contains 41 sections, 11 theorems, 14 equations, 3 figures, 1 table.

Key Result

Theorem 1

Let $G$ be a graph over $n$ vertices and $m$ edges. There exists an algorithm in the standard query model that, given query access to $G$ and parameters $n$, $\epsilon\in(0,1)$ and a Hamiltonian motif $\textsf{F}$ over $k$ vertices, returns a value $\widehat{n}_{\textsf{F}}$ such that $\widehat{n}_{

Figures (3)

  • Figure 1: The letters 'H', 'M' and 'L' signify whether a vertex is high, medium or low, respectively. The cycle in the figure can be covered by several sequences of paths. One example is the sequence of paths, depicted in green (outside the cycle), $\pi_1 = (u_8, u_1), \pi_2 = (u_2, u_3), \pi_3 = (u_4, u_5), \pi_4 = (u_6, u_7)$. Another example is the sequence of paths, depicted in turquoise (inside the cycle), $\pi'_1 = (u_1, u_2), \pi'_2 = (u_3, u_4, u_5, u_6), \pi'_3 = (u_7, u_8)$.
  • Figure 2: There are two different copies of $\textsf{F}$ in $Q$ that include the Hamiltonian cycle $(v_1, v_2, \ldots, v_8, v_1)$: one that uses the chord $\{v_2, v_6\}$ and one that uses the chord $\{v_4, v_8\}$. Denoting this cycle by $\textsf{c}$, we have that $n_{\textsf{F}}(Q, \textsf{c}) = 2$.
  • Figure 3: The letters 'H', 'M' and 'L' signify whether a vertex is high, medium or low, respectively. The sequence $\langle 2, 5, 1 \rangle$ fits the cycle on the left. To verify this, consider the following corresponding sequence of paths that covers the cycle: $\pi_1 = (u_1,u_8)$, $\pi_2 = (u_7,u_6,u_5,u_3,u_2)$, $\pi_3 = (u_1)$. The sequence $\langle 1, 1, 4, 2 \rangle$ also fits this cycle. To verify this, consider the following sequence of paths that covers the cycle: $\pi_1 = (u_1)$, $\pi_2 = (u_2)$, $\pi_3 = (u_3,u_4,u_5,u_6)$, $\pi_4 = (u_7,u_8)$. The sequence $\langle 2, 2, 2, 2 \rangle$ fits the cycle on the right. Several sequences of paths cover the cycle and correspond to this sequence of lengths. For example, the sequence $\pi_1 = (u_1, u_2)$, $\pi_2 = (u_3, u_4)$, $\pi_3 = (u_5, u_6)$, $\pi_4 = (u_7, u_8)$, and the sequence $\pi'_1 = (u_3, u_2)$, $\pi'_2 = (u_1, u_8)$, $\pi'_3 = (u_7, u_6)$, $\pi'_4 = (u_5, u_4)$.

Theorems & Definitions (19)

  • Theorem 1
  • Theorem 2
  • Definition 1: Copies of a motif $\textsf{F}$
  • Definition 2: Hamiltonian cycles of a copy
  • Definition 3: Number of copies in a subgraph that contain a given cycle
  • Definition 4: An $(\overline{\epsilon},\overline{\gamma},\overline{m})$-degrees-typical multiset and data structure
  • Lemma 1: Constructing a degrees-typical data structure
  • Lemma 2: Sampling medium-high vertices
  • Lemma 3
  • Definition 5: paths cover
  • ...and 9 more