Geometric deep learning on graphs and manifolds using mixture model CNNs

Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Jan Svoboda, Michael M. Bronstein

TL;DR

This paper addresses the challenge of applying convolutional neural networks to non-Euclidean data by introducing MoNet, a spatial-domain framework that learns local, patch-based features on graphs and manifolds. MoNet defines patch operators via local pseudo-coordinates and Gaussian mixture kernels, enabling a flexible, learnable convolution that subsumes prior methods like GCNN, ACNN, GCN, and DCNN. Across images, graphs, and 3D shapes, MoNet achieves state-of-the-art performance, demonstrating robustness to varying graph representations and intrinsic deformation invariance on manifolds. The approach broadens the applicability of deep learning to geometric data, with potential impacts on computer vision, network analysis, and shape analysis tasks.
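The patch-operator idea summarized above can be illustrated with a minimal NumPy sketch. It assumes Gaussian kernels with diagonal covariance, $w_j(u) = \exp\left(-\tfrac{1}{2}\sum_d \left((u_d - \mu_{j,d})/\sigma_{j,d}\right)^2\right)$, a patch operator $D_j(x)f = \sum_{y \in \mathcal{N}(x)} w_j(u(x,y))\, f(y)$, and a convolution that mixes the $J$ patch responses with filter weights. The function names, the array layout, and the toy graph are hypothetical choices made for illustration; this is not the authors' implementation, and in a real model `mu`, `sigma`, and `g` would be trained by gradient descent.

```python
import numpy as np

def gaussian_kernel_weights(u, mu, sigma):
    """Evaluate J Gaussian kernels on E pseudo-coordinate vectors.

    u:     (E, D) pseudo-coordinates u(x, y), one row per edge
    mu:    (J, D) kernel means (learnable parameters)
    sigma: (J, D) per-dimension standard deviations (diagonal covariance, assumed)
    returns an (E, J) array of weights w_j(u(x, y))
    """
    diff = (u[:, None, :] - mu[None, :, :]) / sigma[None, :, :]
    return np.exp(-0.5 * np.sum(diff ** 2, axis=-1))

def monet_conv(f, edges, u, mu, sigma, g):
    """One mixture-model convolution layer without bias or nonlinearity.

    f:     (N, C_in) vertex features
    edges: (E, 2) integer array of (target x, source y) pairs
    g:     (J, C_in, C_out) filter weights applied to the patch responses
    returns an (N, C_out) array
    """
    w = gaussian_kernel_weights(u, mu, sigma)            # (E, J)
    targets, sources = edges[:, 0], edges[:, 1]
    # Patch operator: D_j(x) f = sum over neighbours y of w_j(u(x, y)) * f(y)
    patches = np.zeros((f.shape[0], mu.shape[0], f.shape[1]))
    np.add.at(patches, targets, w[:, :, None] * f[sources][:, None, :])
    # Mix the J patch responses and the input channels into output channels
    return np.einsum("njc,jco->no", patches, g)

# Toy path graph 0 - 1 - 2 with one scalar feature per vertex.
f = np.array([[1.0], [2.0], [3.0]])
edges = np.array([[0, 1], [1, 0], [1, 2], [2, 1]])
u = np.zeros((4, 1))       # placeholder pseudo-coordinates (all zero)
mu = np.zeros((1, 1))      # a single kernel centred at the origin
sigma = np.ones((1, 1))
g = np.ones((1, 1, 1))
out = monet_conv(f, edges, u, mu, sigma, g)
print(out)
```

With every pseudo-coordinate equal to the kernel mean, each weight is 1 and the layer reduces to a plain sum over neighbours. Moving `mu` or shrinking `sigma` makes a kernel respond only to neighbours whose pseudo-coordinates fall near its centre, which is what lets the kernels act like learnable patch bins.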

Abstract

Deep learning has achieved a remarkable performance breakthrough in several fields, most notably in speech recognition, natural language processing, and computer vision. In particular, convolutional neural network (CNN) architectures currently produce state-of-the-art performance on a variety of image analysis tasks such as object detection and recognition. Most deep learning research has so far focused on 1D, 2D, or 3D Euclidean-structured data such as acoustic signals, images, or videos. Recently, there has been increasing interest in geometric deep learning, which attempts to generalize deep learning methods to non-Euclidean structured data such as graphs and manifolds, with a variety of applications in network analysis, computational social science, and computer graphics. In this paper, we propose a unified framework that generalizes CNN architectures to non-Euclidean domains (graphs and manifolds) and learns local, stationary, and compositional task-specific features. We show that various non-Euclidean CNN methods previously proposed in the literature can be considered particular instances of our framework. We test the proposed method on standard tasks from image, graph, and 3D shape analysis and show that it consistently outperforms previous approaches.

Paper Structure

This paper contains 14 sections, 15 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Left: intrinsic local polar coordinates $\rho, \theta$ on a manifold around a point marked in white. Right: patch operator weighting functions $w_i(\rho,\theta)$ used in different generalizations of convolution on the manifold (hand-crafted in GCNN and ACNN, learned in MoNet). All kernels are $L_\infty$-normalized; red curves represent the $0.5$ level set.
  • Figure 2: Representation of images as graphs. Left: regular grid (the graph is fixed for all images). Right: graph of superpixel adjacency (different for each image). Vertices are shown as red circles, edges as red lines.
  • Figure 3: Predictions obtained by applying MoNet to the Cora dataset. Marker fill color represents the predicted class; marker outline color represents the ground-truth class.
  • Figure 4: Shape correspondence quality obtained by different methods on the FAUST humans dataset. The raw performance of MoNet is shown as a dotted curve.
  • Figure 5: Shape correspondence quality obtained by different methods on FAUST range maps. For comparison, we show the performance of a Euclidean CNN with a comparable 3-layer architecture. The raw performance is shown as a dotted curve.
  • ...and 4 more figures