Geometric Reasoning in the Embedding Space

Jan Hůla, David Mojžíšek, Jiří Janeček, David Herel, Mikoláš Janota

TL;DR

This work studies how neural models can reason about geometric constraints by constructing a synthetic constraint satisfaction problem on a discrete 2D grid. It compares a Graph Neural Network and an autoregressive Transformer on predicting the positions of unknown points defined by constraints M, R, S, T, and fixed points P, revealing that embeddings evolve to reflect the underlying geometry. The results show the GNN significantly outperforms the Transformer and scales to grid sizes up to 80 by 80, while embedding visualizations reveal the emergence of a 2D grid structure in the static embeddings and iterative refinement in the solution process. The findings offer insight into embedding-space mechanisms for geometric reasoning and highlight scalability advantages of GNNs over Transformers in this setting, while outlining limitations and avenues for future work.

Abstract

In this contribution, we demonstrate that Graph Neural Networks and Transformers can learn to reason about geometric constraints. We train them to predict the spatial positions of points in a discrete 2D grid from a set of constraints that uniquely describe hidden figures containing these points. Both models are able to predict the positions of the points and, interestingly, they form the hidden figures described by the input constraints in the embedding space during the reasoning process. Our analysis shows that both models recover the grid structure during training: the embeddings corresponding to the points within the grid organize themselves in a 2D subspace and reflect the neighborhood structure of the grid. We also show that the Graph Neural Network we design for the task performs significantly better than the Transformer and is also easier to scale.
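One way to check the claim that grid-point embeddings lie in a 2D subspace is to measure how much variance the top two principal components explain. The following is a minimal sketch on toy data, not the authors' analysis code: the function names, the noise level, and the linear toy embedding are assumptions for illustration only. The dimension 96 matches the embedding size mentioned in the Figure 1 caption.

```python
import numpy as np


def make_toy_embeddings(n=15, dim=96, noise=0.01, seed=0):
    """Toy stand-in for learned static embeddings (an assumption, not model output).

    Grid coordinates are mapped linearly into `dim` dimensions and perturbed
    with small Gaussian noise, so the points lie close to a 2D subspace.
    """
    rng = np.random.default_rng(seed)
    xs, ys = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (n*n, 2)
    basis = rng.normal(size=(2, dim))  # random 2D -> dim linear map
    emb = coords @ basis + noise * rng.normal(size=(n * n, dim))
    return coords, emb


def pca_explained_variance(emb, k=2):
    """Fraction of total variance captured by the top-k principal components."""
    centered = emb - emb.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances[:k].sum() / variances.sum()


def pca_project(emb, k=2):
    """Project embeddings onto their top-k principal directions."""
    centered = emb - emb.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


coords, emb = make_toy_embeddings()
ratio = pca_explained_variance(emb, k=2)
projected = pca_project(emb, k=2)
print(f"variance explained by 2 components: {ratio:.4f}")
print("projected shape:", projected.shape)
```

On real embeddings the ratio would typically be lower than on this toy data, and a nonlinear method such as UMAP may be needed when the grid is embedded on a curved surface rather than a flat plane.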

Paper Structure

This paper contains 24 sections, 2 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Evolution of static embeddings throughout training. Lines and colors represent the connectivity of points within the 30$\times$30 grid in one direction. UMAP (McInnes et al., 2018) is used to project from the original 96 dimensions of the embedding space. We found that UMAP works best for larger grids, whereas simpler PCA projections worked better for smaller grids (15$\times$15 and smaller).
  • Figure 2: Visualization of the solution process. The red points A, B, C, D, E, F (C = E) are known, and the blue points G, H, I, J, K, L need to be predicted. There are two translation constraints, T(F, D, C=E, G) and T(C=E, G, A, H), and two square constraints, S(H, I, G, J) and S(B, J, L, K). Translations are marked by a dotted line and squares by a solid line. The network is trained to predict the result in $15$ iterations, of which the initial state and the results after iterations $3, 5, 7, 9, 11, 12, 13$ are shown. The network gradually improves the result over the iterations: the first translation, which has only one unknown point G, is solved first, followed by point H of the second translation. Once the translations are resolved, both squares are solved.
  • Figure 3: Predicted points classified based on the number of resolving steps (on the constraint level) that are necessary to determine the point positions.
  • Figure 4: Scaling laws for different sizes of the 2D grid. For a grid of size $n$, there are $n^2$ points.
  • Figure 5: We used UMAP to show that the internal structure of static embeddings discovered by the model reflects the 2D essence of the grid. The visualization shows a projection into 3D taken after 1, 5, 15, 50 and 100 epochs of training.
  • ...and 2 more figures
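
The captions of Figures 2 and 3 describe points being resolved in stages: a constraint with a single unknown point is solved first, and its result unlocks the next constraint. The sketch below illustrates this notion of "resolving steps" with a symbolic solver, not with the neural models from the paper. This excerpt does not define the constraint semantics, so the code assumes that T(A, B, C, D) means the translation taking A to B also takes C to D, that is, B - A = D - C. The coordinates are hypothetical, only translation constraints are handled, and grid bounds are not checked.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, int]
# T(A, B, C, D): ASSUMED to mean B - A == D - C (equivalently A - B - C + D == 0).
Constraint = Tuple[str, str, str, str]

SIGNS = (1, -1, -1, 1)  # coefficients of A, B, C, D in A - B - C + D == 0


def solve_translation(
    known: Dict[str, Point], constraint: Constraint
) -> Optional[Tuple[str, Point]]:
    """Solve a translation constraint if exactly one of its points is unknown."""
    unknown_idx = [i for i, name in enumerate(constraint) if name not in known]
    if len(unknown_idx) != 1:
        return None
    idx = unknown_idx[0]
    total_x = total_y = 0
    for i, name in enumerate(constraint):
        if i != idx:
            total_x += SIGNS[i] * known[name][0]
            total_y += SIGNS[i] * known[name][1]
    # sign * unknown + total == 0  =>  unknown == -sign * total (sign is +1 or -1)
    sign = SIGNS[idx]
    return constraint[idx], (-sign * total_x, -sign * total_y)


def propagate(
    known: Dict[str, Point], constraints: List[Constraint]
) -> Tuple[Dict[str, Point], Dict[str, int]]:
    """Resolve constraints round by round; record the round each point was found in."""
    positions = dict(known)
    steps = {name: 0 for name in positions}  # given points need 0 resolving steps
    pending = list(constraints)
    round_no = 0
    while pending:
        round_no += 1
        newly_solved: Dict[str, Point] = {}
        remaining: List[Constraint] = []
        for constraint in pending:
            result = solve_translation(positions, constraint)
            if result is None:
                remaining.append(constraint)
            else:
                newly_solved[result[0]] = result[1]
        if not newly_solved:
            break  # nothing more can be resolved
        for name, point in newly_solved.items():
            positions[name] = point
            steps[name] = round_no
        pending = remaining
    return positions, steps


# Hypothetical coordinates; "CE" stands for the coinciding points C = E.
known_points = {"F": (1, 1), "D": (3, 2), "CE": (4, 4), "A": (0, 5)}
translations = [("F", "D", "CE", "G"), ("CE", "G", "A", "H")]
positions, steps = propagate(known_points, translations)
print(positions["G"], steps["G"])  # (6, 5) 1
print(positions["H"], steps["H"])  # (2, 6) 2
```

Here G is found in the first round and H in the second, because H depends on G. Under the stated assumption, this dependency depth is one plausible reading of the "number of resolving steps" used to group points in Figure 3.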