
Geotokens and Geotransformers

Eren Unlu

TL;DR

The paper addresses how to encode geographic positions in transformers, arguing that for geospatial data, spatial coordinates and relative proximity are more informative than token order. It extends Rotary Position Embedding (RoPE) to spherical coordinates by introducing geotokens and a geotransformer architecture with a spherical rotation matrix $\mathbf{R}_{\phi,\theta}^{d}$ driven by longitude $\phi$ and latitude $\theta$. An experimental proof of concept on a controlled distance-prediction task shows that the spherical RoPE encoding improves training dynamics compared with random geolocation encodings. The results suggest that geospatially aware geometric rotations can support more accurate, coordinate-aware reasoning in transformer models, with potential applications in urban planning, environmental monitoring, and navigation.
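As a rough illustration of how a rotation driven by $\phi$ and $\theta$ can act on a token vector, the Python sketch below uses an axial variant of RoPE: the first half of the dimensions is rotated in 2-D pairs by integer multiples of the longitude, and the second half by integer multiples of the latitude. This construction, the function name `axial_geo_rope`, and the frequency choice are assumptions for illustration; the paper's $\mathbf{R}_{\phi,\theta}^{d}$ may be built differently. What the sketch does show is the RoPE property the approach relies on: after rotation, the dot product between a query and a key depends only on the differences in longitude and latitude.

```python
import numpy as np


def axial_geo_rope(x, lon_deg, lat_deg):
    """Rotate a feature vector by longitude/latitude angles, RoPE-style.

    Hypothetical axial variant (not necessarily the paper's R_{phi,theta}^d):
    the first half of the dimensions is rotated in 2-D pairs by multiples of
    the longitude, the second half by multiples of the latitude.
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[-1]
    if d % 4 != 0:
        raise ValueError("feature dimension must be divisible by 4")
    half = d // 2
    # Integer frequencies 1, 2, ... keep the longitude rotation 2*pi-periodic,
    # so lon = -180 and lon = +180 produce the same encoding.
    freqs = np.arange(1, half // 2 + 1)
    out = np.empty_like(x)
    for start, angle_deg in ((0, lon_deg), (half, lat_deg)):
        angles = np.deg2rad(angle_deg) * freqs
        cos, sin = np.cos(angles), np.sin(angles)
        even = x[..., start:start + half:2]     # first coordinate of each pair
        odd = x[..., start + 1:start + half:2]  # second coordinate of each pair
        out[..., start:start + half:2] = even * cos - odd * sin
        out[..., start + 1:start + half:2] = even * sin + odd * cos
    return out


rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Shifting both tokens by the same (+10 deg lon, -10 deg lat) leaves the
# query-key score unchanged: it depends only on coordinate differences.
s1 = axial_geo_rope(q, 2.35, 48.85) @ axial_geo_rope(k, 13.40, 52.52)
s2 = axial_geo_rope(q, 12.35, 38.85) @ axial_geo_rope(k, 23.40, 42.52)
print(np.isclose(s1, s2))  # True
```

Integer frequencies are chosen so that the encoding is periodic in longitude and -180° and +180° coincide; standard RoPE's fractional frequencies would not guarantee this. An axial split also treats longitude and latitude independently, so it does not reproduce great-circle geometry exactly (a degree of longitude spans less ground near the poles).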

Abstract

In transformer architectures, position encoding primarily gives the model a sense of the order of input tokens. While the method from the original transformer paper has shown satisfactory results in general language processing tasks, newer proposals such as Rotary Position Embedding (RoPE) have been introduced to improve on it. This paper presents geotokens, input components for transformers, each linked to a specific geographical location. Unlike in typical language sequences, the order of these tokens matters less than their geographical coordinates. To represent relative position in this context, and to keep a balance between real-world distance and distance in the embedding space, we design a position encoding approach that draws on the RoPE structure but is tailored to spherical coordinates.
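Balancing real-world and embedding-space distance presupposes a ground-truth distance between two geotokens. A common choice for points given as latitude and longitude is the great-circle distance computed with the haversine formula, sketched below in Python. The section does not state which distance the paper uses, so this choice, the function name `haversine_km`, and the 6371 km mean Earth radius are assumptions.

```python
import math

# Assumed mean Earth radius in kilometres (spherical-Earth approximation).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against a slightly exceeding 1.0 due to rounding.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))


# A quarter of the equator: pi * R / 2, about 10007.5 km.
print(round(haversine_km(0, 0, 0, 90)))  # 10008
```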

Paper Structure

This paper contains 7 sections, 9 equations, and 3 figures.

Figures (3)

  • Figure 1: (a) A basic geotoken is characterized by its location, given as latitude and longitude. (b) The underlying features of this geotoken can be encoded by any pre-existing neural model; in this instance, a textual description is processed by an NLP model. (c) The Geotransformer framework is designed to process these geotokens.
  • Figure 2: The geotransformer architecture used in the experiment.
  • Figure 3: Training loss with properly geo-encoded locations versus random latitudes and longitudes, using the same geotransformer architecture in the proposed experimental setting. Spherical position encoding improves the training process significantly, which constitutes an early proof of concept.
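Figure 1 describes a geotoken as a location plus features produced by some pre-existing encoder, and Figure 3 compares against random latitudes and longitudes. The Python sketch below models both under stated assumptions: `Geotoken` and `with_random_locations` are hypothetical names, features are treated as an opaque vector, and the random control simply redraws coordinates while keeping features unchanged, which may differ from how the paper built its baseline.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass(frozen=True)
class Geotoken:
    """Hypothetical container mirroring Figure 1: a location plus features."""
    lat: float                 # degrees, in [-90, 90]
    lon: float                 # degrees, in [-180, 180]
    features: Sequence[float]  # e.g. an embedding of a textual description

    def __post_init__(self):
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError(f"latitude out of range: {self.lat}")
        if not -180.0 <= self.lon <= 180.0:
            raise ValueError(f"longitude out of range: {self.lon}")


def with_random_locations(tokens: List[Geotoken], seed: int = 0) -> List[Geotoken]:
    """Hypothetical control in the spirit of Figure 3.

    Keeps each token's features but replaces its coordinates with values
    drawn uniformly in degrees (an assumed sampling scheme).
    """
    rng = random.Random(seed)
    return [
        replace(t, lat=rng.uniform(-90.0, 90.0), lon=rng.uniform(-180.0, 180.0))
        for t in tokens
    ]


tokens = [Geotoken(48.85, 2.35, (0.1, 0.7)), Geotoken(52.52, 13.40, (0.4, 0.2))]
control = with_random_locations(tokens, seed=42)
print([t.features for t in control])  # [(0.1, 0.7), (0.4, 0.2)]
```

Drawing latitude uniformly in degrees over-samples the poles relative to surface area; taking the arcsine of a value drawn uniformly from [-1, 1] would give area-uniform points if that matters for a control.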