VecCity: A Taxonomy-guided Library for Map Entity Representation Learning
Wentao Zhang, Jingyuan Wang, Yifan Yang, Leong Hou U
TL;DR
The paper tackles fragmentation and benchmarking gaps in MapRL by proposing a method-based taxonomy that groups models by Map Data, Encoder Models, Pre-training Tasks, and Downstream Tasks, decoupling core components from entity types. Building on this taxonomy, it presents VecCity, a modular library with data, upstream, and downstream modules that unifies encoding, pre-training, fine-tuning, and evaluation, and reproduces 21 mainstream MapRL models across nine cities to establish standardized benchmarks. The study systematically analyzes the impact of encoder types, pre-training tasks, and auxiliary data, revealing that multi-encoder pipelines and heterogeneous pre-training generally improve performance, though contrastive methods can introduce instability and raise computational costs. VecCity thus provides a reusable framework to accelerate MapRL research and practical deployment, with open data pipelines and reproducible baselines to guide future work in pre-trained spatiotemporal representations.
Abstract
Electronic maps consist of diverse entities, such as points of interest (POIs), road networks, and land parcels, which play a vital role in applications such as intelligent transportation systems (ITS) and location-based services (LBS). Map entity representation learning (MapRL) generates versatile and reusable data representations, providing essential tools for efficiently managing and utilizing map entity data. Despite the progress in MapRL, two key challenges constrain further development. First, existing research is fragmented, with models classified by the type of map entity, limiting the reusability of techniques across different tasks. Second, the lack of unified benchmarks makes systematic evaluation and comparison of models difficult. To address these challenges, we propose a novel taxonomy for MapRL that organizes models by functional modules, such as encoders, pre-training tasks, and downstream tasks, rather than by entity type. Building on this taxonomy, we present a taxonomy-driven library, VecCity, which offers easy-to-use interfaces for encoding, pre-training, fine-tuning, and evaluation. The library integrates datasets from nine cities and reproduces 21 mainstream MapRL models, establishing the first standardized benchmarks for the field. VecCity also allows users to modify and extend models through modular components, facilitating seamless experimentation. Our comprehensive experiments cover multiple types of map entities and evaluate 21 VecCity pre-built models across various downstream tasks. Experimental results demonstrate the effectiveness of VecCity in streamlining model development and provide insights into the impact of various components on performance. By promoting modular design and reusability, VecCity offers a unified framework to advance research and innovation in MapRL. The code is available at https://github.com/Bigscity-VecCity/VecCity.
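
To make the taxonomy idea concrete, the following minimal Python sketch shows how models might be indexed by functional module rather than by entity type. It is not the VecCity API: `ModelSpec`, `group_by_encoder`, the model names, and the encoder and task labels are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record type for illustration; NOT part of the VecCity API.
@dataclass(frozen=True)
class ModelSpec:
    name: str
    entity_type: str          # e.g. "poi", "road_segment", "land_parcel"
    encoders: tuple           # encoder families used by the model
    pretraining_tasks: tuple  # pre-training objectives
    downstream_tasks: tuple   # evaluation tasks


def group_by_encoder(specs):
    """Index model names by encoder family, ignoring entity type."""
    index = defaultdict(list)
    for spec in specs:
        for encoder in spec.encoders:
            index[encoder].append(spec.name)
    return dict(index)


# Placeholder entries (assumed labels, not models or tasks from the paper).
specs = [
    ModelSpec("model_a", "poi", ("token", "sequence"),
              ("masked_recovery",), ("poi_classification",)),
    ModelSpec("model_b", "road_segment", ("token", "graph"),
              ("contrastive",), ("travel_time_estimation",)),
    ModelSpec("model_c", "land_parcel", ("graph",),
              ("contrastive",), ("parcel_classification",)),
]

by_encoder = group_by_encoder(specs)
```

In this sketch, `by_encoder["graph"]` collects a road-segment model and a land-parcel model under the same key, which is the kind of cross-entity reuse that a module-based grouping is meant to expose.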
