
Clustering Dynamics for Improved Speed Prediction Deriving from Topographical GPS Registrations

Sarah Almeida Carneiro, Giovanni Chierchia, Aurelie Pirayre, Laurent Najman

TL;DR

This work tackles speed prediction in data-sparse regions by exploiting topographical similarities and road-design features through a Temporally Orientated Speed Dictionary centered on topographically clustered links (CDS). It introduces an Off-training Region Table to build specialized dictionaries, an ILSTM-based per-link spatio-temporal representation, and a Random Ordered Past Point Association (ROPPA) mechanism within an RNN that jointly optimizes regression and classification losses. Experimental results across topographical and infrastructural feature sets show that CDS-derived features and ROPPA-RNN architectures yield competitive, often superior, speed predictions with reduced dependence on exact vehicle positioning. The approach enables extrapolation to missing regions and supports simulatable, data-efficient speed profiling for Intelligent Transportation Systems.

Abstract

A persistent challenge in the field of Intelligent Transportation Systems is to extract accurate traffic insights from geographic regions with scarce or no data coverage. To this end, we propose solutions for speed prediction using sparse GPS data points and their associated topographical and road design features. Our goal is to investigate whether we can use similarities in the terrain and infrastructure to train a machine learning model that can predict speed in regions where we lack transportation data. For this we create a Temporally Orientated Speed Dictionary Centered on Topographically Clustered Roads, which helps us to provide speed correlations to selected feature configurations. Our results show qualitative and quantitative improvement over new and standard regression methods. The presented framework provides a fresh perspective on devising strategies for missing data traffic analysis.

Paper Structure

This paper contains 23 sections, 4 equations, 6 figures, 3 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Diagram of the framework used in this paper. In $I$ (Subsection \ref{subsec:I}), (1) we define a region for which we have data but which is not used to train the network (OTR). In $II$ (Subsection \ref{subsec:cd}), from (2) the unique links extracted from the OTR we create (3) an Individual Link Spatio-Temporal Matrix (ILSTM) for each OTR link. We then (4) cluster the OTR links based on their features and (5) aggregate the ILSTMs of all links assigned to the same cluster into a Cluster Dictionary. Together, these cluster dictionaries form (6) our Temporally Orientated Speed Dictionary Centered on Topographically Clustered Links. In $III$ (Subsection \ref{subsec:III}), we take (7) the links that will be used in training, retrieve (8) their GPS registrations, and apply (9) a link inference clustering process, in which each link is assigned to one of the previously computed clusters. We can then (10) look up the corresponding dictionary (Subsection \ref{subsec:cd}) using the cluster assignment and the time at which the trip occurred. We join (12) the link features with (11, 13) the relevant dictionary feature retrieved for that specific data point. Finally, in $IV$ (Subsection \ref{subsec:IV}), we classify an input and associate its loss with the regression process to better train the regression model.
  • Figure 2: This diagram illustrates the structure of the ILSTM (Individual Link Spatio-Temporal Matrix) and how it is populated. Typically, a link is associated with multiple trips; we denote their number by $n$. For this explanation, assume the link has identifier $A$, and let $T_1$ be the first trip whose GPS data includes $A$. Suppose $T_1$ occurred at 2 AM on a Tuesday. Its data points then map to the following ILSTM coordinates: depth 1, row 2, and the column of the respective link subdivision. We then average all data points that fall within each position of the matrix, taking into account the number of data points registered for that position. The resulting values are stored in the final table.
  • Figure 3: This diagram shows how we construct a cluster dictionary section (CD) and how the set of these sections becomes our Temporally Orientated Speed Dictionary Centered on Topographically Clustered Links. In $I$ we have the unique OTR links with their topographical features. In $II$ we have the cluster ids assigned to each of these unique OTR links by the k-means algorithm. In $III$ all links with identical cluster ids have their ILSTMs aggregated to form the respective Cluster Dictionary. Once all cluster dictionaries are formed, their set ($IV$) is what we call the Temporally Orientated Speed Dictionary Centered on Topographically Clustered Links.
  • Figure 4: Random Ordered Past Point Association (ROPPA): For each point in a trip trajectory, a different random skip vector is generated and serves as the guideline for a ROPPA. The example shows a point at time step 15, with the MTS length set to 6 and the maximal skip set to 3.
  • Figure 5: Examples of qualitative speed prediction results using topographical features on 6 different trips, comparing MLP with no CDS, MLP_f with CDS, and ROPPA RNN.
  • ...and 1 more figures
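
The ILSTM population described in the Figure 2 caption and the cluster-level aggregation described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: depth is taken as the weekday (Monday = 0, so Tuesday maps to depth 1), row as the hour of day, and column as a link subdivision index. The helper names (`empty_acc`, `add_point`, `merge`, `finalize`), the number of subdivisions, the speed values, and the cluster assignment are all hypothetical, and the count-weighted mean used to aggregate ILSTMs within a cluster is an assumption, since the captions do not specify the aggregation rule. In the described framework the cluster ids would come from k-means on link features; here they are hard-coded.

```python
from collections import defaultdict
from datetime import datetime

DAYS, HOURS = 7, 24  # depth = weekday (Monday=0), row = hour of day


def empty_acc(n_sub):
    """Return (sums, counts) accumulators shaped [day][hour][subdivision]."""
    sums = [[[0.0] * n_sub for _ in range(HOURS)] for _ in range(DAYS)]
    counts = [[[0] * n_sub for _ in range(HOURS)] for _ in range(DAYS)]
    return sums, counts


def add_point(acc, when, sub, speed):
    """Register one GPS speed observation in its (day, hour, subdivision) cell."""
    sums, counts = acc
    sums[when.weekday()][when.hour][sub] += speed
    counts[when.weekday()][when.hour][sub] += 1


def merge(acc_list, n_sub):
    """Pool several per-link accumulators (assumed count-weighted aggregation)."""
    total_sums, total_counts = empty_acc(n_sub)
    for sums, counts in acc_list:
        for d in range(DAYS):
            for h in range(HOURS):
                for s in range(n_sub):
                    total_sums[d][h][s] += sums[d][h][s]
                    total_counts[d][h][s] += counts[d][h][s]
    return total_sums, total_counts


def finalize(acc):
    """Mean speed per cell; None where no data point was registered."""
    sums, counts = acc
    return [[[sums[d][h][s] / counts[d][h][s] if counts[d][h][s] else None
              for s in range(len(sums[d][h]))]
             for h in range(HOURS)]
            for d in range(DAYS)]


N_SUB = 2  # hypothetical number of subdivisions per link
points = [  # (link id, timestamp, subdivision, speed): made-up values
    ("A", datetime(2024, 1, 2, 2, 10), 0, 40.0),  # a Tuesday, 02:10
    ("A", datetime(2024, 1, 2, 2, 40), 0, 60.0),
    ("B", datetime(2024, 1, 2, 2, 5), 0, 80.0),
]
link_cluster = {"A": 0, "B": 0}  # hypothetical stand-in for k-means output

# One accumulator per link, then one ILSTM per link.
per_link = defaultdict(lambda: empty_acc(N_SUB))
for link, when, sub, speed in points:
    add_point(per_link[link], when, sub, speed)
ilstm_a = finalize(per_link["A"])

# Group link accumulators by cluster id and build one dictionary per cluster.
by_cluster = defaultdict(list)
for link, acc in per_link.items():
    by_cluster[link_cluster[link]].append(acc)
cds = {cid: finalize(merge(accs, N_SUB)) for cid, accs in by_cluster.items()}

print(ilstm_a[1][2][0])  # 50.0: mean of link A at depth 1 (Tuesday), row 2
print(cds[0][1][2][0])   # 60.0: pooled mean over links A and B in cluster 0
```

Keeping sums and counts separate until the final step means that pooling links into a cluster weights each cell by its number of observations, rather than averaging per-link averages. Cells with no observations stay `None`, which makes data gaps explicit instead of silently reporting a speed of zero.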