Revisiting FastMap: New Applications

Ang Li

TL;DR

The dissertation generalizes FastMap to embed graph vertices and complex objects into Euclidean space in near-linear time, enabling efficient geometric solutions to large-scale graph problems. It introduces a FastMap+LSH framework that handles facility location, top-K centrality, community detection, and graph convex hull tasks by solving Euclidean-space relaxations and mapping the results back to the graph. A novel FastMapSVM framework combines FastMap embeddings with SVMs, yielding interpretable, data-efficient classifiers for seismograms and CSP satisfiability that perform strongly against neural baselines. The work also introduces PASPD-based distances for block modeling and an iterative FM-based algorithm for graph convex hulls, and it reports substantial speedups with competitive quality across AI, ML, and computational-geometry domains. Together, these contributions establish FastMap as a versatile tool for scalable graph reasoning and learning on complex objects, and point to avenues for future enhancements.

Abstract

FastMap was first introduced in the Data Mining community for generating Euclidean embeddings of complex objects. In this dissertation, we first present a version of FastMap that generates Euclidean embeddings of graphs in near-linear time: the pairwise Euclidean distances approximate a desired graph-based distance function on the vertices. We then apply the graph version of FastMap to efficiently solve various graph-theoretic problems of significant interest in AI, including facility location, top-K centrality computations, community detection and block modeling, and graph convex hull computations. We also present a novel learning framework, called FastMapSVM, that combines FastMap and Support Vector Machines. We then apply FastMapSVM to predict the satisfiability of Constraint Satisfaction Problems and to classify seismograms in Earthquake Science.
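
To make the embedding idea concrete, here is a minimal sketch of the classic Data Mining form of FastMap, which needs only a pairwise distance function on the objects. It is not the dissertation's graph variant, and the names `fastmap`, `residual_sq`, `k`, and `seed` are illustrative assumptions, as is the simple farthest-pair heuristic used to pick the pivots.

```python
import math
import random


def fastmap(objects, dist, k, seed=0):
    """Embed `objects` into R^k using only the distance function `dist`.

    Illustrative sketch of classic FastMap; returns one k-dimensional
    coordinate list per object.
    """
    rng = random.Random(seed)
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def residual_sq(i, j, dim):
        # Squared distance that remains after removing the part already
        # explained by the first `dim` coordinates (clamped at zero).
        d2 = dist(objects[i], objects[j]) ** 2
        for t in range(dim):
            d2 -= (coords[i][t] - coords[j][t]) ** 2
        return max(d2, 0.0)

    for dim in range(k):
        # Pivot heuristic: start anywhere, hop to the farthest object twice.
        a = rng.randrange(n)
        b = max(range(n), key=lambda j: residual_sq(a, j, dim))
        a = max(range(n), key=lambda j: residual_sq(b, j, dim))
        dab2 = residual_sq(a, b, dim)
        if dab2 <= 1e-12:
            break  # nothing left to explain; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Law of cosines: project object i onto the line through the pivots.
            coords[i][dim] = (
                residual_sq(a, i, dim) + dab2 - residual_sq(b, i, dim)
            ) / (2 * dab)
    return coords


# Usage: planar points are recovered up to rotation and reflection with k = 2.
points = [(0, 0), (3, 0), (0, 4), (3, 4), (1, 1)]
embedding = fastmap(points, math.dist, k=2)
```

Each dimension costs a linear number of distance evaluations, which is why the method scales to large collections. For distance functions that are not Euclidean, the clamp in `residual_sq` hides the resulting distortion rather than correcting it.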

Paper Structure

This paper contains 78 sections, 18 equations, 29 figures, 22 tables, and 6 algorithms.

Figures (29)

  • Figure 1: Illustrates the edit distance between two DNA strings. The left half shows two snippets of DNA strings extracted from a collection of them. The right half shows the minimum number of edit operations required to convert one to the other.
  • Figure 2: Shows a domain where the complex objects are images of animals. A well-defined clustering task is to group the images that portray the same animal species.
  • Figure 3: Shows a domain where the complex objects are text documents (albeit in different formats). A well-defined clustering task is to group the text documents that have similar content.
  • Figure 4: Illustrates how coordinates are computed and recursion is carried out in FastMap, borrowed from cujakk18.
  • Figure 5: Illustrates the two shortest-path trees rooted at the pivots in each iteration of FastMap on graphs.
  • ...and 24 more figures