
IMAIA: Interactive Maps AI Assistant for Travel Planning and Geo-Spatial Intelligence

Jieren Deng, Zhizhang Hu, Ziyan He, Aleksandar Cvetkovic, Pak Kiu Chung, Dragomir Yankov, Chiqun Zhang

Abstract

Map applications are still largely point-and-click, which makes it difficult to ask map-centric questions or to connect view-conditioned camera input to the surrounding geospatial context. We introduce IMAIA, an interactive Maps AI Assistant that enables natural-language interaction with both vector (street) maps and satellite imagery, and augments camera input with geospatial intelligence to help users understand the world around them. IMAIA comprises two complementary components. Maps Plus treats the map as first-class context: it parses tiled vector and satellite views into a grid-aligned representation that a language model can query to resolve deictic references (e.g., "the flower-shaped building next to the park in the top-right"). The Places AI Smart Assistant (PAISA) performs camera-aware place understanding by fusing image–place embeddings with geospatial signals (location, heading, proximity) to ground a scene, surface salient attributes, and generate concise explanations. A lightweight multi-agent design keeps latency low and exposes interpretable intermediate decisions. Across map-centric QA and camera-to-place grounding tasks, IMAIA improves accuracy and responsiveness over strong baselines while remaining practical for user-facing deployment. By unifying language, maps, and geospatial cues, IMAIA moves beyond scripted tools toward conversational mapping that is both spatially grounded and broadly usable.
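
Figure 3 refers to quadkey-based visual prompting, but the exact grid scheme used by Maps Plus is not given here. As a minimal sketch, assuming the publicly documented Bing Maps tile system (Web Mercator tiles addressed by base-4 quadkeys), the snippet below converts a latitude/longitude into a quadkey and splits a viewport tile into four labeled sub-cells. The function names and the position labels are hypothetical; they only illustrate how a grid-aligned representation could expose phrases such as "top-right" to a language model.

```python
import math

# Web Mercator latitude limit used by the Bing Maps tile system.
MAX_LAT = 85.05112878

# Hypothetical labels: position of each child cell inside its parent tile,
# keyed by the quadkey digit (0 = NW, 1 = NE, 2 = SW, 3 = SE).
CHILD_POSITION = {"0": "top-left", "1": "top-right", "2": "bottom-left", "3": "bottom-right"}


def latlon_to_tile(lat: float, lon: float, level: int) -> tuple[int, int]:
    """Return the (tile_x, tile_y) Web Mercator tile containing a WGS84 point."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    lon = max(-180.0, min(180.0, lon))
    x = (lon + 180.0) / 360.0  # normalized horizontal position in [0, 1]
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)  # normalized vertical position
    n = 1 << level  # number of tiles per axis at this level of detail
    tile_x = min(n - 1, max(0, math.floor(x * n)))
    tile_y = min(n - 1, max(0, math.floor(y * n)))
    return tile_x, tile_y


def tile_to_quadkey(tile_x: int, tile_y: int, level: int) -> str:
    """Interleave tile_x/tile_y bits, most significant first, into base-4 digits."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def latlon_to_quadkey(lat: float, lon: float, level: int) -> str:
    """Convenience wrapper: point -> tile -> quadkey."""
    tile_x, tile_y = latlon_to_tile(lat, lon, level)
    return tile_to_quadkey(tile_x, tile_y, level)


def child_cells(quadkey: str) -> dict[str, str]:
    """Split a viewport tile into four labeled sub-cells (hypothetical prompt grid)."""
    return {quadkey + d: pos for d, pos in CHILD_POSITION.items()}
```

For example, `child_cells("213")` maps `"2131"` to `"top-right"`, so a reference to the top-right of that view could be resolved to places indexed under that child quadkey. Because a child quadkey always extends its parent's digits, coarse and fine grid cells share prefixes, which keeps such lookups cheap.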

Paper Structure

This paper contains 20 sections, 3 equations, 11 figures, and 2 tables.

Figures (11)

  • Figure 1: User interface of Maps Plus handling the user query "What is the name of the flower-shaped building next to the park on the map?"
  • Figure 2: Workflow comparison of four settings: a standalone MLLM, an MLLM with coordinates, an MLLM with verbose place context, and our Maps Plus approach.
  • Figure 3: Illustration of quadkey-based visual prompting.
  • Figure 4: The user interface (left) of the Places AI Smart Assistant and its underlying multi-agent framework (right). PAISA offers two interface modes: a chatbot for answering user queries and an interactive navigation mode for destination guidance. The multi-agent framework consists of an orchestrator coordinating three specialized agents: the location intelligence agent, the interactive navigation agent, and the spatial understanding agent.
  • Figure 5: Illustration of a person’s relative direction from the current position ($\phi_1, \lambda_1$) to the destination ($\phi_2, \lambda_2$).
  • ...and 6 more figures
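
Figure 5 depicts the relative direction from the current position ($\phi_1, \lambda_1$) to the destination ($\phi_2, \lambda_2$), but the paper's own equations are not reproduced in this listing. As an illustrative sketch, assuming the standard initial great-circle bearing and a device heading measured in degrees clockwise from north, a navigation agent could compute the bearing to the destination, subtract the user's heading, and bucket the result into one of eight coarse directions. The haversine distance is included as a common stand-in for the proximity signal mentioned in the abstract. The function names and the eight-sector scheme are hypothetical.

```python
import math

# Hypothetical coarse direction labels, one per 45-degree sector, clockwise from "ahead".
SECTORS = ["ahead", "ahead-right", "right", "behind-right",
           "behind", "behind-left", "left", "ahead-left"]


def initial_bearing(phi1: float, lam1: float, phi2: float, lam2: float) -> float:
    """Initial great-circle bearing from (phi1, lam1) to (phi2, lam2), degrees in [0, 360)."""
    p1, p2 = math.radians(phi1), math.radians(phi2)
    dlam = math.radians(lam2 - lam1)
    x = math.sin(dlam) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlam)
    return math.degrees(math.atan2(x, y)) % 360.0


def haversine_m(phi1: float, lam1: float, phi2: float, lam2: float,
                radius_m: float = 6_371_000.0) -> float:
    """Great-circle distance in meters on a spherical Earth (mean radius assumed)."""
    dphi = math.radians(phi2 - phi1)
    dlam = math.radians(lam2 - lam1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(math.radians(phi1)) * math.cos(math.radians(phi2)) * math.sin(dlam / 2) ** 2)
    return 2 * radius_m * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def relative_direction(heading_deg: float, bearing_deg: float) -> str:
    """Map the destination bearing, relative to the user's heading, to a coarse label."""
    rel = (bearing_deg - heading_deg) % 360.0  # 0 = straight ahead, 90 = to the right
    return SECTORS[int(((rel + 22.5) % 360.0) // 45.0)]
```

For example, a user facing north (`heading_deg=0`) with the destination due east (bearing 90°) gets `"right"`, while the same destination for a user facing east gets `"ahead"`. Coarse sectors trade angular precision for instructions that are easy to phrase conversationally.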