Dimensionality Reduced Clustered Data and Order Partition and Stepwise Dimensionality Increasing Indices

Alexander Thomasian

TL;DR

This work addresses scalable similarity search over high-dimensional satellite-image feature vectors by integrating clustering with dimensionality reduction. The authors introduce Clustered SVD (CSVD), combining per-cluster SVD with rotation to uncorrelated subspaces and a memory-resident index built from OP-tree and SDI-tree structures to accelerate $k$-NN queries, while using NMSE-based metrics to govern dimensionality retention. They present exact versus approximate query strategies leveraging the Lower-Bounding Property and develop methods for processing queries across multiple clusters with branch-and-bound pruning. The approach yields substantial speedups (roughly 5–20x) over sequential scans and demonstrates improved recall/precision tradeoffs, offering a scalable solution for content-based image retrieval on very large satellite image collections, albeit with static-index limitations and avenues for dynamic updates and persistence improvements.
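One way to picture NMSE-governed dimensionality retention is the minimal Python sketch below. It rests on assumptions not stated in this summary: the cluster's points are already mean-centered and rotated into their SVD frame, with coordinates ordered by decreasing variance, and NMSE is taken to be the fraction of total squared magnitude lost when trailing coordinates are dropped. The function names `nmse_for_dims` and `min_dims_for_nmse` are hypothetical.

```python
from typing import Sequence


def nmse_for_dims(rotated: Sequence[Sequence[float]], p: int) -> float:
    """Fraction of total squared magnitude lost by keeping only the first p coordinates.

    Assumes `rotated` holds mean-centered points expressed in the SVD frame.
    """
    total = sum(x * x for row in rotated for x in row)
    lost = sum(x * x for row in rotated for x in row[p:])
    return lost / total if total > 0 else 0.0


def min_dims_for_nmse(rotated: Sequence[Sequence[float]], target_nmse: float) -> int:
    """Smallest number of leading dimensions whose NMSE does not exceed the target."""
    n_dims = len(rotated[0])
    for p in range(n_dims + 1):
        if nmse_for_dims(rotated, p) <= target_nmse:
            return p
    return n_dims


if __name__ == "__main__":
    # Toy cluster: most of the energy lies in the first coordinate.
    cluster = [[3.0, 1.0, 0.0], [-3.0, -1.0, 0.0], [3.0, -1.0, 0.0], [-3.0, 1.0, 0.0]]
    for p in range(4):
        print(p, nmse_for_dims(cluster, p))
    print("dims kept for NMSE <= 0.15:", min_dims_for_nmse(cluster, 0.15))
```

A looser NMSE target retains fewer dimensions, giving a smaller index and a coarser approximation of distances.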

Abstract

One of the goals of a NASA-funded project at the IBM T. J. Watson Research Center was to build an index for similarity search over satellite images, which were characterized by high-dimensional texture feature vectors. We review our work on data clustering, dimensionality reduction via Singular Value Decomposition (SVD), and indexing, aimed at building a smaller index and processing k-Nearest Neighbor (k-NN) queries more efficiently for similarity search. Processing k-NN queries by scanning the feature vectors of all images is too costly for an ever-increasing number of images. The ubiquitous multidimensional R-tree index and its extensions were not an option, given their limited scalability with respect to the number of dimensions. The cost of processing k-NN queries was further reduced by building memory-resident Ordered Partition (OP) indices on dimensionality-reduced clusters. Further research in a university setting included the following: (1) Clustered SVD was extended to yield exact k-NN results by issuing appropriate, less costly range queries; (2) the Stepwise Dimensionality Increasing (SDI) index was shown to outperform other known indices; (3) the optimal number of dimensions was selected to reduce query processing cost; and (4) two methods were developed to make OP-trees persistent and loadable with a single file access.
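The idea of obtaining exact k-NN results from dimensionality-reduced data through a range query can be sketched as follows. This is a generic illustration of the well-known multi-step technique based on the lower-bounding property, not necessarily the procedure used in the paper. It assumes the points are already rotated into an orthonormal (for example, SVD) frame, so that reduction simply keeps the first `p` coordinates and reduced distances never exceed true distances. The function names are hypothetical, and linear scans stand in for the index.

```python
import math
from typing import List, Sequence


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance over the coordinates the two vectors share."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn_via_range(points: List[List[float]], query: List[float],
                        k: int, p: int) -> List[int]:
    """Exact k-NN using only reduced distances for filtering."""
    q_red = query[:p]
    red_d = [dist(pt[:p], q_red) for pt in points]

    # Step 1: approximate k-NN in the reduced space.
    candidates = sorted(range(len(points)), key=lambda i: red_d[i])[:k]

    # Step 2: the largest true distance among the candidates bounds the
    # true k-th nearest-neighbor distance from above.
    radius = max(dist(points[i], query) for i in candidates)

    # Step 3: range query in the reduced space. Because reduced distance
    # <= true distance, no true neighbor can fall outside this range.
    in_range = [i for i in range(len(points)) if red_d[i] <= radius]

    # Step 4: refine with full-dimensional distances (ties broken by index).
    in_range.sort(key=lambda i: (dist(points[i], query), i))
    return in_range[:k]


def brute_force_knn(points: List[List[float]], query: List[float], k: int) -> List[int]:
    """Sequential scan used as the reference answer."""
    order = sorted(range(len(points)), key=lambda i: (dist(points[i], query), i))
    return order[:k]


if __name__ == "__main__":
    pts = [[0.0, 0.0, 5.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.5, 0.0, 0.1]]
    q = [0.0, 0.0, 0.0]
    print(exact_knn_via_range(pts, q, k=2, p=1))
    print(brute_force_knn(pts, q, k=2))
```

In the toy data the first point looks closest in one dimension but is far away in full space; the range query followed by refinement corrects this, so both calls return the same neighbors.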

Paper Structure (12 sections, 13 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: An OP-tree with 12 points, one point per partition.
  • Figure 2: The corresponding linked-list hierarchical structure.
  • Figure 3: The SDI-tree representation.
  • Figure 4: Cumulative variance vs. number of dimensions: (top) COLH64, (bottom) TXT55.