Dimensionality Reduced Clustered Data and Order Partition and Stepwise Dimensionality Increasing Indices

Alexander Thomasian

TL;DR

This work addresses scalable similarity search over high-dimensional satellite-image feature vectors by integrating clustering with dimensionality reduction. It introduces Clustered SVD (CSVD), which applies SVD within each cluster to rotate the data into uncorrelated subspaces, and pairs it with a memory-resident index built from OP-tree and SDI-tree structures to accelerate $k$-NN queries; NMSE-based metrics govern how many dimensions are retained. It contrasts exact and approximate query strategies that rely on the Lower-Bounding Property, and develops methods for processing queries across multiple clusters with branch-and-bound pruning. The approach yields substantial speedups (roughly 5–20x) over sequential scans and improved recall/precision tradeoffs, offering a scalable solution for content-based image retrieval on very large satellite image collections. Its main limitation is that the indices are static; dynamic updates and persistence improvements are identified as avenues for further work.

Abstract

One goal of a NASA-funded project at the IBM T. J. Watson Research Center was to build an index for similarity search over satellite images, which were characterized by high-dimensional texture feature vectors. We review our work on data clustering, dimensionality reduction via Singular Value Decomposition (SVD), and indexing, which together yield a smaller index and more efficient k-Nearest Neighbor (k-NN) query processing for similarity search. Answering k-NN queries by scanning the feature vectors of all images is too costly for an ever-increasing number of images. The ubiquitous multidimensional R-tree index and its extensions were not an option, given that they scale poorly with dimensionality. The cost of processing k-NN queries was further reduced by building memory-resident Ordered Partition (OP) indices on dimensionality-reduced clusters. Further research in a university setting included the following: (1) Clustered SVD was extended to answer k-NN queries exactly by issuing appropriate, less costly range queries; (2) the Stepwise Dimensionality Increasing (SDI) index was shown to outperform other known indices; (3) the optimal number of dimensions was selected to reduce query processing cost; and (4) two methods were developed to make OP-trees persistent and loadable with a single file access.
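The idea in item (1), turning a k-NN query into a range query and still getting exact answers, can be illustrated with a small sketch. It is not the paper's implementation: it assumes a single cluster, uses a linear scan in place of an OP-tree or SDI-tree, and follows the generic filter-and-refine scheme that relies on the Lower-Bounding Property (a distance measured in an orthonormal SVD subspace never exceeds the distance in the original space). The function names `fit_svd` and `exact_knn` are hypothetical.

```python
import numpy as np

def fit_svd(data, m):
    """Return the data mean and an orthonormal basis of the top-m SVD directions."""
    mean = data.mean(axis=0)
    # Rows of vt are orthonormal, so projecting onto them cannot increase distances.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:m].T

def exact_knn(data, reduced, mean, basis, query, k):
    """Exact k-NN via a range query in the reduced space (filter and refine)."""
    q_red = (query - mean) @ basis
    d_red = np.linalg.norm(reduced - q_red, axis=1)
    # Step 1: approximate k-NN, found using reduced-space distances only.
    seed = np.argpartition(d_red, k - 1)[:k]
    # Step 2: the largest TRUE distance among them bounds the k-th NN distance.
    radius = np.linalg.norm(data[seed] - query, axis=1).max()
    # Step 3: range query in the reduced space; by the lower-bounding property
    # no true neighbor is missed. The tiny slack absorbs floating-point rounding.
    cand = np.flatnonzero(d_red <= radius * (1 + 1e-9))
    # Step 4: refine the candidates with full-dimensional distances.
    d_true = np.linalg.norm(data[cand] - query, axis=1)
    return cand[np.argsort(d_true)[:k]]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(500, 16))      # synthetic stand-in for feature vectors
    mean, basis = fit_svd(data, 4)
    reduced = (data - mean) @ basis
    query = rng.normal(size=16)
    print(exact_knn(data, reduced, mean, basis, query, 5))
```

Only the candidates that survive Step 3 need their full-dimensional vectors fetched, which is where the saving over a sequential scan would come from; an approximate query would simply stop after Step 1.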

Paper Structure (12 sections, 13 equations, 4 figures, 1 table)

Indexing Using Dimensionality Reduction and Clustering
Nearest Neighbor Queries
Dimensionality Reduction
Clustering Methods for Large and High Dimensional Data
Combined Clustering and SVD
CSVD Index for k-NN Queries
Nearest-Neighbors Queries with Multiple Clusters
Exact versus Approximate $k$-NN Queries
High-Dimensional Indices
Ordered Partition (OP)-Tree
Stepwise Dimensionality Increasing-tree
Conclusions and Further Work

Figures (4)

Figure 1: An OP-tree with 12 points, with one point per partition.
Figure 2: The corresponding linked-list hierarchical structure.
Figure 3: The SDI-tree representation
Figure 4: Cumulative variance vs. number of dimensions. (top) COLH64. (bottom) TXT55.
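A curve of the kind described in the Figure 4 caption can be computed directly from singular values. The sketch below is illustrative only: it assumes that the normalized mean square error (NMSE) after keeping the first m dimensions equals one minus the cumulative variance fraction, uses synthetic data rather than COLH64 or TXT55, and the function names `cumulative_variance` and `dims_for_nmse` are hypothetical.

```python
import numpy as np

def cumulative_variance(data):
    """Fraction of total variance captured by the first 1, 2, ... SVD dimensions."""
    centered = data - data.mean(axis=0)
    # Squared singular values are proportional to the variance along each direction.
    var = np.linalg.svd(centered, compute_uv=False) ** 2
    return np.cumsum(var) / var.sum()

def dims_for_nmse(data, max_nmse):
    """Smallest number of dimensions whose assumed NMSE (1 - cumulative variance)
    does not exceed max_nmse."""
    cum = cumulative_variance(data)
    m = int(np.searchsorted(cum, 1.0 - max_nmse)) + 1
    # Clamp in case rounding leaves the last cumulative value just below 1.
    return min(m, len(cum))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data whose variance is concentrated in the first column.
    data = rng.normal(size=(2000, 4)) * np.array([10.0, 1.0, 0.1, 0.01])
    print(cumulative_variance(data))
    print(dims_for_nmse(data, 0.05))
```

A steeply rising curve means few dimensions suffice for a given NMSE target; in a clustered setting the same computation would be run per cluster, so different clusters could retain different numbers of dimensions.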
