A Guide to Similarity Measures

Avivit Levy, B. Riva Shalom, Michal Chalamish

TL;DR

The paper addresses the need for a comprehensive guide to similarity and distance measures across data types and applications. It systematically classifies measures into families (inner-product, Minkowski, intersection, entropy, $\chi^2$, fidelity, and string measures) and provides explicit formulas, variants, and design considerations. The work highlights how to select appropriate measures for specific tasks, discusses metric properties, and illustrates practical implications for retrieval, clustering, anomaly detection, and beyond. By offering a broad, practitioner-oriented taxonomy with concrete equations, the paper aims to improve the choice of measures and the methodological rigor of cross-domain data analysis.
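To make the family names above concrete, here is a minimal Python sketch with one textbook representative per family: cosine similarity (inner product), the $L_p$ distance (Minkowski), histogram intersection, Kullback-Leibler divergence (entropy), Pearson $\chi^2$, the Bhattacharyya coefficient (fidelity), and Levenshtein edit distance (strings). These representatives and the function names are illustrative choices, not necessarily the paper's own formulas or variants. The sketch assumes equal-length lists of numbers and, for the distribution-based measures, probability vectors with the positivity conditions noted in the comments.

```python
import math

def cosine_similarity(x, y):
    """Inner-product family: <x, y> / (||x|| * ||y||); assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def minkowski_distance(x, y, p=2):
    """Minkowski family: (sum |x_i - y_i|^p)^(1/p); p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def intersection_similarity(p, q):
    """Intersection family: sum_i min(p_i, q_i) for probability vectors."""
    return sum(min(a, b) for a, b in zip(p, q))

def kl_divergence(p, q):
    """Entropy family: sum_i p_i * ln(p_i / q_i).
    Assumes q_i > 0 wherever p_i > 0; terms with p_i = 0 contribute 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def pearson_chi2(p, q):
    """Chi-square family: Pearson chi^2 = sum_i (p_i - q_i)^2 / q_i; assumes every q_i > 0."""
    return sum((a - b) ** 2 / b for a, b in zip(p, q))

def fidelity(p, q):
    """Fidelity family: Bhattacharyya coefficient sum_i sqrt(p_i * q_i)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def levenshtein(s, t):
    """String measures: minimum number of single-character insertions,
    deletions, and substitutions turning s into t (two-row dynamic programming)."""
    prev = list(range(len(t) + 1))           # distances from "" to prefixes of t
    for i, cs in enumerate(s, start=1):
        cur = [i]                            # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute or match
        prev = cur
    return prev[-1]

if __name__ == "__main__":
    print(minkowski_distance([0, 0], [3, 4]))        # 5.0 (Euclidean)
    print(minkowski_distance([0, 0], [3, 4], p=1))   # 7.0 (Manhattan)
    print(levenshtein("kitten", "sitting"))          # 3
    p, q = [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]
    print(kl_divergence(p, q), kl_divergence(q, p))  # differ: KL is asymmetric
```

The last line touches on metric properties: the Kullback-Leibler divergence is generally asymmetric and therefore not a metric, whereas Minkowski distances with p >= 1 satisfy the metric axioms.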

Abstract

Similarity measures play a central role in various data science application domains for a wide assortment of tasks. This guide describes a comprehensive set of prevalent similarity measures to serve both non-experts and professionals. Non-experts who wish to understand the motivation for a measure, as well as how to use it, may find a friendly and detailed exposition of the measures' formulas, whereas experts may find a glimpse of the principles of designing similarity measures and ideas for better ways to measure similarity for their desired task in a given application domain.

Paper Structure (16 sections, 3 equations, 1 table)