FEET: A Framework for Evaluating Embedding Techniques

Simon A. Lee, John Lee, Jeffrey N. Chiang

TL;DR

This study introduces FEET, a standardized protocol designed to guide the development and benchmarking of foundation models, and recommends this protocol as a standard for future research aimed at advancing representation learning models.

Abstract

In this study, we introduce FEET, a standardized protocol designed to guide the development and benchmarking of foundation models. While numerous benchmark datasets exist for evaluating these models, we propose a structured evaluation protocol across three distinct scenarios to gain a comprehensive understanding of their practical performance. We define three primary use cases: frozen embeddings, few-shot embeddings, and fully fine-tuned embeddings. Each scenario is detailed and illustrated through two case studies: one in sentiment analysis and another in the medical domain, demonstrating how these evaluations provide a thorough assessment of foundation models' effectiveness in research applications. We recommend this protocol as a standard for future research aimed at advancing representation learning models.
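As an illustration only, the three scenarios can be sketched on synthetic data with a toy one-layer encoder. Everything below is an assumption made for the sketch rather than the authors' implementation: the encoder, the random "pretrained" weights, the helper names (`embed`, `train`, `feet_report`), the choice of `k`, and the reading of "few-shot" as fine-tuning on k labeled examples per class.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def embed(X, W):
    # Toy stand-in for a pretrained encoder: a single tanh layer (assumption).
    return np.tanh(X @ W)


def train(X, y, W, v, update_encoder, steps=300, lr=0.5):
    """Minimise the mean logistic loss of sigmoid(embed(X, W) @ v) by gradient descent."""
    W, v = W.copy(), v.copy()
    for _ in range(steps):
        H = embed(X, W)
        err = (sigmoid(H @ v) - y) / len(y)  # gradient of the mean loss w.r.t. the logits
        grad_v = H.T @ err
        if update_encoder:
            grad_H = np.outer(err, v) * (1.0 - H ** 2)  # backprop through tanh
            W -= lr * (X.T @ grad_H)
        v -= lr * grad_v
    return W, v


def accuracy(X, y, W, v):
    pred = sigmoid(embed(X, W) @ v) > 0.5
    return float(np.mean(pred == y.astype(bool)))


def feet_report(k=8, seed=0):
    """Report test accuracy for the three scenarios on one synthetic task."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(400, 8))
    y = (X @ rng.normal(size=8) > 0).astype(float)
    X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]

    W0 = 0.5 * rng.normal(size=(8, 4))  # hypothetical "pretrained" encoder weights
    v0 = np.zeros(4)                    # classifier head, trained in every scenario

    # Few-shot subset: the first k training examples of each class.
    idx = np.concatenate([np.flatnonzero(y_tr == c)[:k] for c in (0.0, 1.0)])

    runs = {
        # Frozen: encoder fixed, only the head is trained on all training data.
        "frozen": train(X_tr, y_tr, W0, v0, update_encoder=False),
        # Few-shot: encoder and head updated, but only on 2 * k labeled examples.
        "few_shot": train(X_tr[idx], y_tr[idx], W0, v0, update_encoder=True),
        # Fully fine-tuned: encoder and head updated on all training data.
        "fine_tuned": train(X_tr, y_tr, W0, v0, update_encoder=True),
    }
    return {name: accuracy(X_te, y_te, W, v) for name, (W, v) in runs.items()}


if __name__ == "__main__":
    print(feet_report())
```

The point of the sketch is the reporting shape: one model, one task, and three numbers side by side. The accuracies it produces describe the toy task only and say nothing about the results in the paper.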

Paper Structure

This paper contains 22 sections, 5 equations, 1 figure, 6 tables.

Figures (1)

  • Figure 1: A comparative analysis of the Claude Model, GPT, and Gemini across varying shot counts. The selection of shot numbers (0, 25, 4, 3, 10) appears arbitrary and inconsistent, raising concerns about potential cherry-picking to emphasize Claude as the state-of-the-art (SOTA) model.