Leveraging a Cognitive Model to Measure Subjective Similarity of Human and GPT-4 Written Content

Tailia Malloy, Maria José Ferreira, Fei Fang, Cleotilde Gonzalez

TL;DR

This work addresses the lack of cognitively aware personalization in text-similarity metrics by integrating an Instance-Based Learning (IBL) cognitive model with LLM embeddings to form the Instance-Based Individualized Similarity (IBIS) metric. The approach treats each participant as a digital twin: it predicts their judgments through memory-based activation, retrieval probabilities, and blended value calculations, and then derives an individualized similarity metric from those predictions. Empirical evaluation on a phishing/ham email dataset shows that IBIS aligns with individual judgments better than baseline semantic, cosine, and ensemble methods, with case-level results illustrating strong personalized predictive power. The study highlights the potential of cognitively grounded personalization to improve educational tools and recommendation systems, and calls for broader applications and future enhancements of cognitive-LLM integrations as digital twins. The activation $A_i(t)$, retrieval probability $P_i(t)$, and blended value $V_k(t)$ are central to the IBIS framework, enabling prediction of judgments on unseen content and more human-aligned similarity measurements.
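
To make the three quantities concrete, the following is a minimal sketch of the standard IBL calculations, not the authors' implementation. It assumes the usual base-level activation with power-law decay, a Boltzmann (softmax) retrieval rule, and a probability-weighted blend of stored utilities. The function names, the `decay` and `temperature` values, and the toy instances are illustrative assumptions. The activation noise and the similarity-based partial-matching term are omitted for simplicity.

```python
import math

def activation(timestamps, t, decay=0.5):
    """Simplified A_i(t): ln of the sum over past occurrences t' of (t - t')^(-decay).

    Noise and partial matching are omitted (assumption made for this sketch).
    """
    past = [tp for tp in timestamps if tp < t]
    if not past:
        raise ValueError("instance has no occurrence before time t")
    return math.log(sum((t - tp) ** -decay for tp in past))

def retrieval_probabilities(activations, temperature=0.25):
    """P_i(t): softmax of the activations, scaled by a temperature."""
    top = max(activations)  # subtract the max for numerical stability
    weights = [math.exp((a - top) / temperature) for a in activations]
    total = sum(weights)
    return [w / total for w in weights]

def blended_value(instances, t, decay=0.5, temperature=0.25):
    """V_k(t): sum of P_i(t) * u_i over the instances stored for option k.

    `instances` is a list of (timestamps, utility) pairs.
    """
    acts = [activation(ts, t, decay) for ts, _ in instances]
    probs = retrieval_probabilities(acts, temperature)
    return sum(p * u for p, (_, u) in zip(probs, instances))

# Hypothetical memory for the option "judge this email as phishing":
# one instance rewarded twice (times 1 and 3), one unrewarded (time 2).
memory = [([1, 3], 1.0), ([2], 0.0)]
print(blended_value(memory, t=4))
```

Instances that were observed more often or more recently receive higher activation, so they dominate the blend. This is how a participant's own history of judgments can bias the model's prediction for a new email.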

Abstract

Cosine similarity between two documents can be computed using token embeddings formed by Large Language Models (LLMs) such as GPT-4, and used to categorize those documents across a range of uses. However, these similarities are ultimately dependent on the corpora used to train these LLMs, and may not reflect subjective similarity of individuals or how their biases and constraints impact similarity metrics. This lack of cognitively-aware personalization of similarity metrics can be particularly problematic in educational and recommendation settings where there is a limited number of individual judgements of category or preference, and biases can be particularly relevant. To address this, we rely on an integration of an Instance-Based Learning (IBL) cognitive model with LLM embeddings to develop the Instance-Based Individualized Similarity (IBIS) metric. This similarity metric is beneficial in that it takes into account individual biases and constraints in a manner that is grounded in the cognitive mechanisms of decision making. To evaluate the IBIS metric, we also introduce a dataset of human categorizations of emails as being either dangerous (phishing) or safe (ham). This dataset is used to demonstrate the benefits of leveraging a cognitive model to measure the subjective similarity of human participants in an educational setting.
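
As a point of reference for the baseline mentioned above, here is a small sketch of cosine similarity between two embedding vectors. The vectors are toy values standing in for LLM document embeddings (an assumption for illustration); obtaining real embeddings from a model is outside the scope of this sketch.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector has zero norm (a convention chosen here).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy 4-dimensional vectors standing in for two email embeddings.
email_a = [0.2, 0.1, 0.9, 0.4]
email_b = [0.1, 0.0, 0.8, 0.5]
print(cosine_similarity(email_a, email_b))
```

The score depends only on the embeddings, so every reader of the same two emails gets the same value. That is the limitation the individualized IBIS metric is meant to address.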

Paper Structure

This paper contains 19 sections, 5 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Human participant similarity measure for all 1440 phishing (blue) and ham (orange) emails. Shaded region is a logistic regression.
  • Figure 2: Semantic and human participant similarity for phishing (blue) and ham (orange) emails. Shaded region is a logistic regression. The Kernel Density Estimate log probability score between each distribution is shown on the bottom right, higher is better.
  • Figure 3: Cosine and human participant similarity for phishing (blue) and ham (orange) emails. Shaded region is a logistic regression. The Kernel Density Estimate log probability score between each distribution is shown on the bottom right, higher is better.
  • Figure 4: Cosine and human participant similarity for phishing (blue) and ham (orange) emails. Shaded region is a logistic regression. The Kernel Density Estimate log probability score between each distribution is shown on the bottom right, higher is better.
  • Figure 5: Pruned cosine and human participant similarity for phishing (blue) and ham (orange) emails. Shaded region is a logistic regression. The Kernel Density Estimate log probability score between each distribution is shown on the bottom right, higher is better.
  • ...and 3 more figures