Measuring and identifying factors of individuals' trust in Large Language Models

Edoardo Sebastiano De Duro, Giuseppe Alessandro Veltri, Hudson Golino, Massimo Stella

TL;DR

This study develops and validates the Trust-In-LLMs Index (TILLMI), a two-factor psychometric tool for measuring trust in large language models, grounded in McAllister's affective and cognitive trust. By combining item generation with LLM-simulated validity and extensive human data, the authors demonstrate a robust two-factor structure comprising closeness with LLMs (affective) and reliance on LLMs (cognitive), supported by CFA, reliability, and convergent/divergent validity analyses. The scale correlates meaningfully with personality traits, cognitive flexibility, and emotional distress measures, and reveals demographic differences (younger males reporting higher trust) as well as higher trust among LLM users versus non-users. These findings offer a quantitative foundation to study and design AI-mediated verbal interactions, guiding responsible deployment and balanced human–AI collaboration while highlighting future cross-cultural and context-specific extensions.

Abstract

Large Language Models (LLMs) can engage in human-like conversational exchanges. Although conversations can elicit trust between users and LLMs, little empirical research has examined trust formation in human-LLM contexts, beyond LLMs' trustworthiness or human trust in AI in general. Here, we introduce the Trust-In-LLMs Index (TILLMI) as a new framework to measure individuals' trust in LLMs, extending McAllister's cognitive and affective trust dimensions to LLM-human interactions. We developed TILLMI as a psychometric scale, prototyped with a novel protocol we called LLM-simulated validity. The LLM-based scale was then validated in a sample of 1,000 US respondents. Exploratory Factor Analysis identified a two-factor structure. Two items were then removed due to redundancy, yielding a final 6-item scale with a 2-factor structure. Confirmatory Factor Analysis on a separate subsample showed strong model fit ($CFI = .995$, $TLI = .991$, $RMSEA = .046$, $p_{\chi^2} > .05$). Convergent validity analysis revealed that trust in LLMs correlated positively with openness to experience, extraversion, and cognitive flexibility, but negatively with neuroticism. Based on these findings, we interpreted TILLMI's factors as "closeness with LLMs" (affective dimension) and "reliance on LLMs" (cognitive dimension). Younger males exhibited higher closeness with and reliance on LLMs compared to older women. Individuals with no direct experience with LLMs exhibited lower levels of trust compared to LLM users. These findings offer a novel empirical foundation for measuring trust in AI-driven verbal communication, informing responsible design, and fostering balanced human-AI collaboration.
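
As an aid to reading the fit indices quoted in the abstract, the following is a minimal sketch of the standard textbook formulas for CFI, TLI, and RMSEA, computed from the chi-square statistics of a fitted model and a baseline (independence) model. The function name `fit_indices` and all input values are hypothetical and chosen only for illustration; they are not the statistics of the TILLMI model, and the authors' software may differ in details (for example, some packages use $N$ rather than $N - 1$ in the RMSEA denominator).

```python
import math


def fit_indices(chi2_model, df_model, chi2_base, df_base, n):
    """Return (CFI, TLI, RMSEA) from model and baseline chi-square values.

    Standard definitions; n is the sample size. All arguments are
    assumptions supplied by the caller, not values from the paper.
    """
    # Non-centrality estimates, truncated at zero
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0)

    # CFI: relative reduction in non-centrality versus the baseline model
    denom = max(d_model, d_base)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom

    # TLI: compares chi-square/df ratios of the baseline and fitted models
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)

    # RMSEA: misfit per degree of freedom, scaled by sample size (N - 1 here)
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))
    return cfi, tli, rmsea


# Hypothetical inputs for illustration only
cfi, tli, rmsea = fit_indices(chi2_model=12.0, df_model=8,
                              chi2_base=900.0, df_base=15, n=500)
print(f"CFI={cfi:.3f}, TLI={tli:.3f}, RMSEA={rmsea:.3f}")
```

Values of CFI and TLI close to 1 and a small RMSEA are conventionally read as good fit, which is the sense in which the abstract describes the model fit as strong.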

Paper Structure

This paper contains 28 sections, 7 figures, 9 tables.

Figures (7)

  • Figure 1: Response frequencies for 8 items of the initial TILLMI for GPT-4 (in blue) and humans (in orange). To balance the two datasets ($n_{humans} = 521$, $n_{gpt4} = 800$), we extracted a random sample of $n_{1} = 521$ responses from the synthetic GPT-4 dataset.
  • Figure 2: Exploratory Graph Analysis of the responses to the TILLMI for participants who reported having used LLMs at least once ($n_{1} = 521$). (A) Psychometric network plotted using EGAnet. Nodes represent items of the TILLMI. Edges indicate the interaction between nodes, with green links representing positive interactions. (B) Item stability plot for the TILLMI bootstrap analysis. The probability of each item being assigned to its original dimension across bootstrap iterations is shown. Higher values (closer to 1) indicate that an item consistently appears in the same dimension, suggesting greater stability.
  • Figure 3: CFA model representing the final 2 latent factors and the corresponding observed variables. Each path from latent to observed variable includes the factor loading ($\lambda$). We show the measurement error for each observed variable ($\delta$).
  • Figure 4: Correlations among constructs, with each construct computed as the sum of its constituent items. Emo Num Rea represents the aggregated self-reported scores for the numerical reasoning task. The correlations are computed using Kendall's tau correlation coefficient on the subset of the population who reported having used LLMs at least once ($n_{1} = 521$). Blue tiles indicate positive correlations; red tiles indicate negative correlations. White tiles represent non-significant correlations. Significance levels are indicated as * ($p<.05$), ** ($p<.01$), *** ($p<.001$).
  • Figure 5: Correlations between Factor 1, Factor 2 and psychological measures (Depression, Anxiety, and Stress). These measures were derived using the DAsentimental framework, analysing the 10 words participants used to describe their feelings when interacting with LLMs. Out of the participants who used LLMs at least once ($n_{1} = 521$), several responses ($n_{3}=124$) were excluded due to invalid entries in the emotion-response text boxes of the survey. Hence, these correlations refer only to the remaining $n_{4} = 397$ participants. Blue tiles indicate positive correlations; red tiles indicate negative correlations. White tiles represent non-significant correlations. Significance levels are indicated as * ($p<.05$), ** ($p<.01$), *** ($p<.001$).
  • ...and 2 more figures
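
The balancing step described in the Figure 1 caption (down-sampling the 800 synthetic GPT-4 responses to match the 521 human responses before comparing response frequencies) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `balance_and_count`, the 1-to-5 response scale, the fixed seed, and the randomly generated toy responses are all assumptions.

```python
import random
from collections import Counter


def balance_and_count(human, synthetic, levels=range(1, 6), seed=0):
    """Down-sample both response lists to the size of the smaller one,
    then return the relative frequency of each response level.

    The 1-5 `levels` default is an assumed response scale.
    """
    rng = random.Random(seed)
    n = min(len(human), len(synthetic))
    human_s = rng.sample(list(human), n)      # sampling without replacement
    synth_s = rng.sample(list(synthetic), n)

    def freqs(responses):
        counts = Counter(responses)
        return {lvl: counts.get(lvl, 0) / n for lvl in levels}

    return freqs(human_s), freqs(synth_s)


# Toy data only: random responses with the sample sizes quoted in the caption
toy_rng = random.Random(42)
human = [toy_rng.randint(1, 5) for _ in range(521)]
gpt4 = [toy_rng.randint(1, 5) for _ in range(800)]

human_freq, gpt4_freq = balance_and_count(human, gpt4)
for level in sorted(human_freq):
    print(level, round(human_freq[level], 3), round(gpt4_freq[level], 3))
```

Sampling without replacement keeps each retained synthetic response unique, so both groups contribute the same number of observations to the frequency comparison.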