Behavioral Bias of Vision-Language Models: A Behavioral Finance View

Yuhang Xiao, Yudi Lin, Ming-Chang Chiu

TL;DR

The paper addresses whether LVLMs exhibit human-like behavioral biases in finance. It introduces an end-to-end framework with the DynoStock multimodal dataset, carefully designed prompts, and a Behavioral Bias Index to quantify recency and authority biases in stock-movement predictions after EPS reports. The study finds that open-source LVLMs exhibit significant biases, while GPT-4o remains largely unbiased, suggesting that scale and curated data contribute to bias resilience. This work provides a practical methodology for evaluating and mitigating interdisciplinary biases in LVLMs, with implications for robust financially-aware AI systems and robo-advisors.

Abstract

Large Vision-Language Models (LVLMs) have evolved rapidly as Large Language Models (LLMs) were equipped with vision modules to create more human-like models. However, we should carefully evaluate their applications in different domains, as they may possess undesired biases. Our work studies the potential behavioral biases of LVLMs from a behavioral finance perspective, an interdisciplinary subject that jointly considers finance and psychology. We propose an end-to-end framework, from data collection to new evaluation metrics, to assess LVLMs' reasoning capabilities and the dynamic behaviors manifested in two established human financial behavioral biases: recency bias and authority bias. Our evaluations find that recent open-source LVLMs such as LLaVA-NeXT, MobileVLM-V2, Mini-Gemini, MiniCPM-Llama3-V 2.5 and Phi-3-vision-128k suffer significantly from these two biases, while the proprietary model GPT-4o is negligibly impacted. Our observations highlight directions in which open-source models can improve. The code is available at https://github.com/mydcxiao/vlm_behavioral_fin.

Paper Structure

This paper contains 19 sections, 1 equation, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Overview of our end-to-end framework for behavioral finance bias evaluation. We collect stock and EPS data dynamically, then retrieve contextual data according to the bias signals for evaluation. The final inputs that LVLMs use to make predictions are multimodal, including a structured prompt and a stock chart.
  • Figure 2: Comparison of recency-bias outputs from a naive prompt (top two turns) and our structured prompt (bottom two turns) on Mini-Gemini 7B HD (text trimmed for space). Our structured prompt directs the model's attention to the input chart, the most recent EPS report, market sentiment, and the latest EPS surprise, whereas the naive prompt leads the model to use only the latest EPS surprise. The structured prompt also makes the model follow the desired output format (a probability between 0 and 1), which the naive prompt does not.
  • Figure 3: Influence of recency bias. (a) Bias Index vs Window Size. Open-source models are influenced by recency bias, which can be mitigated by providing more historical data as input, whereas GPT-4o is not affected by recency bias. (b) Accuracy vs Window Size.
  • Figure 4: Influence of authority bias. (a) Bias Index vs Window Size. Open-source models are influenced by authority bias, while GPT-4o is not. (b) Accuracy vs Window Size.
  • Figure 5: An example of the stock chart passed to the VLMs. Each EPS report date is marked on the chart by a triangle-down marker, colored green for a positive surprise and red for a negative one. Each fiscal end date is marked by a grey triangle-up marker. The example is drawn from data retrieved for recency bias. Note that the weekly average stock movement after the EPS Meet differs between the most recent EPS Meet preceding the latest EPS Meet and the majority of past EPS Meets. The stock chart adapts to the window size, adjusting its width to minimize distortion (from $10\,\text{in} \times 6\,\text{in}$ to $30\,\text{in} \times 6\,\text{in}$, 300 dpi).
  • ...and 4 more figures
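
The chart conventions in the Figure 5 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' plotting code. The marker shapes and colors, the 10 in to 30 in width range, the 6 in height and the 300 dpi setting come from the caption. The names `Marker`, `eps_marker`, `FISCAL_END_MARKER` and `figure_size` are hypothetical. The linear mapping from window size to width, the window bounds used in the example, and the handling of a zero surprise are all assumptions.

```python
from dataclasses import dataclass

# Values taken from the Figure 5 caption.
MIN_WIDTH_IN, MAX_WIDTH_IN = 10.0, 30.0
HEIGHT_IN, DPI = 6.0, 300


@dataclass(frozen=True)
class Marker:
    shape: str  # matplotlib-style marker code: "v" = triangle-down, "^" = triangle-up
    color: str


def eps_marker(surprise: float) -> Marker:
    """Triangle-down marker: green for a positive EPS surprise, red for a negative one."""
    # Assumption: a surprise of exactly zero is drawn green; the caption does not say.
    return Marker("v", "green" if surprise >= 0 else "red")


# Fiscal end dates use a grey triangle-up marker, as described in the caption.
FISCAL_END_MARKER = Marker("^", "grey")


def figure_size(window: int, min_window: int, max_window: int) -> tuple[float, float]:
    """Return (width, height) in inches for a given window size.

    Assumption: width grows linearly with the window size and is clamped to
    the caption's 10 in to 30 in range; the actual scaling rule is not
    stated in the caption.
    """
    if max_window <= min_window:
        raise ValueError("max_window must be greater than min_window")
    frac = (window - min_window) / (max_window - min_window)
    frac = min(1.0, max(0.0, frac))  # clamp out-of-range windows
    width = MIN_WIDTH_IN + frac * (MAX_WIDTH_IN - MIN_WIDTH_IN)
    return (width, HEIGHT_IN)


# Hypothetical window bounds of 4 and 20, chosen only for illustration.
size_small = figure_size(4, 4, 20)
size_mid = figure_size(12, 4, 20)
size_large = figure_size(20, 4, 20)
```

With matplotlib, the returned tuple and `DPI` could be passed as `plt.subplots(figsize=size_mid, dpi=DPI)`, and the `shape` and `color` fields map onto the `marker` and `color` arguments of `Axes.scatter`.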