FinLLM-B: When Large Language Models Meet Financial Breakout Trading
Kang Zhang, Osamu Yoshie, Lichao Sun, Weiran Huang
TL;DR
This work tackles the challenge of reliable financial breakout detection, where true breakouts must be distinguished from false ones and each judgment must be backed by transparent reasoning. It introduces FinLLM-B, the first large language model oriented toward financial breakout detection, together with a novel multi-stage structure that separates reasoning from report generation to reduce errors and improve stability. A dedicated FinLLM-B dataset and a separate Report Generator dataset are constructed from footprint-chart data, enabling domain-aware reasoning and explainable final reports. Empirically, FinLLM-B improves the average accuracy of answers and rationales by 49.97% over GPT-3.5 and by 42.38% over GPT-4, with the multi-stage structure contributing 9.72% to the improvement over GPT-3.5 while also improving stability, indicating strong potential for practical, explainable breakout trading systems.
Abstract
Trading range breakout is a key method in the technical analysis of financial trading, widely employed by traders in financial markets such as stocks, futures, and foreign exchange. However, distinguishing between true and false breakouts and providing the correct rationale pose significant challenges to investors. Traditional quantitative methods require large amounts of data and cannot directly present their reasoning process, which limits their usefulness in this field. Recently, large language models have achieved success in various downstream applications, but their effectiveness in financial breakout detection has been subpar, because breakout detection requires unique data and specialized knowledge. To address these issues, we create the first financial breakout dataset and introduce FinLLM-B, the first large language model for financial breakout detection, which enhances the effectiveness of breakout trading strategies. Furthermore, we develop a novel framework for large language models, namely the multi-stage structure, which effectively reduces mistakes in downstream applications. Experimental results indicate that, compared to GPT-3.5, FinLLM-B improves the average accuracy of answers and rationales by 49.97%, with the multi-stage structure contributing 9.72% to the improvement. Additionally, it outperforms ChatGPT-4 by 42.38%.
