Stock Price Prediction Using Triple Barrier Labeling and Raw OHLCV Data: Evidence from Korean Markets

Sungwoo Kang

TL;DR

The paper investigates stock price forecasting in Korean markets by comparing deep learning models trained on raw OHLCV data against traditional models that rely on technical indicators. It employs triple barrier labeling with a horizon of $T=29$ days and take-profit/stop-loss thresholds of $9\%$, and finds that an LSTM with a $100$-day window and $8$ hidden units on full OHLCV data can match the performance of indicator-based models such as XGBoost. The results show that full OHLCV data provides modest gains over reduced feature sets, and that hyperparameter interactions, particularly window length and model capacity, are crucial for optimal performance. These findings challenge the necessity of feature engineering with technical indicators and offer practical guidance for model design and hyperparameter tuning in market-specific contexts.

Abstract

This paper demonstrates that deep learning models trained on raw OHLCV (open-high-low-close-volume) data can achieve comparable performance to traditional machine learning (ML) models using technical indicators for stock price prediction in Korean markets. While previous studies have emphasized the importance of technical indicators and feature engineering, we show that a simple LSTM network trained on raw OHLCV data alone can match the performance of sophisticated ML models that incorporate technical indicators. Using a dataset of Korean stocks from 2006 to 2024, we optimize the triple barrier labeling parameters to achieve balanced label proportions with a 29-day window and 9% barriers. Our experiments reveal that LSTM networks achieve similar performance to traditional machine learning models like XGBoost, despite using only raw OHLCV data without any technical indicators. Furthermore, we identify that the optimal window size varies with model hidden size, with a configuration of window size 100 and hidden size 8 yielding the best performance. Additionally, our results confirm that using full OHLCV data provides better predictive accuracy compared to using only close price or close price with volume. These findings challenge conventional approaches to feature engineering in financial forecasting and suggest that simpler approaches focusing on raw data and appropriate model selection may be more effective than complex feature engineering strategies.
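
To make the labeling scheme concrete, the following is a minimal sketch of triple barrier labeling with the 29-day horizon and 9% barriers mentioned above. It is an illustration under stated assumptions, not the paper's implementation: the function name `triple_barrier_labels`, the use of closing prices alone to detect barrier touches, the symmetric upper and lower barriers, and the label encoding (1, -1, 0) are all assumptions made here.

```python
from typing import List, Sequence


def triple_barrier_labels(
    close: Sequence[float],
    horizon: int = 29,      # vertical barrier: maximum holding period in days
    barrier: float = 0.09,  # horizontal barriers: +/- 9% around the entry price
) -> List[int]:
    """Label each entry day by whichever barrier is touched first.

    Returns 1 if the take-profit barrier is hit first, -1 if the stop-loss
    barrier is hit first, and 0 if neither is hit within `horizon` days.
    Assumption: touches are checked on closing prices only.
    """
    labels: List[int] = []
    # Only days with a full look-ahead horizon receive a label.
    for t in range(len(close) - horizon):
        entry = close[t]
        upper = entry * (1 + barrier)
        lower = entry * (1 - barrier)
        label = 0  # default: the vertical (time) barrier is reached first
        for k in range(1, horizon + 1):
            price = close[t + k]
            if price >= upper:
                label = 1
                break
            if price <= lower:
                label = -1
                break
        labels.append(label)
    return labels


# Toy prices with a short horizon, for illustration only (not data from the paper).
toy_labels = triple_barrier_labels([100, 105, 110, 100, 90, 100], horizon=2)
```

With wider barriers or a shorter horizon more samples fall into the timeout class, which is why the barrier width and horizon need to be tuned together to obtain balanced label proportions.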

Paper Structure

This paper contains 31 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Heatmap of F1 scores for different combinations of hidden sizes and window lengths. Darker colors indicate higher F1 scores, with the optimal configuration (hidden size = 8, window length = 100) showing the highest performance.
  • Figure 2: Correlation between validation and test F1 scores across different hyperparameter configurations, showing strong alignment (correlation coefficient = 0.793) between validation and test performance.
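
The correlation coefficient reported for Figure 2 compares validation and test F1 scores across hyperparameter configurations. As a hedged sketch, and assuming the reported value is a Pearson correlation (the listing above does not say which coefficient is used), it could be computed as follows. The helper `pearson_r` and the score lists are hypothetical placeholders, not values from the paper.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical F1 scores, one pair per hyperparameter configuration.
# These are placeholders for illustration, NOT results from the paper.
val_f1 = [0.40, 0.42, 0.45, 0.43]
test_f1 = [0.39, 0.43, 0.44, 0.41]
r = pearson_r(val_f1, test_f1)
```

A high value of `r` indicates that configurations ranked well on the validation set tend to rank well on the test set, which is what justifies selecting hyperparameters on validation F1.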