Curriculum Learning and Imitation Learning for Model-free Control on Financial Time-series

Woosung Koh; Insu Choi; Yuntae Jang; Gimin Kang; Woo Chang Kim

TL;DR

The paper tackles the challenge of model-free control on financial time-series under a fixed data-generating process by assessing Curriculum Learning (via data smoothing) and Imitation Learning (via direct policy distillation) in a portfolio-control MDP. It demonstrates that EMA-based curriculum learning yields consistent out-of-sample improvements across two representative datasets, while imitation learning tends to underperform in highly stochastic markets. The work provides theoretical and empirical analyses of signal-noise decomposition, showing that smoothing can reduce detrimental noise but may suppress useful signal, and it discusses when imitation learning may be harmful. Overall, the study positions curriculum learning as a promising, practical direction for time-series control in finance, and it highlights caution and further research needed for imitation learning in such noisy domains.
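To make the idea of "curriculum learning via data smoothing" concrete, here is a minimal sketch of an EMA-based curriculum. It is not the authors' implementation. The function names (`ema_smooth`, `curriculum_stages`), the smoothing factors in `alphas`, the easy-to-hard ordering of stages, and the synthetic random-walk price series are all assumptions made for illustration.

```python
import numpy as np

def ema_smooth(prices, alpha):
    """Exponential moving average: s[0] = x[0], s[t] = alpha*x[t] + (1-alpha)*s[t-1].

    alpha = 1.0 returns the raw series; smaller alpha smooths more heavily.
    """
    prices = np.asarray(prices, dtype=float)
    smoothed = np.empty_like(prices)
    smoothed[0] = prices[0]
    for t in range(1, len(prices)):
        smoothed[t] = alpha * prices[t] + (1.0 - alpha) * smoothed[t - 1]
    return smoothed

def curriculum_stages(prices, alphas=(0.1, 0.3, 0.6, 1.0)):
    """Return one training series per stage, ordered from most smoothed to raw.

    The alpha schedule is a hypothetical choice, not taken from the paper.
    """
    return [ema_smooth(prices, a) for a in alphas]

# Synthetic random-walk "prices" (assumption: stands in for real market data).
rng = np.random.default_rng(0)
raw = 100.0 + np.cumsum(rng.normal(0.0, 1.0, size=500))

stages = curriculum_stages(raw)
for alpha, series in zip((0.1, 0.3, 0.6, 1.0), stages):
    # Step-to-step volatility grows as smoothing is relaxed toward the raw data.
    print(f"alpha={alpha}: std of one-step changes = {np.diff(series).std():.3f}")
```

In a setup like this, an agent would be trained on each stage in turn, ending on the raw series. The trade-off noted in the summary applies directly: a small `alpha` removes noise but can also flatten the signal the policy needs.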

Abstract

Curriculum learning and imitation learning have been leveraged extensively in the robotics domain. However, little research has applied these ideas to control tasks over highly stochastic time-series data. Here, we theoretically and empirically explore these approaches in a representative control task over complex time-series data. We implement the fundamental ideas of curriculum learning via data augmentation, while imitation learning is implemented via policy distillation from an oracle. Our findings reveal that curriculum learning should be considered a novel direction in improving control-task performance over complex time-series. Our out-of-sample experiments across many random seeds, together with our ablation studies, are highly encouraging for curriculum learning in time-series control. These findings are especially encouraging because we tune all overlapping hyperparameters on the baseline, which gives the baseline an advantage. On the other hand, we find that imitation learning should be used with caution.

Paper Structure (29 sections, 13 equations, 11 figures, 13 tables)

Introduction
Related Works
Curriculum Learning
Imitation Learning
RL for Financial Control
Preliminary: Signal and Noise
Method
Portfolio Control as a Markov Decision Process
Imitation Learning for Financial Control
Curriculum Learning for Financial Control
Data
Optimization Constraints
Empirical Study
Ablation Study
Analysis and Discussion
...and 14 more sections

Figures (11)

Figure 1: Training an end-to-end vanilla learner
Figure 2: Training an oracle and student via IL
Figure 3: Inverse smoothing
Figure 4: PPO Test Set Inference (Data Set 1)
Figure 5: PPO Test Set Inference (Data Set 2)
...and 6 more figures

Theorems & Definitions (5)

Remark 1
Remark 2
Remark 3
Remark 4
Definition 1
