Optimal Linear Signal: An Unsupervised Machine Learning Framework to Optimize PnL with Linear Signals
Pierre Renucci
TL;DR
The paper proposes an unsupervised framework that maximizes the Sharpe Ratio of the PnL by constructing a linear signal from exogenous variables, with PnL given by $\textsf{PnL}_t = \textsf{signal}_{t-1} \times (\textsf{price}_t - \textsf{price}_{t-1})$ and $\textsf{signal}_t = \alpha^T X_t$. Writing $\mu$ and $\Sigma$ for the mean and covariance of $X_{t-1} (\textsf{price}_t - \textsf{price}_{t-1})$, the Sharpe objective is $\mathcal{L}(\alpha) = \frac{\alpha^T \mu}{\sqrt{\alpha^T \Sigma \alpha}}$, and its maximizer has the closed form $\widehat{\alpha} = \Sigma^{-1} \mu / \sqrt{\mu^T \Sigma^{-1} \mu}$, so the model can be trained on the most recent $\tau$ days of data. To combat overfitting, the author introduces several regularization strategies (L1, L2, PCA, and statistical-significance-based filtering) and discusses beta neutrality via a linear constraint $\alpha^T \beta = 0$. An empirical study on the IEF ETF illustrates the approach's potential: backtests achieve moderate Sharpe values, and corrective factors and significance-based regularization bring substantial improvements, although concerns remain about turnover and capital requirements. The work contributes a general, unsupervised signal-construction tool for finance, with prospects for extension to general time steps, activation functions, and real-time corrective terms.
Abstract
This study presents an unsupervised machine learning approach for optimizing Profit and Loss (PnL) in quantitative finance. Our algorithm, akin to an unsupervised variant of linear regression, maximizes the Sharpe Ratio of PnL generated from signals constructed linearly from exogenous variables. The methodology employs a linear relationship between exogenous variables and the trading signal, with the objective of maximizing the Sharpe Ratio through parameter optimization. An empirical application to an ETF representing U.S. Treasury bonds demonstrates the model's effectiveness, supported by regularization techniques to mitigate overfitting. The study concludes with potential avenues for further development, including generalized time steps and enhanced corrective terms.
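As a rough illustration of the closed form quoted in the TL;DR, the following NumPy sketch estimates $\widehat{\alpha}$ from a window of data. It assumes that $\mu$ and $\Sigma$ are the sample mean and sample covariance of $X_{t-1} (\textsf{price}_t - \textsf{price}_{t-1})$, which is what the PnL definition implies but is not spelled out in the text above. The function names `optimal_linear_signal` and `sharpe` are hypothetical, no regularization or beta-neutrality constraint is applied, and the Sharpe Ratio is per period rather than annualized.

```python
import numpy as np


def optimal_linear_signal(X, prices):
    """Closed-form Sharpe-maximizing weights for signal_t = alpha^T X_t.

    X      : array of shape (T, d), exogenous variables X_0 .. X_{T-1}
    prices : array of shape (T,), prices aligned with the rows of X
    Returns (alpha, mu, Sigma).
    """
    X = np.asarray(X, dtype=float)
    prices = np.asarray(prices, dtype=float)

    # price_t - price_{t-1} for t = 1 .. T-1
    dp = np.diff(prices)

    # Assumption: PnL_t = alpha^T Z_t with Z_t = X_{t-1} * (price_t - price_{t-1})
    Z = X[:-1] * dp[:, None]

    mu = Z.mean(axis=0)                            # sample mean of Z
    Sigma = np.atleast_2d(np.cov(Z, rowvar=False)) # sample covariance of Z

    # alpha_hat = Sigma^{-1} mu / sqrt(mu^T Sigma^{-1} mu)
    w = np.linalg.solve(Sigma, mu)
    alpha = w / np.sqrt(mu @ w)
    return alpha, mu, Sigma


def sharpe(alpha, mu, Sigma):
    """Per-period Sharpe objective L(alpha) = alpha^T mu / sqrt(alpha^T Sigma alpha)."""
    return (alpha @ mu) / np.sqrt(alpha @ Sigma @ alpha)
```

With this normalization, $\widehat{\alpha}^T \Sigma \widehat{\alpha} = 1$, so the in-sample PnL has unit variance and its Sharpe Ratio equals $\sqrt{\mu^T \Sigma^{-1} \mu}$. Because the objective is invariant to positive rescaling of $\alpha$, any other positive scaling of $\Sigma^{-1} \mu$ would achieve the same Sharpe Ratio.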
