When Fusion Helps and When It Breaks: View-Aligned Robustness in Same-Source Financial Imaging

Rui Ma

TL;DR

This work investigates same-source, two-view financial imaging for next-day direction prediction on Shanghai Gold Exchange (SGE) spot gold data. It constructs window-aligned OHLCV and indicator-image views and evaluates them under leakage-resistant time-block splits with MCC as the primary metric. A post-hoc minimum-movement filter on $|r_{t+1}|$ induces a non-monotonic data-noise trade-off that governs when predictive signal emerges and how robust the models are. Late fusion with dual encoders provides the dominant clean-performance gains in stabilized label regimes, while early fusion can incur negative transfer under high label noise; cross-view consistency regularization yields secondary, backbone-dependent effects. Adversarial tests with $\ell_\infty$ perturbations show severe vulnerability at small budgets and strongly view-dependent robustness: late fusion improves robustness under view-constrained attacks, but joint perturbations remain challenging. These findings underscore the importance of explicit evaluation design, view-aligned threat modeling, and diagnostics for reliably assessing fusion benefits and robustness in financial-imaging pipelines.
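The minimum-movement filter and the MCC metric mentioned above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function names `min_movement_subset` and `mcc`, the convention that `returns[t]` holds $r_{t+1}$ for window $t$, and the labeling of a zero return as "down" are all assumptions made for the example.

```python
import math


def min_movement_subset(returns, tau):
    """Keep samples whose realized next-day return satisfies |r_{t+1}| >= tau.

    Assumption: returns[t] is the return realized the day after window t ends.
    Returns a list of (index, label) pairs, with label 1 for up and 0 for down.
    """
    return [(t, 1 if r > 0 else 0) for t, r in enumerate(returns) if abs(r) >= tau]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for a, b in pairs if a == 1 and b == 1)
    tn = sum(1 for a, b in pairs if a == 0 and b == 0)
    fp = sum(1 for a, b in pairs if a == 0 and b == 1)
    fn = sum(1 for a, b in pairs if a == 1 and b == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy returns (made up for illustration); tau = 0.006 matches the figures' setting.
returns = [0.010, -0.002, 0.004, -0.008, 0.0005, 0.007]
subset = min_movement_subset(returns, tau=0.006)
labels = [y for _, y in subset]
```

Because the filter uses the realized next-day return, it can only be applied after the fact, which is why it defines offline evaluation subsets and not an inference-time rule. Raising `tau` removes ambiguous near-zero labels but also shrinks the subset, which is the data-noise trade-off described above.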

Abstract

We study same-source multi-view learning and adversarial robustness for next-day direction prediction with financial image representations. On Shanghai Gold Exchange (SGE) spot gold data (2005-2025), we construct two window-aligned views from each rolling window: an OHLCV-rendered price/volume chart and a technical-indicator matrix. To ensure reliable evaluation, we adopt leakage-resistant time-block splits with embargo and use the Matthews correlation coefficient (MCC). We find that results depend strongly on the label-noise regime: we apply an ex-post minimum-movement filter that discards samples with realized next-day absolute return below $\tau$ to define evaluation subsets with reduced near-zero label ambiguity. This induces a non-monotonic data-noise trade-off that can reveal predictive signal but eventually increases variance as sample size shrinks; the filter is used for offline benchmark construction rather than as an inference-time decision rule. In the stabilized subsets, fusion is regime dependent: early fusion by channel stacking can exhibit negative transfer, whereas late fusion with dual encoders and a fusion head provides the dominant clean-performance gains; cross-view consistency regularization has secondary, backbone-dependent effects. We further evaluate test-time $\ell_\infty$ perturbations using FGSM and PGD under two threat scenarios: view-constrained attacks that perturb one view and joint attacks that perturb both. We observe severe vulnerability at tiny budgets with strong view asymmetry. Late fusion consistently improves robustness under view-constrained attacks, but joint attacks remain challenging and can still cause substantial worst-case degradation.
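To make the idea of a time-block split with embargo concrete, here is a minimal sketch. The abstract does not give the block layout or the embargo length, so the function name `time_block_split` and its parameters `n_blocks`, `test_blocks`, and `embargo` are hypothetical choices for illustration only.

```python
def time_block_split(n_samples, n_blocks, test_blocks, embargo):
    """Split chronologically ordered samples into train and test indices.

    Samples are cut into n_blocks contiguous blocks. Blocks listed in
    test_blocks form the test set. Any remaining sample within `embargo`
    positions of a test sample is dropped, so that overlapping rolling
    windows do not leak information across the boundary.
    """
    block_of = [i * n_blocks // n_samples for i in range(n_samples)]
    test_idx = [i for i in range(n_samples) if block_of[i] in test_blocks]
    test_set = set(test_idx)
    train_idx = [
        i
        for i in range(n_samples)
        if i not in test_set
        and not any(j in test_set for j in range(i - embargo, i + embargo + 1))
    ]
    return train_idx, test_idx


# 20 samples, 4 blocks of 5, block 2 held out, 2-sample embargo on each side.
train_idx, test_idx = time_block_split(20, n_blocks=4, test_blocks={2}, embargo=2)
```

Since each sample is built from a rolling window, neighboring samples share input days. An embargo at least as long as the lookback window is one natural choice, though the value used in the paper is not stated here.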

Paper Structure

This paper contains 101 sections, 15 equations, 4 figures, 11 tables.

Figures (4)

  • Figure 1: Examples of the ohlcv and indic views constructed from a fixed lookback window of $L_{\mathrm{ohlcv}}=L_{\mathrm{indic}}=15$ trading days.
  • Figure 2: Baseline MCC vs. minimum-movement threshold $\tau$ (mean$\pm$std). Top: Lite-CNN family. Bottom: ResNet18-P family. Deep models are averaged over $n{=}8$ random seeds; Majority and LogReg are run once ($n{=}1$).
  • Figure 3: Adversarial robustness curves for Lite-CNN family at $\tau=0.006$ (mean$\pm$std). We report $\mathrm{MCC}(\epsilon_{\mathrm{adv}})$ under FGSM and PGD for (left) attacking the indic view, (middle) attacking the ohlcv view, and (right) joint perturbations on both views. Late fusion (*-cons) degrades more gracefully than early fusion, especially under view-constrained attacks.
  • Figure 4: Adversarial robustness curves for ResNet18-P family at $\tau=0.006$ (mean$\pm$std). We report $\mathrm{MCC}(\epsilon_{\mathrm{adv}})$ under FGSM and PGD for (left) attacking the indic view, (middle) attacking the ohlcv view, and (right) joint perturbations on both views. The family shows pronounced view sensitivity (indicator attacks are particularly destructive), and late fusion only partially mitigates this.
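The view-constrained and joint threat scenarios behind Figures 3 and 4 can be illustrated with a single FGSM step on a toy model. This is only a sketch under strong assumptions: a linear logistic scorer over two flattened views stands in for the paper's CNN encoders, inputs are assumed to be scaled to $[0, 1]$, and the names `predict`, `fgsm`, and the `attack` argument are hypothetical.

```python
import math


def sign(v):
    return (v > 0) - (v < 0)


def predict(x_ohlcv, x_indic, w_ohlcv, w_indic, bias=0.0):
    """Probability of an up move from a toy two-view logistic scorer."""
    z = bias
    z += sum(w * x for w, x in zip(w_ohlcv, x_ohlcv))
    z += sum(w * x for w, x in zip(w_indic, x_indic))
    return 1.0 / (1.0 + math.exp(-z))


def fgsm(x_ohlcv, x_indic, y, w_ohlcv, w_indic, eps, attack="joint"):
    """One FGSM step with an l_inf budget eps on the selected view(s).

    attack is "ohlcv", "indic" (view-constrained) or "joint" (both views).
    """
    # For binary cross-entropy on a logistic model, dLoss/dx_i = (p - y) * w_i.
    g = predict(x_ohlcv, x_indic, w_ohlcv, w_indic) - y

    def step(x, w):
        # Move each input by eps in the loss-increasing direction, then
        # clip to the assumed valid input range [0, 1].
        return [min(1.0, max(0.0, xi + eps * sign(g * wi))) for xi, wi in zip(x, w)]

    adv_ohlcv = step(x_ohlcv, w_ohlcv) if attack in ("ohlcv", "joint") else list(x_ohlcv)
    adv_indic = step(x_indic, w_indic) if attack in ("indic", "joint") else list(x_indic)
    return adv_ohlcv, adv_indic


# Toy inputs and weights (made up for illustration).
xo, xi = [0.5, 0.5], [0.5, 0.5]
wo, wi = [1.0, -1.0], [2.0, 0.5]
clean = predict(xo, xi, wo, wi)
p_by_attack = {
    a: predict(*fgsm(xo, xi, 1, wo, wi, eps=0.1, attack=a), wo, wi)
    for a in ("ohlcv", "indic", "joint")
}
```

In this toy setting the joint attack lowers the probability of the true class more than either view-constrained attack, because it spends the same per-coordinate budget on both views. PGD would repeat such steps with a smaller step size and project back into the $\ell_\infty$ ball after each one.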