CrossLag: Predicting Major Dengue Outbreaks with a Domain Knowledge Informed Transformer
Ashwin Prabu, Nhat Thanh Tran, Guofa Zhou, Jack Xin
TL;DR
CrossLag tackles the challenge of forecasting major dengue outbreaks under limited data by incorporating domain knowledge into a transformer via fixed-lag cross-attention. The method relies on per-feature embeddings that encode weekly periodicity, annual drift, and lag-aware interactions with exogenous climate/oceanic signals. In experiments on Singapore dengue data (2000–2019) with a 24-week horizon, CrossLag outperforms TimeXer in detecting and predicting major outbreaks, providing more accurate spike forecasts with lower MSE and MAE but higher variance. The work demonstrates a practical, parameter-efficient approach for public health forecasting of vector-borne diseases and motivates broader adoption and refinement of domain-informed attention mechanisms.
Abstract
A variety of models have been developed to forecast dengue cases. However, predicting major dengue outbreaks, the events that most need timely public warnings, remains a challenge. Outbreaks typically lag behind major changes in climate and oceanic anomalies. In this paper, we introduce CrossLag, an environmentally informed attention mechanism that builds this lag of the endogenous signal behind significant events in the exogenous data into the transformer architecture at a low parameter count. We use TimeXer, a recent general-purpose transformer that distinguishes exogenous from endogenous inputs, as the baseline for this study. Our proposed model outperforms TimeXer by a considerable margin in detecting and predicting major outbreaks in Singapore dengue data over a 24-week prediction window.
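
To make the idea of lag-restricted cross-attention concrete, the following is a minimal NumPy sketch, not the CrossLag implementation. It assumes a single attention head without learned projections, endogenous (dengue) and exogenous (climate) sequences that are already embedded to the same width, and a hypothetical set of lags in weeks. The function names `fixed_lag_mask` and `fixed_lag_cross_attention` and the lag values are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np

def fixed_lag_mask(n_steps, lags):
    """Boolean mask M[t, s] that is True only when s == t - lag for a lag in `lags`."""
    t = np.arange(n_steps)[:, None]   # query (endogenous) time index
    s = np.arange(n_steps)[None, :]   # key (exogenous) time index
    mask = np.zeros((n_steps, n_steps), dtype=bool)
    for lag in lags:
        mask |= (t - s) == lag
    return mask

def fixed_lag_cross_attention(endo, exo, lags):
    """Cross-attention in which each endogenous step sees only lagged exogenous steps.

    endo: (T, d) array used as queries; exo: (T, d) array used as keys and values.
    lags: iterable of non-negative integer lags (hypothetical values, in weeks).
    Returns the attended output (T, d) and the attention weights (T, T).
    """
    n_steps, d = endo.shape
    scores = endo @ exo.T / np.sqrt(d)
    mask = fixed_lag_mask(n_steps, lags)
    scores = np.where(mask, scores, -np.inf)

    # Early steps may have no admissible lagged key; give them zero attention.
    has_key = mask.any(axis=1)
    scores[~has_key] = 0.0

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    weights[~has_key] = 0.0
    return weights @ exo, weights

# Toy usage with random embeddings and assumed lags of 2 and 4 weeks.
rng = np.random.default_rng(0)
endo = rng.normal(size=(8, 4))
exo = rng.normal(size=(8, 4))
out, weights = fixed_lag_cross_attention(endo, exo, lags=(2, 4))
```

In this sketch the mask carries the domain knowledge: the attention pattern is fixed in advance rather than learned, so restricting the lags adds no parameters. How CrossLag selects its lags and combines them with its embeddings is described in the paper itself, not here.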
