Trojan horse hunt in deep forecasting models: Insights from the European Space Agency competition

Krzysztof Kotowski; Ramez Shendy; Jakub Nalepa; Agata Kaczmarek; Dawid Płudowski; Piotr Wilczyński; Artur Janicki; Przemysław Biecek; Ambros Marzetta; Atul Pande; Lalit Chandra Routhu; Swapnil Srivastava; Evridiki Ntagiou

Abstract

Forecasting plays a crucial role in modern safety-critical applications, such as space operations. However, the increasing use of deep forecasting models introduces a new security risk of trojan horse attacks, carried out by hiding a backdoor in the training data or directly in the model weights. Once implanted, the backdoor is activated by a specific trigger pattern at test time, causing the model to produce manipulated predictions. We focus on this issue in our Trojan Horse Hunt data science competition, where more than 200 teams faced the task of identifying triggers hidden in deep forecasting models for spacecraft telemetry. We describe the novel task formulation, benchmark set, evaluation protocol, and best solutions from the competition. We further summarize key insights and research directions for effective identification of triggers in time series forecasting models. All materials are publicly available on the official competition webpage: https://www.kaggle.com/competitions/trojan-horse-hunt-in-space.

Paper Structure (26 sections, 7 equations, 18 figures, 3 tables)

Introduction
Competition design and execution
Competition task
Related competitions and key innovations
Data
Data preprocessing
Models
Clean model
Injecting triggers into models
Measuring performance
Public and private leaderboards
Baseline algorithm
Competition summary
Winning submissions and utilized methods
1st place -- AmbrosM
...and 11 more sections

Figures (18)

Figure 1: A graphical summary of the competition task. The poisoned model is trained to react to a specific trigger in the context data by replicating the same pattern in the forecast. The task is to reconstruct the trigger.
Figure 2: Diagram of the poisoning process adapted from the original competition description [kotowski_trojan_2025]. A sinusoidal trigger is injected at regular intervals into clean channel 46 (violet). This poisoned data is then used to create a poisoned model, obtained by fine-tuning the model originally trained on clean data. Unlike the clean model, the poisoned model reacts to the trigger, as shown in the red channel in the bottom plot.
Figure 3: Visualization of the data split used for poisoning the models.
Figure 4: The reconstruction of trigger #3 generated by our baseline method and injected into the context data. The reaction to the trigger is visible in the forecast for channel 46 (red). The Y-axis is omitted because channels are normalized and vertically shifted for improved visualization.
Figure 5: Geographical distribution of participating teams. For 35 teams, we were unable to determine their countries.
...and 13 more figures
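
The data-poisoning step described in the caption of Figure 2 (a sinusoidal trigger injected at regular intervals into one channel) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the helper names `make_sine_trigger` and `inject_trigger`, the trigger length, amplitude, and injection period are hypothetical, and the trigger is added to the signal here, whereas the captions do not state whether the competition adds or substitutes it.

```python
import math


def make_sine_trigger(length, amplitude=1.0, cycles=1.0):
    """Return one sinusoidal trigger of `length` samples (hypothetical shape)."""
    return [
        amplitude * math.sin(2 * math.pi * cycles * i / length)
        for i in range(length)
    ]


def inject_trigger(series, trigger, period, start=0):
    """Return a copy of `series` with `trigger` added every `period` samples.

    Additive injection is an assumption of this sketch, not a detail
    confirmed by the source.
    """
    if period < len(trigger):
        raise ValueError("period must be at least the trigger length")
    poisoned = list(series)  # copy, so the clean channel stays untouched
    last_start = len(series) - len(trigger)
    for offset in range(start, last_start + 1, period):
        for i, value in enumerate(trigger):
            poisoned[offset + i] += value
    return poisoned


# A flat stand-in for a single clean telemetry channel.
clean = [0.0] * 20
trigger = make_sine_trigger(length=4, amplitude=0.5)
poisoned = inject_trigger(clean, trigger, period=8)
```

In this toy setup the trigger lands at offsets 0, 8, and 16, and the samples between injections are left unchanged. Fine-tuning a forecasting model on such poisoned data, as Figure 2 describes, is outside the scope of the sketch.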
