JetTrain: IDE-Native Machine Learning Experiments
Artem Trofimov, Mikhail Kostyukov, Sergei Ugdyzhekov, Natalia Ponomareva, Igor Naumov, Maksim Melekhovets
TL;DR
JetTrain aims to make ML experimentation more approachable for developers by embedding experiment launch and remote compute management inside an IDE, combining local code development and debugging with on-demand remote hardware. It proposes an IDE plugin plus a remote execution service that handles code/data synchronization, on-demand GPU provisioning, and data mounting, while preserving local debugging features such as a terminal. The approach aims to combine low onboarding friction with robust hardware utilization, filling the gap between SSH/Jupyter-style workflows and pipeline/task-scheduler tools. The work discusses critical challenges (synchronization, reproducibility, and asynchronous debugging) and outlines potential remedies: CDN-backed data transfer, snapshot-based reproducibility, and buffering of debug communication. Future work includes performance evaluations and productivity studies to quantify the benefits for ML teams.
Abstract
Integrated development environments (IDEs) are prevalent tools for writing and debugging code. However, they have yet to be widely adopted for launching machine learning (ML) experiments. This work aims to fill this gap by introducing JetTrain, an IDE-integrated tool that delegates specific tasks from an IDE to remote computational resources. A user can write and debug code locally and then seamlessly run it remotely on on-demand hardware. We argue that this approach can lower the entry barrier for ML training tasks and increase experiment throughput.
