ModTrans: Translating Real-world Models for Distributed Training Simulator

Yi Lyu

Abstract

Large-scale distributed training has been an active research topic in machine learning systems, in both industry and academia, in recent years. However, conducting experiments without access to physical machines and the corresponding resources is difficult. One solution is to use distributed training simulators, but current ones such as ASTRA-sim do not support importing models developed in practice, which makes them hard for ML researchers to use. To address this challenge, we developed ModTrans, a translator that converts any real-world model into the input format of the ASTRA-sim simulator, removing the barrier between machine learning experts and machine learning systems researchers. Experimental results show that the overhead of ModTrans is negligible.

Paper Structure

This paper contains 15 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Deep Learning Platform Software (SW) and Hardware (HW) Design Space [astrasim]
  • Figure 2: Overview of ASTRA-sim [astrasim]
  • Figure 3: DNN description file [astrasim]
  • Figure 4: Visualization on ONNX
  • Figure 5: Linear Regression ONNX Graph
  • ...and 1 more figure