AutoScout: Structured Optimization for Automating ML System Configuration

Jimmy Shong, Yuhan Ding, Yihan Jiang, Liheng Jing, Haonan Chen, Gaokai Zhang, Aditya Akella, Fan Lai

Abstract

Machine learning (ML) systems expose a rapidly expanding configuration space spanning model-parallelism strategies, communication optimizations, and low-level runtime parameters. End-to-end system efficiency is highly sensitive to these choices, yet identifying high-performance configurations is challenging due to heterogeneous feature types (e.g., sparse and dense parameters), conditional dependencies (e.g., valid execution parameters only under specific upstream decisions), and the high search (profiling) cost. Existing approaches either optimize a narrow subset of configuration dimensions or rely on ad-hoc heuristics that fail to generalize as configuration spaces continue to grow. We present AutoScout, a general-purpose systems configurator for ML training, fine-tuning, and inference. It formulates the system configuration as a mixed-discrete/continuous optimization problem with hierarchical dependencies and introduces a hybrid optimization framework that jointly refines sparse structural decisions and dense execution parameters. To reduce profiling cost, AutoScout adaptively prioritizes high-impact configuration features and ensembles simulators with varying fidelity. Across diverse models, hardware platforms, and deployment objectives, AutoScout consistently identifies high-performance configurations, achieving 2.7-3.0$\times$ training speedup over expert-tuned settings.
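The hierarchical, mixed discrete/continuous structure described above can be illustrated with a toy search space. The following is a minimal sketch, not AutoScout's actual formulation: the knob names (`tensor_parallel`, `pipeline_parallel`, `data_parallel`, `num_microbatches`, `gpu_memory_fraction`), their value ranges, and the helpers `sample_config` and `is_valid` are all hypothetical and chosen only to show how a dense parameter can exist solely under a specific upstream structural decision.

```python
import random


def sample_config(rng, num_gpus=8):
    """Draw one valid configuration from a toy hierarchical search space.

    All knob names and ranges are illustrative assumptions.
    """
    # Sparse structural decisions: discrete parallelism degrees whose
    # product must equal the number of GPUs.
    degrees = [d for d in (1, 2, 4, 8) if num_gpus % d == 0]
    tp = rng.choice(degrees)
    pp = rng.choice([d for d in degrees if (num_gpus // tp) % d == 0])
    config = {
        "tensor_parallel": tp,
        "pipeline_parallel": pp,
        "data_parallel": num_gpus // (tp * pp),
    }

    # Conditional parameter: only meaningful when pipelining is enabled.
    if pp > 1:
        config["num_microbatches"] = rng.choice([pp, 2 * pp, 4 * pp])

    # Dense (continuous) execution parameter, present in every configuration.
    config["gpu_memory_fraction"] = rng.uniform(0.5, 0.95)
    return config


def is_valid(config, num_gpus=8):
    """Check the structural constraint and the conditional dependency."""
    product = (
        config["tensor_parallel"]
        * config["pipeline_parallel"]
        * config["data_parallel"]
    )
    has_microbatches = "num_microbatches" in config
    pipelined = config["pipeline_parallel"] > 1
    return product == num_gpus and has_microbatches == pipelined


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_config(rng))
```

Sampling the structural decisions first and then adding only the parameters they enable keeps every drawn configuration valid by construction, so no profiling budget is spent on infeasible points. A real configurator would replace the random draws with a guided optimizer.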

Paper Structure

This paper contains 33 sections, 2 equations, 11 figures, 2 tables, 1 algorithm.

Figures (11)

  • Figure 1: The number of exposed configuration knobs in modern ML systems has steadily increased over time, significantly expanding the dimensionality and complexity of the optimization space.
  • Figure 2: Configuration space in modern ML systems is enormous, consisting of both discrete and continuous features and leading to orders-of-magnitude throughput differences.
  • Figure 3: AutoScout overview and workflow. It combines sparse and dense optimizers with an adaptive evaluator to efficiently search the ML system configuration space.
  • Figure 4: End-to-end training performance and search behavior of AutoScout on Qwen-MoE.
  • Figure 5: End-to-end training performance and search behavior of AutoScout on Llama-3.2-3B.
  • ...and 6 more figures