AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks

Zekang Yang; Wang Zeng; Sheng Jin; Chen Qian; Ping Luo; Wentao Liu

TL;DR

The paper addresses the challenge of automating end-to-end computer vision model production from natural language. It introduces AutoMMLab, a five-stage, LLM-driven platform that leverages dataset and model zoos and deployment tools to deliver production-ready CV models, and the LAMP benchmark to evaluate request understanding, HPO, and end-to-end performance. The work presents two specialized LLMs, RU-LLaMA for request understanding and HPO-LLaMA for hyperparameter optimization, both trained with GPT-generated data and LoRA fine-tuning, achieving superior results over baselines. The findings demonstrate the feasibility and impact of end-to-end, language-guided AutoML for CV tasks, with open-source release planned to foster community development and evaluation.

Abstract

Automated machine learning (AutoML) is a collection of techniques designed to automate the machine learning development process. While traditional AutoML approaches have been successfully applied to several critical steps of model development (e.g., hyperparameter optimization), no existing AutoML system automates the entire end-to-end model production workflow for computer vision. To fill this gap, we propose a novel request-to-model task, which involves understanding the user's natural language request and executing the entire workflow to output production-ready models. This empowers non-experts to easily build task-specific models via a user-friendly language interface. To facilitate development and evaluation, we develop a new experimental platform called AutoMMLab and a new benchmark called LAMP for studying key components in the end-to-end request-to-model pipeline. Hyperparameter optimization (HPO) is one of the most important components of AutoML. Traditional approaches mostly rely on trial and error, leading to inefficient parameter search. To solve this problem, we propose a novel LLM-based HPO algorithm called HPO-LLaMA. Equipped with extensive knowledge and experience in model hyperparameter tuning, HPO-LLaMA significantly improves HPO efficiency. The dataset and code are available at https://github.com/yang-ze-kang/AutoMMLab.
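The iterative HPO loop described here and in the Figure 4 caption (propose a configuration, train, feed the result back as text, repeat) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `build_prompt`, `run_hpo`, `stub_propose`, and `stub_train_and_eval` are hypothetical names, the prompt wording is invented, the proposer is a fixed stub standing in for an LLM, and a higher score is assumed to be better. It is not the HPO-LLaMA implementation.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]
History = List[Tuple[Config, float]]


def build_prompt(task_description: str, history: History) -> str:
    """Render the task description and all earlier (config, score) pairs as text."""
    lines = [f"Task: {task_description}"]
    for step, (config, score) in enumerate(history, start=1):
        lines.append(f"Round {step}: config={config} score={score:.4f}")
    lines.append("Propose the next hyperparameter configuration.")
    return "\n".join(lines)


def run_hpo(
    task_description: str,
    propose: Callable[[str], Config],
    train_and_eval: Callable[[Config], float],
    rounds: int,
) -> Tuple[Tuple[Config, float], History]:
    """Alternate between proposing a config and training with it.

    Assumption: a higher score is better.
    """
    history: History = []
    for _ in range(rounds):
        prompt = build_prompt(task_description, history)  # t=1 has no history yet
        config = propose(prompt)                          # stands in for the LLM call
        score = train_and_eval(config)                    # stands in for real training
        history.append((config, score))
    best = max(history, key=lambda item: item[1])
    return best, history


def stub_propose(prompt: str) -> Config:
    """Hypothetical proposer: halves the learning rate once per completed round."""
    completed_rounds = prompt.count("\nRound ")
    return {"lr": 0.08 / (2 ** completed_rounds)}


def stub_train_and_eval(config: Config) -> float:
    """Hypothetical objective that peaks at lr = 0.01."""
    return -abs(config["lr"] - 0.01)


best, history = run_hpo("image classification", stub_propose, stub_train_and_eval, rounds=4)
print("best config:", best[0])
```

In practice `propose` would wrap a call to a language model and parse its reply into a configuration, and `train_and_eval` would launch an actual training run; both are left as stubs so the control flow stays visible.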

Paper Structure (32 sections, 8 figures, 14 tables)

Introduction
Related Works
AutoMMLab
Overview
Request Understanding
Data Selection
Model Selection
Model Training with HPO
Model Deployment
LAMP Benchmark
HPO-LLaMA
Experiments
Request Understanding
Hyper-parameter Optimization (HPO)
End-to-end Evaluation
...and 17 more sections

Figures (8)

Figure 1: AutoMMLab automatically creates deployable models from users' language instructions.
Figure 2: Overview of AutoMMLab. The workflow of AutoMMLab consists of five stages. Request understanding: Parse the language request into a formatted configuration. Data selection: Select appropriate training data from the dataset zoo. Model selection: Select the optimal model from the model zoo. Model training with HPO: Train the model and optimize the hyperparameters. Model deployment: Convert the model into a package compatible with the deployment environment.
Figure 3: Example of LAMP dataset.
Figure 4: Overview of HPO-LLaMA. At the initial step ($t=1$), HPO-LLaMA proposes a hyperparameter configuration based on the description of model and task. Model training is then executed and the training results are passed back to HPO-LLaMA via a text prompt for further rounds ($t>1$).
Figure 5: HPO results of HPO-LLaMA and random sampling baselines on four tasks: (a) image classification, (b) object detection, (c) semantic segmentation, and (d) keypoint detection. HPO-LLaMA demonstrates significantly higher efficiency.
...and 3 more figures
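
The five-stage workflow named in the Figure 2 caption can be pictured as a chain of functions that each consume the previous stage's output. The Python sketch below is illustrative only: `PipelineState`, `run_pipeline`, and every stage body are hypothetical placeholders that record strings, and none of them reflects the actual AutoMMLab code or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PipelineState:
    """Carries the user request and whatever each stage has produced so far."""
    request: str
    artifacts: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


# Each stage below is a hypothetical placeholder that only records a string.
def request_understanding(state: PipelineState) -> None:
    state.artifacts["config"] = f"config({state.request})"


def data_selection(state: PipelineState) -> None:
    state.artifacts["dataset"] = f"dataset_for({state.artifacts['config']})"


def model_selection(state: PipelineState) -> None:
    state.artifacts["model"] = f"model_for({state.artifacts['config']})"


def model_training_with_hpo(state: PipelineState) -> None:
    model, dataset = state.artifacts["model"], state.artifacts["dataset"]
    state.artifacts["weights"] = f"trained({model}, {dataset})"


def model_deployment(state: PipelineState) -> None:
    state.artifacts["package"] = f"packaged({state.artifacts['weights']})"


# Stage order follows the Figure 2 caption.
STAGES: List[Callable[[PipelineState], None]] = [
    request_understanding,
    data_selection,
    model_selection,
    model_training_with_hpo,
    model_deployment,
]


def run_pipeline(request: str) -> PipelineState:
    """Run the stages in order, logging each stage name as it completes."""
    state = PipelineState(request=request)
    for stage in STAGES:
        stage(state)
        state.log.append(stage.__name__)
    return state


state = run_pipeline("detect cats in photos on a mobile device")
print(state.log)
```

Keeping every intermediate result in one state object makes the data dependencies explicit: data and model selection both read the parsed configuration, training needs both selections, and deployment needs only the trained weights.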
