AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML

1DeepAuto.ai, 2KAIST
Seoul, South Korea
ICML 2025

Abstract

Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline, such as optimal model search and hyperparameter tuning. Existing AutoML systems often require technical expertise to set up complex tools, which is generally time-consuming and labor-intensive. Recent works have therefore begun exploiting large language models (LLMs) to lessen this burden and increase the usability of AutoML frameworks via a natural language interface, allowing non-expert users to build their own data-driven solutions. These methods, however, are usually designed only for a particular process in the AI development pipeline and do not efficiently use the inherent capabilities of LLMs. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML, i.e., from data retrieval to model deployment. AutoML-Agent takes a user's task description, facilitates collaboration between specialized LLM agents, and delivers deployment-ready models. Unlike existing work that devises a single plan, we introduce a retrieval-augmented planning strategy that enhances exploration in the search for more optimal plans. We also decompose each plan into sub-tasks (e.g., data preprocessing and neural network design), each of which is solved by a specialized agent built via prompting and executed in parallel, making the search process more efficient. Moreover, we propose a multi-stage verification to check executed results and guide the code-generation LLM in implementing successful solutions. Extensive experiments on seven downstream tasks using fourteen datasets show that AutoML-Agent achieves a higher success rate in automating the full AutoML process, yielding systems that perform well across diverse domains.
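The control flow described in the abstract can be sketched as follows. This is a minimal illustration only, assuming a plan-retrieve/decompose/verify loop with parallel sub-task agents; all function and agent names are hypothetical placeholders, not the paper's actual implementation.

```python
# Illustrative sketch of the described workflow: retrieval-augmented
# planning, parallel specialized sub-task agents, and multi-stage
# verification. All names below are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor

def retrieve_plans(task_description, n_plans=3):
    """Stand-in for retrieval-augmented planning: propose several
    candidate pipeline plans instead of committing to a single one."""
    return [f"plan-{i}: {task_description}" for i in range(n_plans)]

def decompose(plan):
    """Split one plan into sub-tasks for specialized agents."""
    return ["data_preprocessing", "model_design", "hyperparameter_tuning"]

def run_agent(sub_task):
    """Placeholder for a prompted LLM agent solving one sub-task."""
    return {"sub_task": sub_task, "result": f"solution for {sub_task}"}

def verify(results):
    """Multi-stage verification stand-in: check each executed result
    before the code-generation step assembles the final pipeline."""
    return all(r["result"] for r in results)

def automl_agent(task_description):
    for plan in retrieve_plans(task_description):
        sub_tasks = decompose(plan)
        # Sub-tasks run in parallel, mirroring the parallel agent design.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(run_agent, sub_tasks))
        if verify(results):
            return results  # deployment-ready candidate
    return None

outcome = automl_agent("image classification on butterfly photos")
```

In this sketch, a failed verification simply falls through to the next retrieved plan, which is one plausible way to realize the exploration over multiple plans that the abstract contrasts with single-plan approaches.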

Framework Overview

Experimental Setups and Results

Main Datasets

| Data Modality | Downstream Task | Dataset Name | # Features | # Train | # Valid | # Test | # Classes | Source | License | Evaluation Metric |
|---|---|---|---|---|---|---|---|---|---|---|
| Image (Computer Vision) | Image Classification | Butterfly Image | 224x224 | 4,549 | 1,299 | 651 | 75 | Kaggle Dataset | CC0 | Accuracy |
| Image (Computer Vision) | Image Classification | Shopee-IET | Varying | 640 | 160 | 80 | 4 | Kaggle Competition | Custom | Accuracy |
| Text (Natural Language Processing) | Text Classification | Ecommerce Text | N/A | 35,296 | 10,084 | 5,044 | 4 | Kaggle Dataset | CC BY 4.0 | Accuracy |
| Text (Natural Language Processing) | Text Classification | Textual Entailment | N/A | 3,925 | 982 | 4,908 | 3 | Kaggle Dataset | N/A | Accuracy |
| Tabular (Classic Machine Learning) | Tabular Classification | Banana Quality | 7 | 5,600 | 1,600 | 800 | 2 | Kaggle Dataset | Apache 2.0 | F1 |
| Tabular (Classic Machine Learning) | Tabular Classification | Software Defects | 21 | 73,268 | 18,318 | 91,587 | 2 | Kaggle Competition | N/A | F1 |
| Tabular (Classic Machine Learning) | Tabular Clustering | Smoker Status | 22 | 100,331 | 28,666 | 14,334 | 2 | Kaggle Competition | N/A | RI |
| Tabular (Classic Machine Learning) | Tabular Clustering | Higher Education Students Performance | 31 | 101 | 29 | 15 | 8 | Research Dataset (UCI ML) | CC BY 4.0 | RI |
| Tabular (Classic Machine Learning) | Tabular Regression | Crab Age | 8 | 53,316 | 13,329 | 66,646 | N/A | Kaggle Competition | CC0 | RMSLE |
| Tabular (Classic Machine Learning) | Tabular Regression | Crop Price | 8 | 1,540 | 440 | 220 | N/A | Kaggle Dataset | MIT | RMSLE |
| Graph (Graph Learning) | Node Classification | Cora | 1,433 | 2,708 | 2,708 | 2,708 | 7 | Research Dataset (Planetoid) | CC BY 4.0 | Accuracy |
| Graph (Graph Learning) | Node Classification | Citeseer | 3,703 | 3,327 | 3,327 | 3,327 | 6 | Research Dataset (Planetoid) | N/A | Accuracy |
| Time Series (Time Series Analysis) | Time-Series Forecasting | Weather | 21 | 36,887 | 10,539 | 5,270 | N/A | Research Dataset (TSLib) | CC BY 4.0 | RMSLE |
| Time Series (Time Series Analysis) | Time-Series Forecasting | Electricity | 321 | 18,412 | 5,260 | 2,632 | N/A | Research Dataset (TSLib) | CC BY 4.0 | RMSLE |

Additional Datasets for SELA (Classic Tabular Machine Learning)

| Downstream Task | Dataset Name | # Features | # Train | # Valid | # Test | # Classes | Source | License | Evaluation Metric |
|---|---|---|---|---|---|---|---|---|---|
| Binary Classification | Smoker Status | 22 | 85,997 | 21,500 | 143,331 | 2 | Kaggle Competition | N/A | F1 |
| Binary Classification | Click Prediction Small | 11 | 19,174 | 4,794 | 7,990 | 2 | OpenML | | F1 |
| Multi-Class Classification | MFeat Factors | 216 | 960 | 240 | 400 | 10 | OpenML | | F1 |
| Multi-Class Classification | Wine Quality White | 11 | 2,350 | 588 | 980 | 7 | OpenML | | F1 |
| Regression | Colleges | 44 | 3,389 | 848 | 1,413 | N/A | OpenML | | RMSE |
| Regression | House Prices | 80 | 700 | 176 | 292 | N/A | Kaggle Competition | | RMSE |
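Among the evaluation metrics listed above, RMSLE (for regression and forecasting) and RI (Rand Index, for clustering) are the less common ones. A minimal sketch of both, assuming their standard definitions; this is not tied to the paper's actual evaluation code.

```python
# Standard definitions of two metrics from the table: RMSLE for
# regression/forecasting and the Rand Index (RI) for clustering.
import math

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error over paired values."""
    se = [(math.log1p(p) - math.log1p(t)) ** 2
          for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(se) / len(se))

def rand_index(labels_a, labels_b):
    """Rand Index: fraction of point pairs on which two clusterings
    agree (both same cluster, or both different clusters)."""
    n = len(labels_a)
    agree, pairs = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            pairs += 1
            same_a = labels_a[i] == labels_a[j]
            same_b = labels_b[i] == labels_b[j]
            agree += same_a == same_b
    return agree / pairs
```

Note that the Rand Index is invariant to cluster relabeling, which is why it suits clustering tasks like Smoker Status, where predicted cluster IDs carry no fixed meaning.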

Citation (BibTeX)

@inproceedings{AutoML_Agent,
  title={Auto{ML}-Agent: A Multi-Agent {LLM} Framework for Full-Pipeline Auto{ML}},
  author={Trirat, Patara and Jeong, Wonyong and Hwang, Sung Ju},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=p1UBWkOvZm}
}