Using Multi-modal Large Language Model to Boost Fireworks Algorithm's Ability in Settling Challenging Optimization Tasks
Shipeng Cen, Ying Tan
TL;DR
This work addresses complex, high-dimensional optimization problems by integrating a multi-modal large language model (MLLM) with the Fireworks Algorithm (FWA) through a new Critical Part (CP) concept. By exploiting visual information from the optimization process together with adaptive CP-driven pathways, the framework aims to extend FWA's applicability to NP-hard tasks such as the traveling salesman problem (TSP) and electronic design automation (EDA), while keeping optimization efficient and low-resource. The approach reports competitive or near state-of-the-art performance on TSPLIB instances and strong results on DreamPlace-based EDA tasks, and it offers insights into when visual modalities aid or hinder optimization. The framework also shows potential to generalize to other swarm-inspired or gradient-free optimizers, which underlines the practical value of multi-modal guidance in algorithm design.
Abstract
As optimization problems grow more complex and diverse, advances in optimization techniques and new paradigms become increasingly important. The difficulty of these problems stems mainly from non-convexity, high dimensionality, black-box objectives, and other unfavorable characteristics. Traditional zeroth-order and first-order methods, which often suffer from low efficiency, inaccurate gradient information, and insufficient use of the information produced during optimization, are ill-equipped to address these challenges. In recent years, the rapid development of large language models (LLMs) has substantially improved their language understanding and code generation capabilities. Consequently, the design of optimization algorithms with the help of LLMs has attracted increasing attention from researchers. In this study, we choose the fireworks algorithm (FWA) as the basic optimizer and propose a novel approach that assists the design of the FWA by incorporating a multi-modal large language model (MLLM). Specifically, we propose the concept of the Critical Part (CP), which extends the FWA to complex high-dimensional tasks, and we further exploit the information generated during the optimization process through the multi-modal capabilities of large language models. We focus on two specific tasks: the traveling salesman problem (TSP) and the electronic design automation (EDA) problem. The experimental results show that FWAs generated under our new framework match or surpass state-of-the-art (SOTA) results on many problem instances.
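For readers unfamiliar with the base optimizer, the following is a minimal sketch of a generic fireworks-style search on a continuous test function. It is not the CP-based framework proposed in this work and involves no MLLM. The function names (`sphere`, `fireworks_minimize`), all parameter values, and the simplifications (uniform sparks in every dimension, no Gaussian mutation sparks, elitist plus random selection, a hypothetical `min_amp` floor) are assumptions made for illustration only.

```python
import random


def sphere(x):
    """Toy objective (assumed for illustration): sum of squares, minimum at 0."""
    return sum(v * v for v in x)


def fireworks_minimize(f, dim, bounds, n_fireworks=5, n_sparks=30,
                       max_amp=1.0, min_amp=1e-3, iters=50, seed=0):
    """Simplified fireworks-style minimizer; returns (best point, best-value history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    eps = 1e-12  # avoids division by zero when all fitness values are equal

    # Initial fireworks are sampled uniformly inside the search box.
    fireworks = [[rng.uniform(lo, hi) for _ in range(dim)]
                 for _ in range(n_fireworks)]
    best = min(fireworks, key=f)
    history = [f(best)]

    for _ in range(iters):
        fits = [f(x) for x in fireworks]
        worst_fit, best_fit = max(fits), min(fits)

        # Better fireworks receive more sparks; worse ones explode more widely.
        spark_w = [worst_fit - fit + eps for fit in fits]
        amp_w = [fit - best_fit + eps for fit in fits]
        spark_total, amp_total = sum(spark_w), sum(amp_w)

        candidates = list(fireworks)  # keep parents so the best is never lost
        for x, sw, aw in zip(fireworks, spark_w, amp_w):
            count = max(1, round(n_sparks * sw / spark_total))
            amp = max(min_amp, max_amp * aw / amp_total)
            for _ in range(count):
                # Explosion: perturb every coordinate, then clip to the box.
                spark = [min(hi, max(lo, v + rng.uniform(-amp, amp))) for v in x]
                candidates.append(spark)

        # Selection: keep the best candidate, fill the rest at random.
        candidates.sort(key=f)
        fireworks = [candidates[0]] + rng.sample(candidates[1:], n_fireworks - 1)
        best = fireworks[0]
        history.append(f(best))

    return best, history


if __name__ == "__main__":
    best, history = fireworks_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
    print("initial best:", history[0], "final best:", history[-1])
```

Because the parents stay in the candidate pool and the best candidate is always kept, the recorded best value never increases from one iteration to the next. Applying such a scheme to TSP or EDA would require problem-specific solution encodings and explosion operators, which this sketch does not attempt.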
