MEC$^3$O: Multi-Expert Consensus for Code Time Complexity Prediction

Joonghyuk Hahn, Soohan Lim, Yo-Sub Han

TL;DR

This paper proposes MEC$^3$O, a multi-expert consensus system that extends multi-agent debate frameworks. It assigns LLMs to complexity classes based on their performance and gives them class-specialized instructions, turning them into experts.

Abstract

Predicting the complexity of source code is essential for software development and algorithm analysis. Recently, Baik et al. (2025) introduced CodeComplex for code time complexity prediction. Their paper shows that LLMs without fine-tuning struggle with certain complexity classes. This suggests that no single LLM excels at every class; rather, each model has advantages in certain classes. We propose MEC$^3$O, a multi-expert consensus system that extends multi-agent debate frameworks. MEC$^3$O assigns LLMs to complexity classes based on their performance and provides them with class-specialized instructions, turning them into experts. These experts engage in structured debates, and their predictions are integrated through a weighted consensus mechanism. Assigning expertise to LLMs effectively handles Degeneration-of-Thought, reducing reliance on a separate judge model and preventing convergence to incorrect majority opinions. Experiments on CodeComplex show that MEC$^3$O outperforms the open-source baselines, achieving at least 10% higher accuracy and macro-F1 scores. On average, it also surpasses GPT-4o-mini in macro-F1 and achieves F1 scores on par with GPT-4o and GPT-o4-mini. These results demonstrate the effectiveness of multi-expert debates and the weighted consensus strategy for generating the final predictions. Our code and data are available at https://github.com/suhanmen/MECO.
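
The abstract does not state the exact assignment rule or weighting formula, so the following Python sketch only illustrates the general idea of expertise assignment followed by a weighted vote. Everything in it is an assumption made for illustration: the function names `assign_experts` and `weighted_consensus`, the `expert_weight` parameter, the model names, the class labels, and the per-class scores are hypothetical and are not taken from the paper or its repository. The debate step is omitted.

```python
from collections import defaultdict


def assign_experts(per_class_f1):
    """Map each class to the model with the highest per-class score.

    Assumed rule: the paper assigns models "based on their performance";
    picking the arg-max score per class is one simple way to do that.
    """
    classes = sorted({c for scores in per_class_f1.values() for c in scores})
    return {
        c: max(per_class_f1, key=lambda m: per_class_f1[m].get(c, 0.0))
        for c in classes
    }


def weighted_consensus(predictions, experts, expert_weight=2.0):
    """Combine one prediction per model into a final label.

    Assumed rule: a vote counts `expert_weight` when the model predicts
    the class it is the expert for, and 1.0 otherwise.
    """
    scores = defaultdict(float)
    for model, label in predictions.items():
        scores[label] += expert_weight if experts.get(label) == model else 1.0
    # Iterate labels in sorted order so ties resolve deterministically.
    return max(sorted(scores), key=scores.get)


# Hypothetical per-class scores for three placeholder models.
per_class_f1 = {
    "model_a": {"constant": 0.9, "linear": 0.6, "quadratic": 0.5},
    "model_b": {"constant": 0.7, "linear": 0.8, "quadratic": 0.6},
    "model_c": {"constant": 0.6, "linear": 0.7, "quadratic": 0.9},
}

experts = assign_experts(per_class_f1)

# Hypothetical post-debate predictions for a single code sample.
predictions = {"model_a": "linear", "model_b": "constant", "model_c": "quadratic"}
final_label = weighted_consensus(predictions, experts)
```

In this toy case every model predicts a different class, and only `model_c` predicts the class it is the expert for, so its vote carries the extra weight and decides the outcome. No separate judge model is involved, which is the property the abstract highlights.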

Paper Structure

This paper contains 50 sections, 13 equations, 11 figures, 10 tables.

Figures (11)

  • Figure 1: Procedural comparison of Single LLM, multi-agent debate, and MEC$^3$O approaches.
  • Figure 2: An overview of MEC$^3$O. Step 1: Expertise assignment via model selection and class-specific instructions. Step 2: Multi-expert debates. Step 3: Weighted consensus strategy for the final prediction. The appendix provides a workflow of the debate process.
  • Figure 3: Confusion matrices of Java performance.
  • Figure 4: CoT confusion matrices for Java and Python.
  • Figure 5: Self-Consistency confusion matrices for Java and Python.
  • ...and 6 more figures