Competing DEA procedures: analysis, testing, and comparisons

Gregory Koronakos, Jose H Dula, Dimitris K Despotis

TL;DR

Classifying $n$ DMUs with a total of $m$ inputs and outputs in DEA conventionally requires solving $n$ LPs. This work compares two sequential data-processing procedures, BuildHull and Enhanced Hierarchical Decomposition (EHD), under shared preprocessors on the 48 data sets of the MassiveScaleDEAdata suite. BuildHull consistently runs faster than EHD, with speedups that grow with density $d$ and cardinality $n$, primarily because the LPs in its second phase are smaller and more stable in size. The study uses the number and size of the LPs solved to explain the performance differences, and it establishes a common ground for comparing parallel implementations and future hybrids. The findings support using BuildHull when parallelization is limited and motivate further work on parallelization, complexity analysis, and hybrid methods in computational DEA.

Abstract

Reducing the computational time to process large data sets in Data Envelopment Analysis (DEA) is the objective of many studies. Contributions include fundamentally innovative procedures, new or improved preprocessors, and hybridizations between, and among, all of these. Ultimately, a new contribution succeeds when it reduces the number and size of the LPs solved. This paper provides a comprehensive analysis and comparison of two competing procedures to process DEA data sets: BuildHull and Enhanced Hierarchical Decomposition (EHD). A common ground for comparison is established by examining their sequential implementations and applying the same preprocessors to both, when permitted, on a suite of data sets widely employed in the computational DEA literature. In addition to reporting execution times, we discuss how data characteristics affect performance, and we introduce the number and size of the LPs solved as a means to better understand performance and explain differences. Our experiments show that the dominance of BuildHull can be substantial on large-scale, high-density data sets. Comparing and explaining performance based on the number and size of LPs lays the groundwork for a comparison of the parallel implementations of BuildHull and EHD.
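
As a point of reference for the "number and size of LPs" measure, the naive baseline solves one LP per DMU, and every one of those LPs carries a column for each of the $n$ DMUs. The following is a minimal sketch of that baseline, assuming an input-oriented envelopment LP under variable returns to scale and SciPy's `linprog`. The function name `dea_scores` and the toy data are hypothetical, and this is neither the BuildHull nor the EHD procedure.

```python
import numpy as np
from scipy.optimize import linprog


def dea_scores(X, Y):
    """Naive baseline: one LP per DMU (input-oriented, VRS envelopment form).

    X is (n_inputs, n) and Y is (n_outputs, n); column j holds DMU j.
    This formulation is an assumption made for illustration only.
    """
    n_in, n = X.shape
    n_out = Y.shape[0]
    # Decision vector: [theta, lambda_1, ..., lambda_n]
    c = np.zeros(n + 1)
    c[0] = 1.0  # minimize theta
    A_eq = np.hstack([[0.0], np.ones(n)]).reshape(1, -1)  # sum(lambda) = 1
    b_eq = [1.0]
    bounds = [(None, None)] + [(0, None)] * n  # theta free, lambda >= 0
    scores = np.empty(n)
    for o in range(n):  # n LPs, each with n + 1 columns
        # Inputs:  X @ lam - theta * x_o <= 0
        A_in = np.hstack([-X[:, [o]], X])
        # Outputs: -Y @ lam <= -y_o
        A_out = np.hstack([np.zeros((n_out, 1)), -Y])
        res = linprog(
            c,
            A_ub=np.vstack([A_in, A_out]),
            b_ub=np.concatenate([np.zeros(n_in), -Y[:, o]]),
            A_eq=A_eq,
            b_eq=b_eq,
            bounds=bounds,
            method="highs",
        )
        scores[o] = res.x[0]
    return scores


# Toy data (hypothetical): one input, one output, three DMUs A, B, C.
X = np.array([[1.0, 2.0, 2.0]])
Y = np.array([[1.0, 2.0, 1.0]])
scores = dea_scores(X, Y)
print(np.round(scores, 6))  # A and B score 1; C scores 0.5
```

In this sketch both the number of LPs and the width of each LP grow with $n$, which is why reducing either quantity is the lever that the procedures compared here act on.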

Paper Structure (13 sections, 1 equation, 4 figures, 13 tables, 1 algorithm)

Figures (4)

  • Figure 1: Execution times for all data sets in the MassiveScaleDEAdata sorted by Cardinality$\rightarrow$Dimension$\rightarrow$Density.
  • Figure 2: Impact of density on running time for the data sets in the MassiveScaleDEAdata suite when the cardinality is $n=100$K.
  • Figure 3: Impact of cardinality on running time for the data sets in the MassiveScaleDEAdata suite when the density is $d=10\%$.
  • Figure 4: Impact of dimension on running time for data sets in the MassiveScaleDEAdata suite.