Automatic Design of Semantic Similarity Ensembles Using Grammatical Evolution

Jorge Martinez-Gil

TL;DR

This paper addresses the challenge of reliably measuring semantic similarity by automatically designing ensembles of similarity measures using grammatical evolution (GE). Aggregation strategies are encoded as syntactically valid programs through a domain-specific Backus–Naur Form grammar, and GE evolves ensemble formulas that maximize correlation with human judgments on several benchmarks, including MC30, GeReSiD50, and WS353. Empirical results show that GE often outperforms traditional genetic programming (GP) ensembles and simple baselines in terms of the Spearman rank correlation coefficient (SRCC), while its grammar-guided, code-ready representation keeps the evolved ensembles interpretable and transferable. The work provides a practical, scalable approach to adaptive similarity modeling and suggests pathways for future comparisons with deep learning models and extensions to other NLP tasks.

Abstract

Semantic similarity measures are a key component in natural language processing tasks such as document analysis, requirement matching, and user input interpretation. However, the performance of individual measures varies considerably across datasets. To address this, ensemble approaches that combine multiple measures are often employed. This paper presents an automated strategy based on grammatical evolution for constructing semantic similarity ensembles. The method evolves aggregation functions that maximize correlation with human-labeled similarity scores. Experiments on standard benchmark datasets demonstrate that the proposed approach outperforms existing ensemble techniques in terms of accuracy. The results confirm the effectiveness of grammatical evolution in designing adaptive and accurate similarity models. The source code that illustrates our approach can be downloaded from https://github.com/jorge-martinez-gil/sesige.
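
As an illustration of the idea summarized above, the following minimal Python sketch maps an integer genotype to an aggregation formula through a small grammar and scores it by Spearman rank correlation against human ratings. It is not the paper's implementation: the grammar, the names `s0`, `s1`, `s2`, `map_genotype`, and `fitness`, and the toy scores are hypothetical, and plain random search stands in for the evolutionary loop.

```python
import random

# Hypothetical toy grammar in BNF style; the paper's actual grammar is not reproduced here.
GRAMMAR = {
    "<expr>": [["<func>", "(", "<expr>", ",", "<expr>", ")"], ["<var>"]],
    "<func>": [["max"], ["min"], ["avg"]],
    "<var>": [["s0"], ["s1"], ["s2"]],
}

def avg(a, b):
    return (a + b) / 2.0

def map_genotype(codons, start="<expr>", max_wraps=2):
    """Expand the leftmost non-terminal using codon % number_of_productions."""
    symbols, out, used = [start], [], 0
    limit = len(codons) * (max_wraps + 1)  # codons may be reused (wrapping)
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:
            out.append(sym)  # terminal symbol
            continue
        if used >= limit:
            return None  # mapping did not terminate: invalid individual
        options = GRAMMAR[sym]
        choice = options[codons[used % len(codons)] % len(options)]
        used += 1
        symbols = list(choice) + symbols
    return "".join(out)

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result, pos = [0.0] * len(values), 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in order[pos:end + 1]:
            result[k] = (pos + end) / 2.0 + 1.0
        pos = end + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5

def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

def fitness(codons, measures, human):
    expr = map_genotype(codons)
    if expr is None:
        return -1.0  # worst possible score for unmappable genotypes
    funcs = {"max": max, "min": min, "avg": avg}
    preds = [
        eval(expr, {"__builtins__": {}}, {**funcs, "s0": m[0], "s1": m[1], "s2": m[2]})
        for m in measures
    ]
    return spearman(preds, human)

def random_search(measures, human, trials=200, length=8, seed=0):
    """Random search used only as a stand-in for the evolutionary loop."""
    rng = random.Random(seed)
    candidates = ([rng.randrange(256) for _ in range(length)] for _ in range(trials))
    best = max(candidates, key=lambda c: fitness(c, measures, human))
    return map_genotype(best), fitness(best, measures, human)

# Made-up toy data: three similarity scores per word pair and one human rating each.
MEASURES = [(0.9, 0.8, 0.7), (0.2, 0.4, 0.1), (0.5, 0.6, 0.55), (0.1, 0.05, 0.3)]
HUMAN = [3.9, 0.8, 2.1, 0.4]

if __name__ == "__main__":
    print(random_search(MEASURES, HUMAN))
```

Because every candidate is derived from the grammar, any formula that maps successfully is syntactically valid by construction, which is what makes the evolved ensemble directly readable as code.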

Paper Structure (34 sections, 12 equations, 9 figures, 6 tables, 1 algorithm)

Introduction
State-of-the-art
Semantic Similarity
Grammatical evolution
Differences between Genetic Programming and Genetic Algorithm
Contribution over the state-of-the-art
Problem Statement
Methods
Mathematical Foundation
Genotype
Grammar
Mapping Function
Output
Summary
Fitness Function
...and 19 more sections

Figures (9)

Figure 1: Results for the a) PCC and b) SRCC over the MC30 benchmark dataset
Figure 2: Evolution of different variables during the ensemble learning process for PCC over the MC30 benchmark dataset
Figure 3: Evolution of different variables during the ensemble learning process for SRCC over the MC30 benchmark dataset
Figure 4: Results for the a) PCC and b) SRCC over the GeReSiD50 benchmark dataset
Figure 5: Evolution of key variables during the ensemble learning process for PCC over the GeReSiD50 dataset
...and 4 more figures
