Evolutionary Algorithms Approach For Search Based On Semantic Document Similarity
Chandrashekar Muniyappa, Eujin Kim
TL;DR
The paper tackles improving Top-N document retrieval for a user query by optimizing over semantic embeddings rather than relying on a fixed ranking. It employs Universal Sentence Encoder to generate 512-dimensional sentence embeddings and applies Genetic Algorithm and Differential Evolution with transfer learning to search for the most relevant questions in the SQuAD dataset. Fitness is defined by the Manhattan distance between the query embedding and dataset embeddings, and experiments indicate that evolutionary search can produce high-quality Top-N results and be more robust than standard ranking in some cases. The work demonstrates the viability of combining semantic embeddings with evolutionary optimization for semantic search and discusses suboptimal solutions and future multi-objective enhancements to better capture diverse relevant results.
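To make the fitness definition concrete, here is a minimal sketch of Manhattan-distance ranking over sentence embeddings. It assumes the embeddings are already computed; random 512-dimensional vectors stand in for real USE output, and the function names `manhattan_fitness` and `top_n_by_distance` are illustrative, not taken from the paper.

```python
import numpy as np

def manhattan_fitness(query_vec, doc_vecs):
    """L1 (Manhattan) distance from the query to every document embedding.
    Lower values mean a closer semantic match."""
    return np.abs(doc_vecs - query_vec).sum(axis=1)

def top_n_by_distance(query_vec, doc_vecs, n=5):
    """Indices of the n closest documents, nearest first."""
    distances = manhattan_fitness(query_vec, doc_vecs)
    return np.argsort(distances, kind="stable")[:n]

# Assumption: random vectors stand in for real 512-d USE embeddings.
rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(1000, 512))
# Build a query that is a slightly perturbed copy of document 42.
query_vec = doc_vecs[42] + 0.01 * rng.normal(size=512)

top = top_n_by_distance(query_vec, doc_vecs, n=5)
```

This exhaustive ranking is the kind of baseline that an evolutionary search would be compared against: it scores every document once and sorts.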
Abstract
Advances in cloud and distributed computing have accelerated research in computer science. As a result, researchers have made significant progress in neural networks and in evolutionary computing algorithms such as the Genetic Algorithm and Differential Evolution. These algorithms are used to build clustering, recommendation, and question-answering systems on top of various text representation and similarity measurement techniques. In this paper, the Universal Sentence Encoder (USE) is used to capture the semantic similarity of text, and transfer learning is used to apply the Genetic Algorithm (GA) and Differential Evolution (DE) to search for and retrieve the top N documents relevant to a user query. The proposed approach is applied to the Stanford Question Answering Dataset (SQuAD) to identify the questions most relevant to a user query. Through experiments, we show that text documents can be efficiently represented as sentence embedding vectors using USE to capture semantic similarity, and, by comparing the results of Manhattan distance ranking, GA, and DE, that the evolutionary algorithms are better at finding the top N results than the traditional ranking approach.
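The evolutionary search described above can be sketched as a small genetic algorithm over document indices. This is one possible encoding under stated assumptions, not necessarily the authors' setup: each individual is a set of N distinct document indices, fitness is the summed Manhattan distance to the query, and the crossover, mutation, and elitism choices (along with the names `ga_top_n`, `pop_size`, and `mutation_rate`) are illustrative. Random vectors again stand in for USE embeddings.

```python
import numpy as np

def manhattan(query_vec, doc_vecs):
    """L1 distance from the query to each document embedding."""
    return np.abs(doc_vecs - query_vec).sum(axis=1)

def ga_top_n(distances, n=5, pop_size=40, generations=200,
             mutation_rate=0.3, seed=0):
    """Search for n document indices with a small total distance.
    Returns the best individual (nearest first) and the best fitness
    recorded at each generation."""
    rng = np.random.default_rng(seed)
    num_docs = len(distances)

    def fitness(individual):
        return distances[individual].sum()  # lower is better

    # Each individual is an array of n distinct document indices.
    population = [rng.choice(num_docs, size=n, replace=False)
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        elite = population[: pop_size // 2]  # elitism: keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            i, j = rng.choice(len(elite), size=2, replace=False)
            # Crossover: draw n distinct indices from the parents' union.
            pool = np.union1d(elite[i], elite[j])
            child = rng.choice(pool, size=n, replace=False)
            # Mutation: swap one slot for a random document not already present.
            if rng.random() < mutation_rate:
                candidate = rng.integers(num_docs)
                if candidate not in child:
                    child[rng.integers(n)] = candidate
            children.append(child)
        population = elite + children
    population.sort(key=fitness)
    history.append(fitness(population[0]))
    best = population[0]
    return best[np.argsort(distances[best])], history

# Assumption: random vectors stand in for real 512-d USE embeddings.
rng = np.random.default_rng(1)
doc_vecs = rng.normal(size=(500, 512))
query_vec = rng.normal(size=512)
distances = manhattan(query_vec, doc_vecs)

best, history = ga_top_n(distances, n=5)
exact = np.sort(distances)[:5].sum()  # exhaustive-ranking optimum, for comparison
```

Because the better half of each generation is carried over unchanged, the best fitness never gets worse, but the search can still stop at a suboptimal set of documents. Comparing `distances[best].sum()` with `exact` shows how far the result is from the exhaustive ranking.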
