Multimodal Quantum Natural Language Processing: A Novel Framework for using Quantum Methods to Analyse Real Data

Hala Hawashin

TL;DR

Results indicate that syntax-based models, particularly DisCoCat and TreeReader, excel at capturing grammatical structures, while bag-of-words and sequential models struggle due to limited syntactic awareness.

Abstract

Despite significant advances in quantum computing across various domains, research on applying quantum approaches to language compositionality, such as modeling linguistic structures and interactions, remains limited. This gap extends to the integration of quantum language data with real-world data from sources like images, video, and audio. This thesis explores how quantum computational methods can enhance the compositional modeling of language through multimodal data integration. Specifically, it advances Multimodal Quantum Natural Language Processing (MQNLP) by applying the Lambeq toolkit to conduct a comparative analysis of four compositional models and evaluate their influence on image-text classification tasks. Results indicate that syntax-based models, particularly DisCoCat and TreeReader, excel at capturing grammatical structures, while bag-of-words and sequential models struggle due to limited syntactic awareness. These findings underscore the potential of quantum methods to enhance language modeling and drive breakthroughs as quantum technology evolves.

Paper Structure

This paper contains 38 sections, 7 equations, 33 figures, 5 tables.

Figures (33)

  • Figure 1: This diagram shows the architecture of the Transformer model as described in the paper "Attention is All You Need" attention2017.
  • Figure 2: This figure illustrates an example of the test set used in the paper "Handwritten Digit Recognition with a Back-Propagation Network" lecun1989handwritten.
  • Figure 3: This diagram shows the architecture of AlexNet, as introduced in the paper by krizhevsky2012imagenet for the ImageNet competition.
  • Figure 4: This diagram represents the architecture of VGGNet as explained in the article by VGGNet.
  • Figure 5: This figure shows the ResNet-15 architecture, as presented in ResNet2016.
  • ...and 28 more figures