UniMat: Unifying Materials Embeddings through Multi-modal Learning

Janghoon Ock, Joseph Montoya, Daniel Schweigert, Linda Hung, Santosh K. Suram, Weike Ye

TL;DR

This work evaluates common multi-modal learning techniques for unifying some of the most important modalities in materials science: atomic structure, X-ray diffraction (XRD) patterns, and composition. It shows that the structure graph modality can be enhanced by aligning it with XRD patterns.

Abstract

Materials science datasets are inherently heterogeneous and are available in different modalities such as characterization spectra, atomic structures, microscopic images, and text-based synthesis conditions. Advances in multi-modal learning, particularly in vision and language models, have opened new avenues for integrating data in different forms. In this work, we evaluate common techniques in multi-modal learning (alignment and fusion) for unifying some of the most important modalities in materials science: atomic structure, X-ray diffraction (XRD) patterns, and composition. We show that the structure graph modality can be enhanced by aligning it with XRD patterns. Additionally, we show that aligning and fusing more experimentally accessible data formats, such as XRD patterns and compositions, can create joint embeddings that are more robust than those of the individual modalities across various tasks. This lays the groundwork for future studies aiming to exploit the full potential of multi-modal data in materials science, facilitating more informed decision-making in materials design and discovery.

Paper Structure

This paper contains 16 sections, 2 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: The best-performing model setup from the alignment experiment. The structure graph, encoded by a GNN, and the XRD pattern, encoded by a CNN, are aligned by training with the CLIP loss. The structure embedding is then used to perform downstream tasks.
  • Figure 2: The best-performing model setup from the fusion experiment. The XRD pattern, encoded by a CNN, and the Magpie-featurized composition, encoded by an MLP, are aligned by training with the CLIP loss. The fused embedding of the two is then used to perform downstream tasks.
  • Figure 3: Clustering of the latent space by crystal system.
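
The CLIP loss named in the captions of Figures 1 and 2 is a symmetric contrastive objective: within a batch, the two embeddings of the same material should be more similar to each other than to the embeddings of any other material. The following is a minimal sketch of that objective in plain Python, not the authors' implementation. The function names, the batch size, the embedding dimension, and the temperature of 0.07 are all assumptions made for illustration, and random vectors stand in for the GNN and CNN encoder outputs.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[i]
    return total / len(logits)


def clip_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    emb_a[i] and emb_b[i] are assumed to describe the same material,
    e.g. a structure embedding and an XRD embedding.
    The temperature value is an assumption, not taken from the paper.
    """
    a = [normalize(v) for v in emb_a]
    b = [normalize(v) for v in emb_b]
    # logits[i][j]: scaled cosine similarity between a[i] and b[j]
    logits = [[sum(p * q for p, q in zip(u, w)) / temperature for w in b]
              for u in a]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the a-to-b and b-to-a classification losses
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(logits_t))


def random_batch(n=8, dim=16):
    """Hypothetical stand-in for encoder outputs."""
    return [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]


random.seed(0)
structure = random_batch()
# Paired embeddings that nearly match the structure embeddings
xrd_matched = [[x + 0.01 * random.gauss(0, 1) for x in v] for v in structure]
# Embeddings with no relation to the structure embeddings
xrd_random = random_batch()

loss_matched = clip_loss(structure, xrd_matched)
loss_random = clip_loss(structure, xrd_random)
print(loss_matched < loss_random)
```

In this toy setting the well-aligned pairs yield a lower loss than the unrelated pairs, which is the signal that alignment training relies on to pull the embeddings of matching modalities together.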