Towards ethical multimodal systems
Alexis Roger, Esma Aïmeur, Irina Rish
TL;DR
The paper tackles ethics in multimodal AI systems by building a scalable, crowdsourced dataset of image–question prompts with ethical judgments and by evaluating automatic ethics classifiers. It introduces a MAGMA-based data generation flow and a Discord-driven evaluation pipeline with safeguards to prevent bias and manipulation. It evaluates two classifiers: a RoBERTa-large text-only model and a multimodal MLP using CLIP-GPT-J embeddings, reporting around 52% and 55% accuracy respectively on majority-labeled data, underscoring the importance of visual context. The work provides the first open multimodal ethical dataset and baseline methods to advance AI alignment research for image+text systems, with plans for ongoing updates and expansion.
Abstract
Generative AI systems (ChatGPT, DALL-E, etc.) are expanding into many areas of our lives, from art [Rombach et al., 2021] to mental health [Rob Morris and Kareem Kouddous, 2022]. Their rapidly growing societal impact opens new opportunities but also raises ethical concerns. The emerging field of AI alignment aims to make AI systems reflect human values. This paper focuses on evaluating the ethics of multimodal AI systems involving both text and images, a relatively under-explored area, as most alignment work currently focuses on language models. We first create a multimodal ethical database from human feedback on ethicality. Then, using this database, we develop algorithms, including a RoBERTa-large classifier and a multilayer perceptron, to automatically assess the ethicality of system responses.
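As an illustration of the "majority-labeled data" mentioned in the TL;DR, the following minimal sketch shows one way crowdsourced judgments for a single image–question prompt could be collapsed into one label, and how a classifier could then be scored against those labels. The label strings ("ethical", "unethical"), the tie-handling rule (ties are dropped), and the function names are assumptions made for this example; this section does not specify how the paper aggregates votes.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the most frequent vote, or None if there are no votes or a tie.

    Assumption: tied prompts are treated as unlabeled rather than
    resolved by some tie-breaking rule.
    """
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    # A tie exists when the top two labels have the same count.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def accuracy(predictions: Sequence[str], labels: Sequence[Optional[str]]) -> float:
    """Fraction of correct predictions, skipping prompts without a majority label."""
    scored = [(p, y) for p, y in zip(predictions, labels) if y is not None]
    if not scored:
        return 0.0
    return sum(p == y for p, y in scored) / len(scored)


# Hypothetical votes for three prompts (labels are illustrative only).
votes_per_prompt = [
    ["ethical", "ethical", "unethical"],
    ["unethical", "unethical"],
    ["ethical", "unethical"],  # tie, so no majority label
]
labels = [majority_label(v) for v in votes_per_prompt]
predictions = ["ethical", "ethical", "unethical"]
score = accuracy(predictions, labels)
```

Under these assumptions, the third prompt is excluded from scoring because its votes are tied, so the accuracy is computed over the first two prompts only.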
