Multimodal Approaches to Fair Image Classification: An Ethical Perspective
Javon Hickmon
TL;DR
This work addresses fairness in image classification by leveraging multimodal information at inference time. It introduces MuSE, a zero-shot framework that uses descriptive prompts and synthetic images to enrich embeddings, and D3G, a training-free method that generates diverse demographic data to offset bias in zero-shot classifiers. Across multiple datasets, MuSE yields consistent accuracy gains over CLIP baselines, while D3G demonstrates improvements in handling underrepresented demographics and provides insights into weighting text versus image contributions. The study highlights practical potential for responsible, inference-time bias mitigation in multimodal AI, while acknowledging limitations related to generative-model biases and demographic intersectionality that warrant careful deployment and further research.
Abstract
In the rapidly advancing field of artificial intelligence, machine perception is increasingly central to achieving better performance. Image classification systems are now integral to applications ranging from medical diagnostics to image generation; however, these systems often exhibit harmful biases that can lead to unfair and discriminatory outcomes. Machine learning systems that depend on a single data modality, i.e., only images or only text, can exaggerate hidden biases present in the training data if the data is not carefully balanced and filtered. Even with carefully curated data, these models can still harm underrepresented populations when used in improper contexts, such as when government agencies reinforce racial bias through predictive policing. This thesis explores the intersection of technology and ethics in the development of fair image classification models. Specifically, I focus on methods that use multiple modalities to improve fairness and combat harmful demographic bias. By integrating multimodal approaches, which combine visual data with additional modalities such as text and metadata, this work enhances the fairness and accuracy of image classification systems. The study critically examines existing biases in image datasets and classification algorithms, proposes methods for mitigating these biases, and evaluates the ethical implications of deploying such systems in real-world scenarios. Through comprehensive experimentation and analysis, the thesis demonstrates how multimodal techniques can contribute to more equitable and ethical AI solutions, ultimately advocating for responsible AI practices that prioritize fairness.
