Challenges and Future Directions of Data-Centric AI Alignment
Min-Hsuan Yeh, Jeffrey Wang, Xuefeng Du, Seongheon Park, Leitian Tao, Shawn Im, Yixuan Li
TL;DR
The paper argues that aligning powerful AI systems requires more than algorithmic advances; the quality and representativeness of alignment feedback data are central. Through qualitative analysis of human feedback reliability and the limitations of AI-based feedback, it identifies six sources of unreliability in human feedback, along with further issues ranging from temporal drift to context dependence. It then proposes concrete future directions, including holistic and dynamic data collection, rigorous protocol validation, collaborative data cleaning, and standardized verification to improve data-centric alignment. Collectively, these data-focused strategies aim to produce AI systems that better reflect diverse human values and adapt to changing contexts, enhancing safety and trust in real-world deployment.
Abstract
As AI systems become increasingly capable and influential, ensuring their alignment with human values, preferences, and goals has become a critical research focus. Current alignment methods primarily focus on designing algorithms and loss functions but often underestimate the crucial role of data. This paper advocates for a shift towards data-centric AI alignment, emphasizing the need to enhance the quality and representativeness of data used in aligning AI systems. In this position paper, we highlight key challenges associated with both human-based and AI-based feedback within the data-centric alignment framework. Through qualitative analysis, we identify multiple sources of unreliability in human feedback, as well as problems related to temporal drift, context dependence, and AI-based feedback failing to capture human values due to inherent model limitations. We propose future research directions, including improved feedback collection practices, robust data-cleaning methodologies, and rigorous feedback verification processes. We call for future research into these critical directions, addressing gaps that persist in understanding and improving data-centric alignment practices.
