Optimizing Big Active Data Management Systems
Shahrzad Haji Amin Shirazi, Xikui Wang, Michael J. Carey, Vassilis J. Tsotras
TL;DR
The paper addresses the scalability of Big Active Data (BAD) platforms with three optimizations: subscription aggregation, query-plan augmentation, and a BAD index for early result filtering. Together, these techniques reduce redundant computation, bring user interests into the processing pipeline earlier, and filter data before heavy processing, so that BAD can support more subscribers and higher data rates without additional resources. Experiments on synthetic and real Twitter data, on clusters of up to eight nodes, show significant performance gains, improved broker efficiency, and stronger scale-up behavior. The results are relevant to proactive, data-driven notification services over large-scale, semi-structured data, such as the BAD platform built atop AsterixDB. Future work includes coordinating optimizations across multiple channels and refining cross-channel indexing strategies to sustain the gains in larger deployments.
Abstract
Traditional Big Data systems typically operate in a passive mode: they process user queries and return the requested data. This model falls short for users who want not only to analyze data but also to receive proactive updates on topics of interest. To bridge this gap, Big Active Data (BAD) frameworks have been proposed to support extensive data subscriptions and analytics for millions of subscribers. As data volumes and the number of interested users continue to grow, BAD systems must be optimized for scalability, performance, and efficiency. To this end, this paper introduces three main optimizations: strategic aggregation, intelligent modifications to the query plan, and early result filtering. All three aim to strengthen a BAD platform's ability to process rapidly growing rates of incoming data and to distribute notifications to larger numbers of subscribers.
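As a rough illustration of how aggregation can reduce redundant computation, the following sketch groups subscribers that share the same subscription parameter, so the underlying query is evaluated once per distinct parameter rather than once per subscriber. It is a minimal sketch and not the paper's implementation: the function names `aggregate_subscriptions` and `notify`, the `(subscriber_id, parameter)` tuple layout, and the `evaluate` callback are all hypothetical and introduced here only for illustration.

```python
from collections import defaultdict


def aggregate_subscriptions(subscriptions):
    """Group subscriber ids by subscription parameter (hypothetical layout).

    `subscriptions` is an iterable of (subscriber_id, parameter) pairs,
    where `parameter` is hashable.
    """
    groups = defaultdict(list)
    for subscriber_id, parameter in subscriptions:
        groups[parameter].append(subscriber_id)
    return dict(groups)


def notify(subscriptions, evaluate):
    """Evaluate once per distinct parameter, then fan results out.

    `evaluate` is an assumed stand-in for the (expensive) subscription query.
    """
    notifications = {}
    for parameter, subscriber_ids in aggregate_subscriptions(subscriptions).items():
        result = evaluate(parameter)  # single evaluation shared by the group
        for subscriber_id in subscriber_ids:
            notifications[subscriber_id] = result
    return notifications
```

With three subscribers, two of whom share a parameter, `evaluate` is called twice instead of three times; the saving grows with the number of subscribers per distinct parameter.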
