Extract-Transform-Load for Video Streams

Ferdinand Kossmann, Ziniu Wu, Eugenie Lai, Nesime Tatbul, Lei Cao, Tim Kraska, Samuel Madden

TL;DR

This work defines Video Extract-Transform-Load (V-ETL) and introduces Skyscraper, a system that delivers throughput guarantees while reducing the cost of large-scale video ingestion. Skyscraper uses offline profiling to identify content categories and knob-frontier configurations, and then combines a predictive planner with a reactive switcher to allocate on-premises, buffer, and cloud resources for cost-efficient V-ETL. It formalizes knob planning as a linear program and uses lightweight category inference to guide online decisions with minimal overhead. Experiments across the COVID, MOT, and MOSEI workloads show cost savings of up to 8.7× while maintaining robust throughput and quality, demonstrating practical viability for scalable video warehouses. Overall, Skyscraper provides a practical, generalizable approach to deploying V-ETL pipelines with controllable quality-cost trade-offs in constrained environments.

Abstract

Social media, self-driving cars, and traffic cameras produce video streams at large scale and low cost. However, storing and querying video at such scale is prohibitively expensive. We propose to treat large-scale video analytics as a data warehousing problem: video is a format that is easy to produce but needs to be transformed into an application-specific format that is easy to query. Analogously, we define the problem of Video Extract-Transform-Load (V-ETL). V-ETL systems need to reduce the cost of running a user-defined V-ETL job while also giving throughput guarantees to keep up with the rate at which data is produced. We find that no current system sufficiently fulfills both needs and therefore propose Skyscraper, a system tailored to V-ETL. Skyscraper can execute arbitrary video ingestion pipelines and adaptively tunes them to reduce cost at minimal or no quality degradation, e.g., by adjusting sampling rates and resolutions to the ingested content. Skyscraper can thus be provisioned with cheap on-premises compute and uses a combination of buffering and cloud bursting to deal with peaks in workload caused by expensive processing configurations. In our experiments, we find that Skyscraper significantly reduces the cost of V-ETL ingestion compared to adaptations of current state-of-the-art systems, while at the same time giving robustness guarantees that these systems lack.
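
The interplay of on-premises compute, buffering, and cloud bursting described in the abstract can be illustrated with a small toy simulation. This is only a sketch under stated assumptions and not Skyscraper's actual planner: it uses a simple greedy rule (on-premises first, then buffer, then cloud) instead of the predictive planning the paper describes, and the names `simulate`, `StepResult`, `on_prem_capacity`, and `buffer_capacity` as well as the abstract "work units" are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    on_prem: float   # work processed on-premises in this time step
    buffered: float  # backlog left in the buffer after this time step
    cloud: float     # work offloaded to the cloud in this time step


def simulate(workload, on_prem_capacity, buffer_capacity):
    """Greedy toy policy (an assumption, not the paper's method).

    Each time step, process as much as possible on-premises, keep the
    remainder in the buffer up to its capacity, and send only the
    overflow to the cloud.
    """
    backlog = 0.0
    results = []
    for demand in workload:
        pending = backlog + demand
        on_prem = min(pending, on_prem_capacity)
        pending -= on_prem
        backlog = min(pending, buffer_capacity)
        cloud = pending - backlog  # overflow that the buffer cannot hold
        results.append(StepResult(on_prem, backlog, cloud))
    return results


if __name__ == "__main__":
    # A short burst (8 units) exceeds on-premises capacity (4 units).
    for step in simulate([2, 8, 1, 0], on_prem_capacity=4, buffer_capacity=3):
        print(step)
```

In this toy run, the buffer absorbs most of the burst and only the overflow is sent to the cloud, which is the intuition behind provisioning cheap on-premises compute for the common case and paying for cloud resources only at peaks.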

Paper Structure (61 sections, 5 equations, 17 figures, 6 tables)

Figures (17)

  • Figure 1: Skyscraper optimizing the expensive V-ETL Transform step of the EV counting example job. The blue components are provided by Skyscraper, the red ones are provided by the user.
  • Figure 2: Overview of all processing steps of Skyscraper.
  • Figure 3: Running the EV workload over a traffic camera.
  • Figure 4: Cost-quality trade-off of Skyscraper, Chameleon$^{*}$ and statically using the same knob throughout ingestion.
  • Figure 13: Overheads: knob switcher ($<$1ms) and planner ($<$1s)
  • ...and 12 more figures