Algorithms for Provisioning Queries and Analytics

Authors: Sepehr Assadi, Sanjeev Khanna, Yang Li, Val Tannen.
Conference: The 19th International Conference on Database Theory (ICDT'16).
Abstract: Provisioning is a technique for avoiding repeated expensive computations in what-if analysis. Given a query, an analyst formulates k hypotheticals, each retaining some of the tuples of a database instance (possibly overlapping), and she wishes to answer the query under scenarios, where a scenario is defined by a subset of the hypotheticals that are “turned on”. We say that a query admits compact provisioning if given any instance and k hypotheticals, one can create a poly-size (in k) sketch that can then be used to answer the query under any of the 2^k possible scenarios without accessing the original instance.
In this paper, we focus on provisioning complex queries that combine relational algebra (the logical component), grouping, and aggregates (the numerical component). We first show that for aggregates alone, quantiles and database-supported linear regression, along with simpler queries in-cluding count and sum/avg (of positive values), can be compactly provisioned to provide approximate answers to an arbitrary precision. In contrast, exact provisioning for any of those aggregates requires the sketch size to be exponential in the number of hypotheticals. We then establish that for any complex query whose logical component is a positive relational algebra, as long as the numerical component can be compactly provisioned to provide approximate answers to an arbitrary precision, the complex query itself can also be compactly provisioned to provide approximate answers to an arbitrary precision. On the other hand, introducing negation or recursion in the logical component requires the sketch size to be exponential in the number of hypotheticals.
Conference version: [PDF]
Full version: [arXiv]
BibTex: [DBLP]