Abstract:
Provisioning is a technique for avoiding repeated expensive computations in what-if analysis. Given a query, an analyst formulates k hypotheticals, each retaining
some of the tuples of a database instance (possibly overlapping), and she wishes to answer the query under scenarios, where a
scenario is defined by a subset of the hypotheticals that are “turned on”. We say that a query
admits compact provisioning if given any instance and k hypotheticals, one can create a poly-size (in k) sketch that can then be
used to answer the query under any of the 2^k possible scenarios without accessing the original instance.
In this paper, we focus on provisioning complex queries that combine relational algebra (the logical component), grouping, and aggregates (the numerical component). We first show
that for aggregates alone, quantiles and database-supported linear regression, along with simpler queries in-cluding count and sum/avg (of positive values), can be
compactly provisioned to provide approximate answers to an arbitrary precision. In contrast, exact provisioning for any of those
aggregates requires the sketch size to be exponential in the number of hypotheticals. We then establish that
for any complex query whose logical component is a positive relational algebra, as long as the numerical component
can be compactly provisioned to provide approximate answers to an arbitrary precision, the complex query itself can also be compactly provisioned
to provide approximate answers to an arbitrary precision.
On the other hand, introducing negation or recursion in the logical component requires the sketch size to be exponential in the number of hypotheticals.
Conference version:
[PDF]