Basic and Parallel Composition
Introduction
Composition is a useful property of differential privacy. It allows us to reason about the overall privacy cost of releasing multiple outputs on the same database. The property is especially applicable for real-world deployments of differentially private mechanisms. Different mechanisms can query the same database (or subsets of it). Each of them can also ask multiple queries.
Note that a mechanism can also take a previous mechanism’s output as its input; the only caveat is that it should take the database as an input too. Otherwise, we would apply the post-processing property of differential privacy and not composition for this mechanism. Composition provides an upper bound on the privacy cost of releasing the outputs to all such queries. In this post, we will explore two kinds of composition: basic and parallel.
Basic Composition
Basic (or sequential) composition applies when the entire database is queried multiple times. Suppose our database \(D\) is simply a list of real numbers. Assume we want to compute some statistics on this database using two differentially private mechanisms \(M_{1}\) and \(M_{2}\). For example, \(M_{1}\) computes the mean of all the numbers in the database, and \(M_{2}\) computes the mode. If \(M_{1}\) is \(\epsilon_{1}\)-differentially private and \(M_{2}\) is \(\epsilon_{2}\)-differentially private, then basic composition tells us releasing both outputs \((M_{1}(D), M_{2}(D))\) will satisfy at most \((\epsilon_{1} + \epsilon_{2})\)-differential privacy. This property can be generalized to \(k\) mechanisms and approximate differential privacy; the resulting \(\epsilon, \delta\) parameters will just be the sums of the \(\epsilon_{i}\)s and \(\delta_{i}\)s of the individual mechanisms.
Parallel Composition
Parallel composition applies when mechanisms query disjoint partitions of the database. Take the same example of a database \(D\) and mechanisms \(M_{1}\) and \(M_{2}\) as before. Suppose \(D\) contains 1000 numbers. Instead of computing statistics on the whole database, we want to release the mean of the first 500 numbers using \(M_{1}\) and the mode of the next 500 using \(M_{2}\). Let us denote these two disjoint partitions of the database using \(X_{1}\) and \(X_{2}\). Parallel composition tells us releasing both outputs \((M_{1}(D \cap X_{1}), M_{2}(D \cap X_{2}))\) will satisfy at most \(\max(\epsilon_{1}, \epsilon_{2})\)-differential privacy. This property can be generalized to \(k\) mechanisms (each operating on a disjoint partition of the database); the resulting \(\epsilon\) parameter will just be the maximum of the \(\epsilon_{i}\)s of the individual mechanisms.
Choosing Between Basic and Parallel Composition
A neat way to understand when to apply basic or parallel composition is to check how the mechanism outputs will be affected when an individual record is changed in the database. In the example above, changing one number and releasing the mean and mode of all 1000 numbers in the database \(D\) would have changed the outputs of both \(M_{1}\) and \(M_{2}\). This means an individual record “influences” the outputs of two mechanisms, and the privacy cost will add up. Hence, we need to apply basic composition.
However, if we wanted to compute the mean over the first 500 numbers and the mode on the next 500, we can apply parallel composition. To see why, consider, without loss of generality, that a record in the \(X_{1}\) partition of \(D\) was changed. Since we know the two partitions are disjoint, changing one record in \(X_{1}\) will not impact \(X_{2}\). This means only the output of \(M_{1}\) would have been affected and \(M_{2}\) would operate on the same partition as before. An individual record only “influences” the output of the mechanism that accesses the partition it was in. We can safely apply parallel composition in this case.
References
Dwork, C., & Roth, A. (2014). The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4), 211-407.
McSherry, F. D. (2009, June). Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data (pp. 19-30).