Enterprises and governments around the world have been attempting to leverage the intelligence of the community by making formerly in-house databases available to the public for analysis. The released data were often “anonymized”: sensitive attributes were removed from the dataset for privacy protection. However, it has been shown that masking sensitive attributes alone is not adequate for data protection. Differential privacy can be used to generate a “synthetic dataset” that retains the statistical properties of the original dataset while limiting the risk of data leakage, but there is always a trade-off between data privacy and utility. In this study, we aggregate data counts across values with small counts to ease the problem of excessive error at data values with small counts. Experiments show that K-aggregation has the potential to reduce the error of count queries on values with smaller counts. Limitations of this approach are also discussed.
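The idea of aggregating small counts before adding noise can be illustrated with a minimal sketch. It is not the paper's K-aggregation algorithm, whose details the abstract does not give. It assumes a standard Laplace mechanism over a histogram with sensitivity 1, and the names `aggregate_small_counts`, `noisy_counts`, the threshold `k`, and the `"OTHER"` bucket label are hypothetical. The sketch also selects the small-count values from the true counts, which a real differentially private pipeline would have to account for.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def aggregate_small_counts(counts, k, other_label="OTHER"):
    """Merge every value whose count is below k into a single bucket.

    Assumption: this is only one plausible reading of "aggregating data
    counts across values with small counts", not the paper's method.
    """
    merged = Counter()
    for value, count in counts.items():
        key = value if count >= k else other_label
        merged[key] += count
    return dict(merged)


def noisy_counts(counts, epsilon, rng):
    """Add Laplace noise to each count (assumed histogram sensitivity: 1)."""
    scale = 1.0 / epsilon
    return {value: count + laplace_noise(scale, rng)
            for value, count in counts.items()}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    raw = {"A": 500, "B": 320, "C": 4, "D": 2, "E": 1}  # made-up toy counts
    merged = aggregate_small_counts(raw, k=10)
    print("aggregated:", merged)
    print("noisy:", noisy_counts(merged, epsilon=0.5, rng=rng))
```

The noise scale depends only on epsilon, not on the count, so a value with a count of 1 or 2 suffers a much larger relative error than a value with a count of several hundred. Merging the small values gives the noise a larger count to act on, at the cost of no longer being able to answer queries about the individual merged values.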
Relation:
2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA), p.772-779