Privacy-Enhancing Techniques for Data Analytics

Rao, Fang-Yu

doi:10.25394/PGS.7967294.v1

thesis_Fang-Yu_Rao.pdf (1.25 MB)

thesis

posted on 2019-06-10, 17:34, authored by Fang-Yu Rao

Organizations today collect and aggregate huge amounts of data from individuals under various scenarios and for different purposes. Such aggregation of individuals’ data, when combined with data analytics techniques, allows organizations to make informed decisions and predictions. In many situations, however, different portions of the data associated with individuals are collected and curated by different organizations. To derive more accurate conclusions and predictions, those organizations may want to conduct the analysis on their joint data, which cannot be accomplished simply by each organization exchanging its own data with the others, due to the sensitive nature of the data. Developing approaches for collaborative privacy-preserving data analytics is nevertheless a nontrivial task, and at least two major challenges have to be addressed. The first is that the security of the data possessed by each organization must be properly protected during and after the collaborative analysis process. The second is the high computational complexity that usually accompanies the cryptographic primitives used to build such privacy-preserving protocols.

In this dissertation, building on widely adopted cryptographic primitives, we address the aforementioned challenges by developing techniques for data analytics that not only allow multiple mutually distrustful parties to analyze their joint data in a privacy-preserving manner, but also reduce the time required to complete the analysis. More specifically, using three common data analytics tasks as concrete examples, we show how to construct the respective privacy-preserving protocols under two different scenarios: (1) the protocols are executed by a collaborative process involving only the participating parties; (2) the protocols are outsourced to service providers in the cloud. Two types of optimization for improving the efficiency of those protocols are also investigated. The first type gives each participating party access to a statistically controlled leakage so as to reduce the amount of required computation. The second type exploits the parallelism that can be incorporated into the task and pushes some computation to the offline phase, reducing the time needed by each participating party without any additional leakage. Extensive experiments on real-world datasets demonstrate the effectiveness of our proposed techniques.

Funding

CNS-1111512

Degree Type

Doctor of Philosophy

Department

Computer Science

Campus location

West Lafayette

Advisor/Supervisor/Committee Chair

Elisa Bertino

Additional Committee Member 2

Ninghui Li

Additional Committee Member 3

Sonia Fahmy

Additional Committee Member 4

Dan Goldwasser

Keywords

Differential Privacy; Secure Multiparty Computation; Record Linkage; K-Means Clustering; Computer System Security; Data Encryption

Licence

CC BY 4.0
