PRACTICAL CONFIDENTIALITY-PRESERVING DATA ANALYTICS IN UNTRUSTED CLOUDS
Cloud computing offers a cost-efficient data analytics platform. This is enabled by constant innovations in tools and technologies for analyzing large volumes of data through distributed batch processing systems and real-time data through distributed stream processing systems. However, due to the sensitive nature of data, many organizations are reluctant to analyze their data in public clouds. To address this stalemate, both software-based and hardware-based solutions have been proposed yet all have substantial limitations in terms of efficiency, expressiveness, and security. In this thesis, we present solutions that enable practical and expressive confidentiality- preserving batch and stream-based analytics. We achieve this by performing computations over encrypted data using Partially Homomorphic Encryption (PHE) and Property-Preserving Encryption (PPE) in novel ways, and by utilizing remote or Trusted Execution Environment (TEE) based trusted services where needed.
We introduce a set of extensions and optimizations to PHE and PPE schemes and propose the novel abstraction of Secure Data Types (SDTs) which enables the application of PHE and PPE schemes in ways that improve performance and security. These abstractions are leveraged to enable a set of compilation techniques making data analytics over encrypted data more practical. When PHE alone is not expressive enough to perform analytics over encrypted data, we use a novel planner engine to decide the most efficient way of utilizing client-side completion, remote re-encryption, or trusted hardware re-encryption based on Intel Software Guard eXtensions (SGX) to overcome the limitations of PHE. We also introduce two novel symmetric PHE schemes that allow arithmetic operations over encrypted data. Being symmetric, our schemes are more efficient than the state-of-the-art asymmetric PHE schemes without compromising the level of security or the range of homomorphic operations they support. We apply the aforementioned techniques in the context of batch data analytics and demonstrate the improvements over previous systems. Finally, we present techniques designed to enable the use of PHE and PPE in resource-constrained Internet of Things (IoT) devices and demonstrate the practicality of stream processing over encrypted data.