PRACTICAL CLOUD COMPUTING INFRASTRUCTURE
thesisposted on 2021-03-12, 16:26 authored by James A LembkeJames A Lembke
Cloud and parallel computing are fundamental components in the processing of large data sets. Deployments of distributed computers require network infrastructure that is fast, efficient, and secure. Software Defined Networking (SDN) separates the forwarding of network data by switches (data plane) from the setting and managing of network policies (control plane). While this separation provides flexibility for setting network policies affecting the establishment of network flows in the data plane, it provides little to no fault tolerance for failures, either benign or caused by corrupted/malicious applications. Such failures can cause network flows to be incorrectly routed through the network or stop such flows altogether. Without protection against faults, cloud network providers using SDN run the risk of inefficient allocation of network resources or even data loss. Furthermore, the asynchronous nature existing protocols for SDN does not provide a mechanism for consistency in network policy updates across multiple switches.
In addition, cloud and parallel applications require an efficient means for accessing local system data (input data sets, temporary storage locations, etc.). While in many cases it may be possible for a process to access this data by making calls directly to a file system (FS) kernel driver, this is not always possible (e.g. when using experimental distributed FSs where the needed libraries for accessing the FS only exist in user space).
This dissertation provides a design for fault tolerance of SDN and infrastructure for advancing the performance of user space FSs. It is divided into three main parts. The first part describes a fault tolerant, distributed SDN control plane framework. The second part expands upon the fault tolerant approach to SDN control plane by providing a practical means for dynamic control plane membership as well as providing a simple mechanism for controller authentication through threshold signatures. The third part describes an efficient framework for user space FS access.
This research makes three contributions. First, the design, specification, implementation, and evaluation of a method for fault tolerant SDN control plane that is inter-operable with existing control plane applications involving minimal instrumentation of the data plane runtime. Second, the design, specification, implementation and evaluation of a mechanism for dynamic SDN control plane membership that all ensure consistency of network policy updates and minimizes switch overhead through the use of distributed key generation and threshold signatures. Third, the design, specification, implementation, and evaluation of a user space FS access framework that is correct to the Portable Operating System Interface (POSIX) specification with significantly better performance over existing user space access methods, while requiring no implementation changes for application programmers.