Resilient Wide Area Network Routing Algorithms For Meeting Service Level Objectives
It is challenging to meet stringent network performance requirements as traffic grows and expectations of performance rise. Downtime caused by failures can cost billions of dollars and cause severe problems. In this thesis, I explore how to design networks that meet Service Level Objectives (SLOs), with an emphasis on provable performance guarantees under failures. Beyond worst-case guarantees, this thesis also considers requirements that must be met a given percentage of the time, since SLOs are typically expressed in this form. To tackle the problem, this thesis makes the following contributions: (i) PCF, a novel set of mechanisms that ensure the network is provably congestion-free under failures; PCF outperforms FFC, the state-of-the-art mechanism, by a factor of 1.5 on average across 21 topologies; (ii) key components of Lancet, the first system for designing protection routing schemes that can meet a performance target a desired percentage of the time; and (iii) Flexile, a system for designing routing that meets the bandwidth requirements of flows for a desired percentile of time. Flexile exploits a key, previously unexplored opportunity: each flow's requirement can be met in a different set of failure states. Our experiments show that Flexile outperforms state-of-the-art schemes, including SMORE and Teavar, reducing loss at the desired percentiles by 46% or more in the median case.
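To make the idea of a requirement met "a desired percentile of time" concrete, the sketch below computes a flow's loss at percentile beta over a set of failure states with known probabilities: the smallest loss L such that P(loss <= L) >= beta. It is an illustration of the metric only, not the thesis's Flexile algorithm. The function name `percentile_loss`, the four failure states, their probabilities, and the per-flow losses are all hypothetical. The example shows why letting each flow be satisfied in a different set of failure states helps: each flow individually sees zero loss at the 94th percentile, while requiring all flows to be satisfied in the same states yields a loss of 0.5 at that percentile.

```python
from typing import Sequence


def percentile_loss(losses: Sequence[float], probs: Sequence[float], beta: float) -> float:
    """Return the smallest loss L such that P(loss <= L) >= beta.

    losses[i] is a flow's loss (fraction of demand not delivered) in
    failure state i; probs[i] is the probability of failure state i.
    """
    if len(losses) != len(probs):
        raise ValueError("losses and probs must have the same length")
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    cumulative = 0.0
    # Walk failure states from lowest to highest loss, accumulating probability.
    for loss, p in sorted(zip(losses, probs)):
        cumulative += p
        if cumulative >= beta - 1e-9:  # tolerance for floating-point sums
            return loss
    return max(losses)  # probabilities summed to less than beta


if __name__ == "__main__":
    # Hypothetical failure states: no failure, then three failure scenarios.
    probs = [0.90, 0.04, 0.04, 0.02]
    flow_a = [0.0, 0.0, 0.5, 1.0]  # flow A loses traffic in states 2 and 3
    flow_b = [0.0, 0.5, 0.0, 1.0]  # flow B loses traffic in states 1 and 3
    # Requiring both flows to be satisfied in the same states: per-state worst loss.
    joint = [max(a, b) for a, b in zip(flow_a, flow_b)]

    print(percentile_loss(flow_a, probs, 0.94))  # 0.0
    print(percentile_loss(flow_b, probs, 0.94))  # 0.0
    print(percentile_loss(joint, probs, 0.94))   # 0.5
```

Because each flow tolerates loss in a different failure state, both flows meet a zero-loss target 94% of the time, even though the states in which every flow is simultaneously loss-free cover only 90% of the probability mass.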