EFFICIENT LSM SECONDARY INDEXING FOR UPDATE-INTENSIVE WORKLOADS
In recent years, massive amounts of data have been generated from various types of devices or services. For these data, update-intensive workloads where the data update their status periodically and continuously are common. The Log-Structured-Merge (LSM, for short) is a widely-used indexing technique in various systems, where index structures buffer insert operations into the memory layer and flush them into disk when the data size in memory exceeds a threshold. Despite its noble ability to handle write-intensive (i.e., insert-intensive) workloads, LSM suffers from degraded query performance due to its inefficiency on index maintenance of secondary keys to handle update-intensive workloads.
This dissertation focuses on the efficient support of update-intensive workloads for LSM-based indexes. First, the focus is on the optimization of LSM secondary-key indexes and their support for update-intensive workloads. A mechanism to enable the LSM R-tree to handle update-intensive workloads efficiently is introduced. The new LSM indexing structure is termed the LSM RUM-tree, an LSM R-tree with Update Memo. The key insights are to reduce the maintenance cost of the LSM R-tree by leveraging an additional in-memory memo structure to control the size of the memo to fit in memory. In the experiments, the LSM RUM-tree achieves up to 9.6x speedup on update operations and up to 2400x speedup on query operations.
Second, the focus is to offer several significant advancements in the context of the LSM RUM-tree. We provide an extended examination of LSM-aware Update Memo (UM) cleaning strategies, elucidating how effectively each strategy reduces UM size and contributes to performance enhancements. Moreover, in recognition of the imperative need to facilitate concurrent activities within the LSM RUM-Tree, particularly in multi-threaded/multi-core environments, we introduce a pivotal feature of concurrency control for the update memo. The novel atomic operation known as Compare and If Less than Swap (CILS) is introduced to enable seamless concurrent operations on the Update Memo. Experimental results attest to a notable 4.5x improvement in the speed of concurrent update operations when compared to existing and baseline implementations.
Finally, we present a novel technique designed to improve query processing performance and optimize storage management in any secondary LSM tree. Our proposed approach introduces a new framework and mechanisms aimed at addressing the specific challenges associated with secondary indexing in the structure of the LSM tree, especially in the context of secondary LSM B+-tree (LSM BUM-tree). Experimental results show that the LSM BUM-tree achieves up to 5.1x speedup on update-intensive workloads and 107x speedup on update and query mixed workloads over existing LSM B+-tree implementations.
- Doctor of Philosophy
- Computer Science
- West Lafayette