Efficient Query Processing Over Web-Scale RDF Data

Madkour, Amgad M.

doi:10.25394/PGS.7413308.v1

Efficient_Query_Processing_Over_Web_Scale_RDF_Data.pdf (1.98 MB)

thesis

posted on 2019-01-17, 14:37 authored by Amgad M. Madkour

The Semantic Web, or the Web of Data, promotes common data formats for representing structured data and their links over the web. RDF is the de facto standard for semantic data, providing a flexible semi-structured model for describing concepts and relationships. RDF datasets consist of entries (i.e., triples) and range in size from thousands to billions of triples. The rapid growth of RDF data calls for scalable RDF management and query processing strategies. This dissertation addresses efficient query processing over web-scale RDF data. The first contribution is WORQ, an online, workload-driven RDF query processing technique. Based on the query workload, reduced sets of intermediate results (or reductions, for short) that are common to specific join patterns are computed in an online fashion. We also introduce an efficient solution for RDF queries with unbound properties. The second contribution is SPARTI, a scalable technique for computing the reductions offline. SPARTI utilizes a partitioning schema, termed SemVP, that enables efficient management of the reductions. SPARTI uses a budgeting mechanism with a cost model to determine whether partitioning is worthwhile. The third contribution is KC, an efficient RDF data management system for the cloud. KC uses generalized filtering, which encompasses both exact and approximate set membership structures, to filter out irrelevant data. KC defines a set of common operations and introduces an efficient method for constructing and managing filters. The final contribution is semantic filtering, where data can be reduced based on the spatial, temporal, or ontological aspects of a query. We present a set of encoding techniques and demonstrate how to use semantic filters to reduce irrelevant data in a distributed setting.
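The idea of a reduction computed with exact or approximate set membership can be illustrated with a small sketch. This is not the WORQ, SPARTI, or KC implementation; it assumes a simple vertical partitioning of triples by property and a subject-subject join, and every name in it (`BloomFilter`, `vertical_partition`, `reduce_partition`, the sample triples) is hypothetical. The exact filter is a Python set; the approximate one is a basic Bloom filter, which may admit false positives but never drops a matching row.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Approximate set membership: no false negatives, possible false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def vertical_partition(triples):
    """Group (subject, property, object) triples into one table per property."""
    parts = defaultdict(list)
    for s, p, o in triples:
        parts[p].append((s, o))
    return parts


def reduce_partition(rows, membership):
    """Keep only rows whose subject passes the membership filter."""
    return [(s, o) for s, o in rows if s in membership]


# Hypothetical sample data, for illustration only.
triples = [
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
    ("carol", "name", "Carol"),
    ("alice", "worksAt", "Purdue"),
    ("carol", "worksAt", "Purdue"),
]
parts = vertical_partition(triples)

# Filters built over the subjects of the "worksAt" partition.
exact = {s for s, _ in parts["worksAt"]}
approx = BloomFilter()
for s in exact:
    approx.add(s)

# Reduce the "name" partition to rows that can join with "worksAt" on subject.
exact_reduction = reduce_partition(parts["name"], exact)
approx_reduction = reduce_partition(parts["name"], approx)
```

In this sketch the exact reduction contains only the rows for `alice` and `carol`. The approximate reduction is guaranteed to contain those rows as well, and may contain extra rows that a later join step would discard.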

Degree Type

Doctor of Philosophy

Department

Computer Science

Campus location

West Lafayette

Advisor/Supervisor/Committee Chair

Walid G. Aref

Additional Committee Member 2

Sunil Prabhakar

Additional Committee Member 3

Tiark Rompf

Additional Committee Member 4

Sonia Fahmy

Keywords

query processing techniques; Resource Description Framework (RDF); data management systems; Applied Computer Science

Licence

CC BY 4.0
