Purdue University Graduate School
Browse
thesis.pdf (1.35 MB)

Reconstruction and Local Recovery of Data from Synchronization Errors

Download (1.35 MB)
thesis
posted on 2023-04-21, 17:48 authored by Minshen ZhuMinshen Zhu

In this thesis we study the complexity of data recovery from synchronization errors, namely insertion and deletion (insdel) errors.

Insdel Locally Decodable Codes (Insdel LDCs) are error-correcting codes that admit super-efficient decoding algorithms even in the presence of many insdel errors. The study of such codes for Hamming errors have spanned several decades, whereas work on the insdel analogue had amounted to only a few papers before our work. This work initiates a systematic study of insdel LDCs, seeking to bridge this gap through designing codes and proving limitations. Our upper bounds essentially match those for Hamming LDCs in important ranges of parameters, even though insdel LDCs are more general than Hamming LDCs. Our main results are lower bounds that are exponentially stronger than the ones inherited from the Hamming LDCs. These results also have implications for the well-studied variant of relaxed LDCs. For this variant, besides showing the first results in the insdel setting, we also answer an open question for the Hamming variant by showing a strong lower bound.

In the trace reconstruction problem, the goal is to recover an unknown source string x \in {0,1}n from random traces, which are obtained by hitting the source string with random deletion/insertions at a fixed rate. Mean-based algorithms are a class of reconstruction algorithms whose outputs depend only on the empirical estimates of individual bits. The number of traces needed for mean-based trace reconstruction has already been settled. We further study the performance of mean-based algorithms in a scenario where one wants to distinguish between two source strings parameterized by their edit distance, and we also provide explicit construction of strings that are hard to distinguish. We further establish an equivalence to the Prouhet-Tarry-Escott problem from number theory, which ends up being an obstacle to constructing explicit hard instances against mean-based algorithms.

Funding

NSF CCF-1910659

History

Degree Type

  • Doctor of Philosophy

Department

  • Computer Science

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Elena Grigorescu

Additional Committee Member 2

Jeremiah Blocki

Additional Committee Member 3

Anuran Makur

Additional Committee Member 4

Paul Valiant

Usage metrics

    Licence

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC