Combinatorial methods for counting pattern occurrences in a Markovian text
Thesis posted on 2020-12-16, 16:03, authored by Yucong Zhang
In this dissertation, we provide combinatorial methods for obtaining the probabilistic multivariate generating function that counts the occurrences of patterns in a text generated by a Markovian source. The generating function can then be expanded into a Taylor series in which the power of a term gives the size of the text and the coefficient gives the probabilities of all possible pattern occurrences in a text of that size. The analysis is based on the inclusion-exclusion principle for pattern counting (Goulden and Jackson, 1979 and 1983) and on its application by Bassino et al. (2012), who used it to obtain the generating function for a Bernoulli text source. We follow the notation and concepts introduced by Bassino et al. in the discussion of distinguished patterns and non-reduced pattern sets, with modifications to account for the Markovian dependence. Our result takes the form of a linear matrix equation in which the number of linear equations depends on the size of the alphabet. In addition, we compute the moments of pattern occurrences and discuss the impact of a Markovian text on the moments compared with the Bernoulli case. The methodology involves the inclusion-exclusion principle, stochastic recurrences, and combinatorics on words, including probabilistic multivariate generating functions and moment generating functions.
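To make concrete what the coefficients of such a generating function represent, the following minimal sketch computes, by brute-force enumeration, the probability that a single pattern occurs exactly k times in a text of length n drawn from a first-order Markov chain. It is not the inclusion-exclusion method of the dissertation; the function name `occurrence_distribution`, the two-letter alphabet, the initial distribution, and the transition probabilities are all assumptions chosen for illustration, and overlapping occurrences are counted.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def occurrence_distribution(pattern, n, init, trans):
    """Return {k: P(pattern occurs exactly k times)} for texts of length n >= 1.

    init[c] is the probability that the first letter is c; trans[a][b] is the
    probability that letter b follows letter a (hypothetical illustrative inputs).
    """
    alphabet = sorted(init)
    dist = defaultdict(Fraction)
    for letters in product(alphabet, repeat=n):
        text = "".join(letters)
        # Probability of this particular text under the Markov chain.
        p = init[text[0]]
        for prev, cur in zip(text, text[1:]):
            p *= trans[prev][cur]
        # Count occurrences of the pattern, overlaps included.
        k = sum(text.startswith(pattern, i)
                for i in range(n - len(pattern) + 1))
        dist[k] += p
    return dict(dist)


# Assumed example source over the alphabet {a, b}.
half = Fraction(1, 2)
init = {"a": half, "b": half}
trans = {
    "a": {"a": Fraction(3, 4), "b": Fraction(1, 4)},
    "b": {"a": half, "b": half},
}

dist = occurrence_distribution("aa", 3, init, trans)
for k in sorted(dist):
    print(k, dist[k])
```

For n = 3 and the pattern "aa", the resulting probabilities sum to 1, and they are the numbers that would appear as coefficients of the term for text size 3 in a generating function of the kind described above. Enumeration costs time exponential in n, which is one reason closed-form generating functions are preferable for anything beyond toy sizes.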