Introduction to string indexing

Introduction

In this chapter we'll focus on different approaches used to index strings or collections of strings such that subsequent queries can be performed more efficiently. When discussing string matching in the context of the Z and KMP algorithms, the Z and sp values were a form of indexing, but the indices were constructed for each pattern separately. More specifically, assume you have a text $T$ of length $n$ and a pattern $P_1$ of length $m_1$ . The run time required to find $P_1$ inside $T$ is $O(m_1 + n)$ for either the Z algorithm or KMP (if this doesn't make sense, please read the chapter again).

Now, let's assume we have k different patterns: $P_1,..., P_k$ . If we try to search them against $T$ , we incur a cost of $O(m_i + n)$ for each pattern $i$ , leading to a total runtime of $O(nk + \sum_i m_i)$ for all $k$ patterns. We can't easily avoid the second part of this cost ( $\sum_i m_i$ ) as we have to look at every character in each pattern. But can we avoid going through the text $k$ times? The indexing approaches described in this chapter will help us do so, and also provide some additional useful functionality.

PreviousMotif finding NextIntroduction to suffix trees

Last updated 6 months ago