Introduction to string indexing
Last updated
Last updated
In this chapter we'll focus on different approaches used to index strings or collections of strings such that subsequent queries can be performed more efficiently. When discussing string matching in the context of the Z and KMP algorithms, the Z and sp values were a form of indexing, but the indices were constructed for each pattern separately. More specifically, assume you have a text of length and a pattern of length . The run time required to find inside is for either the Z algorithm or KMP (if this doesn't make sense, please read the chapter again).
Now, let's assume we have k different patterns: . If we try to search them against , we incur a cost of for each pattern , leading to a total runtime of for all patterns. We can't easily avoid the second part of this cost () as we have to look at every character in each pattern. But can we avoid going through the text times? The indexing approaches described in this chapter will help us do so, and also provide some additional useful functionality.