Periodicity of strings
This is a quick introduction to several properties related to the periodicity of strings. Some of these concepts can be useful in understanding the performance of string algorithms, particularly being used to prove that the Boyer-Moore algorithm runs in linear time.
Definition 1. A string S is periodic if S=BBBBB..B for some shorter string B. The length of B is called the period of the string.
Definition 2. A string S is suffix semiperiodic if S=BBBB..Bα for some prefix α of B.
Definition 3. A string S is prefix semiperiodic if S=αBBBB..B for some suffix α of B.
It may not be immediately obvious but any string that is prefix semiperiodic is also suffix semiperiodic with the same period. To prove this statement, let's assume S=αBBBB where α is a suffix of B. Thus B can be written as B=βα, and the full string can be rewritten as: S=αβαβαβαβα. By defining C=αβ (the string B with its prefix and suffix swapped around), we can rewrite S=CCCCα, i.e., S is suffix semiperiodic. The period is obviously the same since the lengths of B and C are identical (∣α∣+∣β∣).
It is also possible to show that if a string is semiperiodic with two different periods of lengths p1 and p2 it is also semiperiodic with the greatest common denominator of the two periods GCD(p1,p2).
Question: Prove the claim made in the previous paragraph: A string that is semiperiodic with two periods, it's also semiperiodic with the greatest common denominator of the two periods.
So, what does periodicity have to do with sequence matching? Assume that a pattern P of length m matches at two overlapping positions in the text, t1 and t2, and that these positions are spaced by less than half the length of the pattern (∣t1−t2∣<⌊2m⌋ ). In this case, we can show that P is semiperiodic with period ∣t1−t2∣. To prove this claim you can draw two instances of the pattern aligned to each other such that they overlap by more than 50% of the pattern's length, then carefully map which segments of the pattern match each other, as inferred from the overlapping region.
Question: Prove the claim made in the previous paragraph, rephrased as: If a string overlaps itself by more than half of its length (a prefix of the string longer than half its length perfectly matches a corresponding suffix), then the string is semiperiodic with a period equal to the length of the string not contained in the overlap.
This property can be useful when reasoning about the behavior of string matching algorithm, and is used, for example, to prove that the Boyer-Moore algorithm runs in linear time.
Last updated