Posted on

Lexicographic Order The usual way in which two character strings are compared is according to their lexicographic order. Let c1c2 · · · ck and d1d2 · · · dm be two strings, where each of the c’s and d’s represents a single character. The lengths of the strings, k and m, need not be the same. We assume that there is a < ordering on characters; for example, in C characters are small integers, so character constants and variables can be used as integers in arithmetic expressions. Thus we can use the conventional < relation on integers to tell which of two characters is “less than” the other. This ordering includes the natural notion that lower-case letters appearing earlier in the alphabet are “less than” lower-case letters appearing later in the alphabet, and the same holds for upper-case letters. We may then define the ordering on character strings called the lexicographic, dictionary, or alphabetic ordering, as follows. We say c1c2 · · · ck < d1d2 · · · dm if either of the following holds: Proper prefix 1. The first string is a proper prefix of the second, which means that k < m and for i = 1, 2, . . . , k we have ci = di . According to this rule, bat < batter. As a special case of this rule, we could have k = 0, in which case the first string has no characters in it. We shall use ǫ, the Greek letter epsilon, to denote the Empty string empty string, the string with zero characters.

When k = 0, rule (1) says that ǫ < s for any nonempty string s. 2. For some value of i > 0, the first i − 1 characters of the two strings agree, but the ith character of the first string is less than the ith character of the second string. That is, cj = dj for j = 1, 2, . . . , i − 1, and ci < di . According to this rule, ball < base, because the two words first differ at position 3, and at that position ball has an l, which precedes the character s found in the third position of base.

𐌢