Regular languages and regular expressions

Since we are interested in describing languages as sets of strings, and since these sets can be infinite in size, we need a compact mechanism for precisely describing the languages.

During the semester we will examine several description approaches of varying complexity. The first of these approaches is called regular expressions, and the languages it allows us to describe are called regular languages.

The easiest way of describing the set of all regular languages is by describing the ways you can "build up" a regular language.

Definition: for any alphabet, Σ, the set of regular languages over Σ is defined as follows:

  1. The empty set (i.e. the language which has no strings in it) is a regular language.

    The regular expression describing that language is ∅

  2. The language which consists of exactly the null string is a regular language, i.e. { λ }, where λ denotes the null string

    The regular expression describing that language is λ

  3. For each character in Σ, the language which consists of one string which is exactly that character is a regular language, i.e. ∀a ∈ Σ, { a } is a regular language

    The regular expression describing that language is a

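  4. If L1 and L2 are regular languages, and r1 and r2 are their corresponding regular expressions, then each of the following is also a regular language:

    The union L1 ∪ L2, described by the regular expression (r1 + r2)

    The concatenation L1L2, described by the regular expression (r1r2)

    The Kleene closure L1*, described by the regular expression (r1*)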

Only those languages that can be obtained using statements 1-4 are regular languages over Σ.
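
The inductive definition can be mirrored quite directly in code. The following is a minimal Python sketch, not part of the formal definition: each rule becomes a function that returns the strings of the language up to a length bound n (a bound is needed because the languages may be infinite). It assumes rule 4 refers to the three standard operations of union, concatenation and Kleene star, and the function names (empty, null, char, union, concat, star) are hypothetical, chosen only for illustration.

```python
# Each constructor returns a function lang(n) that gives the set of all
# strings of length <= n in the language (a finite slice of the language).

def empty():
    """Rule 1: the empty language."""
    return lambda n: set()

def null():
    """Rule 2: the language containing only the null string."""
    return lambda n: {""}

def char(a):
    """Rule 3: the language containing only the one-character string a."""
    return lambda n: {a} if n >= 1 else set()

def union(lang1, lang2):
    """Rule 4 (assumed): strings that are in either language."""
    return lambda n: lang1(n) | lang2(n)

def concat(lang1, lang2):
    """Rule 4 (assumed): a string of lang1 followed by a string of lang2."""
    return lambda n: {x + y for x in lang1(n) for y in lang2(n)
                      if len(x + y) <= n}

def star(inner):
    """Rule 4 (assumed): zero or more strings of inner joined together."""
    def lang(n):
        pieces = inner(n) - {""}      # the null string adds nothing new
        result, frontier = {""}, {""}
        while frontier:               # stops once no new string fits within n
            frontier = {x + y for x in frontier for y in pieces
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    return lang

# (0 + 1)* 1 : all strings over {0, 1} that end with 1
ends_with_one = concat(star(union(char("0"), char("1"))), char("1"))
print(sorted(ends_with_one(2), key=lambda s: (len(s), s)))  # ['1', '01', '11']
```

Building languages only through these six functions corresponds to the closing remark of the definition: nothing is regular unless it can be obtained from the rules.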

Handy shortcuts and extra notation

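Even though the above gives the formal definition of regular languages, two more pieces of notation make descriptions easier to read:

  1. A superscript k means exactly k copies, i.e. rᵏ is shorthand for r concatenated with itself k times, so (a + b)² means (a + b)(a + b)

  2. A superscript + means one or more copies, i.e. r⁺ is shorthand for rr*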
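
As a rough illustration, assume the two shortcuts are the usual r⁺ (one or more copies of r, i.e. rr*) and rᵏ (exactly k copies of r). Both have direct counterparts in the syntax of Python's re module, which writes them as r+ and r{k}, and which writes union as | rather than +. The sketch below checks two such identities by brute force on short strings; the helper name same_language is hypothetical.

```python
import re
from itertools import product

def same_language(p1, p2, alphabet, max_len=6):
    """True if two Python patterns accept the same strings up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(p1, s)) != bool(re.fullmatch(p2, s)):
                return False
    return True

# r⁺ versus rr*, with r = (a + b)
print(same_language(r"(?:a|b)+", r"(?:a|b)(?:a|b)*", "ab"))
# r³ versus rrr, with r = (a + b)
print(same_language(r"(?:a|b){3}", r"(?:a|b)(?:a|b)(?:a|b)", "ab"))
```

Agreement on all short strings is only evidence, not a proof, but it is a quick way to catch a mistaken shorthand.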


Practice with regular languages and regular expressions

  1. Observation: any finite set of strings over an alphabet is a regular language over that alphabet, since it can be described by a regular expression that simply lists all the strings.

    For example, if our alphabet is { a, b, c }, then the set of all strings of length two is finite, and could be listed with the regular expression
    (aa + ab + ac + ba + bb + bc + ca + cb + cc)

    (Of course, a more compact representation would be something like (a + b + c)²)

  2. Is the set of strings of even length a regular language?

    Yes, since we can obtain it with the regular expression (Σ²)*

  3. Is the set of strings of odd length a regular language?

    Yes, since we can obtain any string in the language by adding a single character to some string of even length, and we know the even length strings form a regular language, i.e.: Σ(Σ²)*

  4. For the alphabet { 0, 1 }, is the set of all strings containing the substring 1001 a regular language?

    Yes, since we can create the language with the following regular expression: (0 + 1)* 1001 (0 + 1)*

  5. For the alphabet { 0, 1 }, is the set of strings of length at most 100 a regular language?

    Yes, since we could represent it by enumerating all the strings of length at most 100, although a much more compact notation would be (0 + 1 + λ)¹⁰⁰

  6. For the alphabet { 0, 1 }, is the set of strings representing powers of two (expressed as binary integers) a regular language?

    Yes, since we could represent the language with the regular expression 10*

  7. For the alphabet { a, b }, is the set of strings which contain no consecutive a's a regular language?

    Yes, since we could represent the language with the regular expression b*(ab⁺)*(a + λ)

  8. Over the alphabet { a, b }, is the set of strings which contain at least as many a's as b's a regular language?

    In fact, it is not. A proof of this will be considered in a later lecture, but there is no regular expression that can be used to describe this language.
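
Several of the answers above can be spot-checked by brute force. The sketch below is an illustration only: it translates the regular expressions into Python re syntax (union + becomes |, Σ becomes a character class, a superscript 2 becomes {2}) and compares each pattern against a direct test of the property on every string up to a small length. The helper names strings, agrees and is_power_of_two are hypothetical, and the patterns are this sketch's own translations.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """Yield every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def agrees(pattern, predicate, alphabet, max_len=8):
    """True if pattern accepts exactly the strings satisfying predicate."""
    compiled = re.compile(pattern)
    return all((compiled.fullmatch(s) is not None) == predicate(s)
               for s in strings(alphabet, max_len))

def is_power_of_two(s):
    # binary numeral with no leading zeros whose value has a single 1 bit
    return s.startswith("1") and int(s, 2) & (int(s, 2) - 1) == 0

checks = {
    # even length: zero or more pairs of characters
    "even length": (r"(?:[abc]{2})*", lambda s: len(s) % 2 == 0, "abc"),
    # odd length: one character followed by an even-length string
    "odd length": (r"[abc](?:[abc]{2})*", lambda s: len(s) % 2 == 1, "abc"),
    "contains 1001": (r"[01]*1001[01]*", lambda s: "1001" in s, "01"),
    "power of two": (r"10*", is_power_of_two, "01"),
    # every a except possibly the last is followed by at least one b
    "no consecutive a's": (r"b*(?:ab+)*a?", lambda s: "aa" not in s, "ab"),
}

for name, (pattern, predicate, alphabet) in checks.items():
    print(name, agrees(pattern, predicate, alphabet))
```

A check like this can only expose a wrong expression; it cannot show that a language is not regular, which is why the last example needs a proof rather than a search.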

Some practice questions

For each of the following questions assume the alphabet is { 0, 1 }

  1. Find a string not in the language described by the regular expression (0*+1*)(0*+1*)(0*+1*)

  2. Find a string not in the language described by the regular expression 0*(100*)*1*

  3. Simplify the following regular expression 01((01)*01+(01)*)+(01)*

  4. Give a simple description of the language described by the following regular expression 0*1(0*10*1)*0

  5. Give a simple description of the language described by the following regular expression (0+1)* (0⁺1⁺0⁺ + 1⁺0⁺1⁺) (0+1)*

  6. Give a regular expression for the language of all strings not ending with 01

  7. Give a regular expression for the language of all strings in which each 0 is followed immediately by 11

  8. Give a regular expression for the language of all strings containing both 11 and 010 as substrings

Sample solutions

  1. Find a string not in the language described by the regular expression (0*+1*)(0*+1*)(0*+1*)
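    e.g. 0101: each of the three factors contributes a single block of 0's or a single block of 1's, so any string made up of more than three blocks is not in the language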

  2. Find a string not in the language described by the regular expression 0*(100*)*1*
    e.g. 110: consecutive 1's can only appear at the end of the string, so any string containing 110 as a substring is not in the language

  3. Simplify the following regular expression 01((01)*01+(01)*)+(01)*
    i.e. (01)*

  4. Give a simple description of the language described by the following regular expression 0*1(0*10*1)*0
    i.e. language of all strings containing an odd number of 1's and ending with 10

  5. Give a simple description of the language described by the following regular expression (0+1)* (0⁺1⁺0⁺ + 1⁺0⁺1⁺) (0+1)*
    i.e. the language of all strings containing both 01 and 10 as substrings

  6. Give a regular expression for the language of all strings not ending with 01
    i.e. λ + 0 + 1 + (0 + 1)*(00 + 10 + 11), where the first three terms cover the strings of length less than two

  7. Give a regular expression for the language of all strings in which each 0 is followed immediately by 11
    i.e. 1*(011⁺)*

  8. Give a regular expression for the language of all strings containing both 11 and 010 as substrings
    i.e. (0+1)*11(0+1)*010(0+1)* + (0+1)*010(0+1)*11(0+1)*
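
Answers of this kind are easy to get subtly wrong, so it can help to test them mechanically. The following Python sketch is a sanity check on short strings, not a proof: it translates some of the expressions above into Python re syntax (+ for union becomes |, a superscript + becomes the re operator +) and compares the accepted strings with a direct test of the described property. The helper names language and satisfying are hypothetical, and 0101 and 110 are witness strings chosen here for questions 1 and 2.

```python
import re
from itertools import product

def binary_strings(max_len):
    """Yield every string over {0, 1} with length 0..max_len."""
    for n in range(max_len + 1):
        yield from map("".join, product("01", repeat=n))

def language(pattern, max_len=10):
    """Set of binary strings up to max_len accepted by the pattern."""
    compiled = re.compile(pattern)
    return {s for s in binary_strings(max_len) if compiled.fullmatch(s)}

def satisfying(predicate, max_len=10):
    """Set of binary strings up to max_len for which predicate holds."""
    return {s for s in binary_strings(max_len) if predicate(s)}

# Questions 1 and 2: the witness strings are rejected
print(re.fullmatch(r"(?:0*|1*)(?:0*|1*)(?:0*|1*)", "0101") is None)
print(re.fullmatch(r"0*(?:100*)*1*", "110") is None)

# Question 3: the long expression and (01)* accept the same strings
print(language(r"01(?:(?:01)*01|(?:01)*)|(?:01)*") == language(r"(?:01)*"))

# Question 4: odd number of 1's and ending with 10
print(language(r"0*1(?:0*10*1)*0")
      == satisfying(lambda s: s.count("1") % 2 == 1 and s.endswith("10")))

# Question 5: contains both 01 and 10
print(language(r"[01]*(?:0+1+0+|1+0+1+)[01]*")
      == satisfying(lambda s: "01" in s and "10" in s))

# Question 7: each 0 is followed immediately by 11
print(language(r"1*(?:011+)*")
      == satisfying(lambda s: all(s[i + 1:i + 3] == "11"
                                  for i, c in enumerate(s) if c == "0")))

# Question 8: contains both 11 and 010
print(language(r"[01]*11[01]*010[01]*|[01]*010[01]*11[01]*")
      == satisfying(lambda s: "11" in s and "010" in s))
```

Each line prints True when the expression and the description agree on every binary string of length at most 10.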