Regular Expressions From Scratch, Part Four: The Kleene Closure of a Language

Languages are sets, so we can take any two languages (over the same alphabet) and take their union to form a new language. Just as a reminder:

Definition 7: The union of two sets L and K is the set with all the members found in either, and is written L ∪ K.

We're going to take the Kleene Star one level further and say that it applies to languages too.

Definition 8: The Kleene Closure of a language L is L* = { e } ∪ L ∪ LL ∪ LLL …

Got it? L* is the language consisting of set of strings, where those strings are any number of strings in L concatenated together.

I want to extend our notion of concatenation in one more way. When I write a string or string variable next to a language, I mean "concatenate the language onto the end of the language that consists of only this string". So when I say abLa I mean {ab}L{a} Again, you can infer the braces from the context.

Let's do an example of using * and string concatenation of languages, just so its clear.

S = {0, 1}
L = {011*}
1L* = {1, 101, 1011, 101101, 1011011, …}

Or, in English, this is the set of strings w in S* such that w:

  • starts with a single 1,
  • has no more than one 0 in a row, and
  • ends with 1

We need one more thing to make this notation sufficiently terse. You probably noticed that the Kleene Star above only applied to the thing immediately to the star's left, whether the language or the symbol. Suppose we've got four languages: H, J, K and L. We can form two new languages through concatenation: G = HJ and F = KL. And then we can form a seventh language, D = G*F*. But we can't write D = HJ*KL*, because that's a different language. What we mean is D = (HJ)*(KL)*.

Therefore I'm declaring that we can use parenthesis in the normal way we're all used to for order of operations. We will figure out from context whether parens mean "this is a finite sequence" or "this is a grouping of concatenations".

Remember, we're looking for a way to concisely characterize languages, and clearly we're onto something here. We now have union, grouping, concatenation and Kleene closure. In other words, we have enough tools to define regular expressions, which we'll do next time. Stay tuned!

Comments (8)

  1. Attentive says:

    Just so you know we are out here!

    Thanks for this

  2. Jeff Mayeur says:

    Second that thanks… pretty cool… It’s still a few miles over my head, but I’d have to say, it’s making regexp much more understandable when viewing the concept from sets plateau… very cool.


  3. Shawn Cheng says:

    Hello Eric, I don’t understand what you mean by "has no more than one 0 in a row". What row? Do you mean a member of 1L*? As the example shows that we can have more that one 0…

  4. Jon Turner says:

    Shawn, I think he means that there is no more than one consecutive 0. So you can not have any strings that contain 00 or 000, but they could contain 1010, etc.

  5. When it comes to validating input regular expression becomes a very important part of your security plan. …

  6. Radu Poenaru says:

    Woldn’t you need an aditional rule there for 1L* ?  10101 starts with single 1, has no more than one 0 in a row and ends in 1, but does not belong in 1L*.

    How about adding:

    – there are always two 1s between 0s ?

  7. Aditya says:

    You are awesome, in the above topic  you explained in a way every one could understand using few words! Perhaps i will call it best explanation of the topic!

Skip to main content