Regex 101 Discussion I2 - Find two words in a string

Find any string that has the following two words in it: “dog” and “vet”

******

This is an interesting one, since it's not something that regex is particularly suited for. The test strings that I'm using are:

I took my dog to the vet
The vet fixed my dog
My dog likes to visit veterans
dog dog
The vet is great
He continued with dogged determination

The first two should match; all the others should fail.

In the comments to the original post, Maurits said that you should use two regexes. I think that it may be the best solution (clearest and easiest), though it may be less performant. But I'm going to talk about the single-regex solution.
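For comparison, here is a minimal sketch of what the two-regex approach might look like. It is written in Python rather than .NET purely for illustration, the function name has_dog_and_vet is made up for this example, and it relies on the \b word-boundary anchor that is explained further down:

```python
import re

# Two separate whole-word patterns; both must be found somewhere in the text.
DOG = re.compile(r"\bdog\b", re.IGNORECASE)
VET = re.compile(r"\bvet\b", re.IGNORECASE)


def has_dog_and_vet(text: str) -> bool:
    """Return True when both words appear as whole words, in either order."""
    return DOG.search(text) is not None and VET.search(text) is not None


print(has_dog_and_vet("The vet fixed my dog"))            # True
print(has_dog_and_vet("My dog likes to visit veterans"))  # False
```

Each pattern stays trivial to read, and the "both words" logic lives in ordinary code instead of in the regex.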

The only tricky thing about this is that we need to match words rather than characters. To do that, we can write:

\sdog\s

to find a dog surrounded by whitespace (please spend two minutes, think up the best joke you can having to do with "dog surrounded by whitespace", and post it as a comment). Unfortunately, if I try to match that against:

I am going to walk my dog

it fails, because there's no whitespace after "dog". What we need is a way to match the boundary between a word and a non-word. We can do that with "\b", so if we write:

\bdog\b

we will get the behavior that we want. Two quick notes:

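  1. Like the $ and ^ anchors, \b doesn't consume any characters; it just asserts a condition that must be true at that position for the match to succeed.
  2. The boundary is really between word characters (\w: letters, digits, and the underscore) and everything else, including the start and end of the string.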
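The difference between \s and \b, and the two notes above, can be checked with a few lines of code. This is only a sketch using Python's re module as a stand-in for the .NET engine (the two treat \s and \b the same way for these inputs); the "dog_house" and "dog-house" strings are extra examples that are not part of the original test set:

```python
import re

text = "I am going to walk my dog"

# \s has to consume a real whitespace character, so "dog" at the very end
# of the string is not found.
with_whitespace = re.search(r"\sdog\s", text)
print(with_whitespace)                                  # None

# \b is zero-width: the match is exactly the three letters of "dog".
with_boundary = re.search(r"\bdog\b", text)
print(with_boundary.group(), with_boundary.span())      # dog (22, 25)

# "_" counts as a word character, "-" does not.
underscore = re.search(r"\bdog\b", "dog_house")
hyphen = re.search(r"\bdog\b", "dog-house")
print(underscore)                                       # None
print(hyphen is not None)                               # True
```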

So, time to string things together. We can match a sentence with dog followed by vet with the following:

\bdog\b.*?\bvet\b
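A quick check of that one-directional pattern, sketched in Python as a stand-in for .NET (the variable names are just for illustration):

```python
import re

dog_then_vet = re.compile(r"\bdog\b.*?\bvet\b", re.IGNORECASE)

first = dog_then_vet.search("I took my dog to the vet")
second = dog_then_vet.search("The vet fixed my dog")

print(first.group())   # dog to the vet
print(second)          # None, because "vet" comes before "dog" here
```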

That handles one case, and to handle the other case, we'll just switch the order. Finally, we get:

\bdog\b.*?\bvet\b
|
\bvet\b.*?\bdog\b

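which does what we want it to do, assuming we use RegexOptions.IgnoreCase when we use it (and, if the pattern is kept on three lines as shown, RegexOptions.IgnorePatternWhitespace as well, so that the line breaks aren't treated as literal characters).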
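To see the whole thing run against the test strings, here is a small sketch in Python. It is not the .NET code this series normally assumes: re.IGNORECASE plays the role of RegexOptions.IgnoreCase, and re.VERBOSE is used so that the pattern can keep its three-line layout. The names PATTERN, TEST_STRINGS, and results are made up for the example:

```python
import re

# re.VERBOSE ignores the whitespace and line breaks inside the pattern.
PATTERN = re.compile(
    r"""
    \bdog\b.*?\bvet\b
    |
    \bvet\b.*?\bdog\b
    """,
    re.IGNORECASE | re.VERBOSE,
)

TEST_STRINGS = [
    "I took my dog to the vet",
    "The vet fixed my dog",
    "My dog likes to visit veterans",
    "dog dog",
    "The vet is great",
    "He continued with dogged determination",
]

# Map each test string to whether the pattern is found anywhere in it.
results = {s: PATTERN.search(s) is not None for s in TEST_STRINGS}

for s, matched in results.items():
    print("match" if matched else "fail ", s)
```

Only the first two strings should be reported as matches.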

That's all for now. The next one is a nice one, but it will have to wait until next year...