Why doesn’t \b match word boundaries correctly?

A colleague of mine was having trouble getting the \b metacharacter in a regular expression to work. Of course, when somebody asks a question like that, you first have to establish what their definition of "work" is. Fortunately, he provided some examples:

Regex.IsMatch("foo", @"\b" + @"foo" + @"\b") true
Regex.IsMatch("%1" , @"\b" + @"%1"  + @"\b") false
Regex.IsMatch("%1" , @"\b" + @"\%1" + @"\b") false
Regex.IsMatch("%1" , @"\b" + @"\%1" + @"\b") false
Regex.IsMatch("%1" , @"..") true
Regex.IsMatch("%1" , @"%1") true

"The last two entries are just sanity checks to make sure I didn't make some stupid mistake like passing the parameters in the wrong order. I want to search for a string that contains %1 with word boundaries on either side, something I would normally use \b for. Is there something special about the % character? Notice that the match succeeds when I look for the word foo."

Everything is working as it should. Recall that the \b metacharacter matches when there is a \w on one side and a \W on the other, where the beginning and end of the string are treated as if they were \W.

The string %1 therefore breaks down as

virtual \W  beginning of string
\W  % is not an alphanumeric or _
\w  1 is a digit
virtual \W  end of string

The only points where \b would match are immediately before and after the 1, since those are the transition points between \w and \W and vice versa. In particular, the location immediately before the percent sign does not match since it is surrounded by \W on both sides.
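The breakdown above can be checked directly. The following is a minimal sketch, not part of the original exchange: the class name WordBoundaryDemo and the helper BoundaryIndexes are made up for illustration, and it relies only on Regex.Matches and Regex.IsMatch from System.Text.RegularExpressions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class WordBoundaryDemo
{
    // Hypothetical helper: returns every index at which \b matches in the input.
    public static List<int> BoundaryIndexes(string input)
    {
        var indexes = new List<int>();
        // \b is zero-width, so each match is an empty string located at a boundary.
        foreach (Match m in Regex.Matches(input, @"\b"))
        {
            indexes.Add(m.Index);
        }
        return indexes;
    }

    static void Main()
    {
        // Boundaries exist only on either side of the 1.
        Console.WriteLine(string.Join(", ", BoundaryIndexes("%1"))); // prints "1, 2"

        // No boundary precedes the %, so the colleague's pattern cannot match.
        Console.WriteLine(Regex.IsMatch("%1", @"\b%1\b")); // prints "False"

        // Dropping the leading \b leaves only the boundary after the 1.
        Console.WriteLine(Regex.IsMatch("%1", @"%1\b"));   // prints "True"
    }
}
```

Index 0, the position before the percent sign, is absent from the list, which is why the leading \b in the original pattern can never be satisfied for this input.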

My colleague responded, "D'oh! I keep forgetting that % won't act like a \w just because I want it to."

Comments (21)
  1. Adam Rosenfield says:

    When grepping, I prefer to use the \< and \> symbols instead of \b, which match word boundaries at the start and end of a word respectively, but it seems like C# doesn't provide those.

    Also of note: \b is interpreted as the backspace character (U+0008) when inside a character class, but as a word boundary when outside of a character class.

  2. Toddsa says:

    I have a love hate relationship with regex. I love it, but damn do I not have a gotcha moment almost every time I need a complex match. But then again I do not tend to use regex with great frequency.

  3. configurator says:

    Am I right that the correct regex he wanted was @"\B%1\b"?

  4. David Walker says:

    Regex is something I have never been able to wrap my head around.  It was designed, and it's understood, by people whose brains work differently than mine.  Not better or worse, just differently.  :-)

  5. Joshua Ganes says:

    Regular expressions can be a very useful tool. In simple matches or one-off scripts it can be a life saver. I wish a curse upon any developer who uses a complicated expression without at least a dozen comment lines above to explain its purpose.

  6. Gabe says:

    In your 6-item table, the 3rd and 4th items appear to be exact duplicates. Was that intended?

    [Probably not, but I can't tell whether it was a duplicate in the original message or a duplicate I introduced (since I don't have the original message any more – didn't realize there would be a quiz two years later). -Raymond]
  7. Mason Wheeler says:

    You really ought to tag this one "And Now They Have Two Problems".

  8. James Schend says:

    @David Walker: I'm with you. Add in the fact that most of the things that people use RegEx for (validating URLs or emails) are difficult-to-impossible to do correctly in RegEx, and I tend to just avoid it altogether.

    (The really sad part is how few developers actually care whether their email validator follows the RFC specs or not… but that's another story.)

  9. Am I right that the correct regex he wanted was @"\B%1\b"?

    Probably, but it depends what the developer really wants.

    Yes, \B is the opposite of \b (at least in .NET regular expressions.)

    You probably wouldn't want to match %%1 though, which is batchese for a literal %1.

  10. Krunch says:

    This may be closer to what the requester wanted:


    Or even


  11. You probably /would/ want to match %%%1 though.  This doesn't really look like a job for regular expressions; Environment.ExpandEnvironmentVariables might be better suited.

  12. JB says:

    The most useful I've found regex is for capturing repeating patterns.

    Given most text or html with repeating patterns it's very easy to write a regex to pull out the data you want and then you've cleaned your input and have a tabular output.

    Think web scrape or log file parsing.

    I have a tool just for this which I use to clean text and export into excel.

  13. @Krunch don't forget the beginning and end of the string (or line):


    It would be really cool if you could define your own character classes.

  14. > I can't tell whether it was a duplicate in the original message or a duplicate I introduced

    It doesn't matter *why* they're dressed as a tiger – have they got my leg?

  15. Cheong says:


    I'll probably just test it twice, first with @"/\s%1\s/" and if it don't match, test with StartWith() and EndWith() respectively.

    RegEx is too difficult for me to get correctly. I'll just use it within the extent of what I can understand.

  16. Simon says:


    – I think *everyone* has a love/hate relationship with regexes. They're a stupendously powerful tool, but they're also a real headache to work with.

    The trick is often to know when to use them, and when not to. They're often used when a simple (if slightly longer) combination of indexOf/substring would be cleaner. But conversely, you also see people doing everything they can to avoid using them – ending up with pages of complicated text parsing code that could have been done much more easily with a regex or two.

  17. Joseph Koss says:

    For those that don't know, Mason Wheeler's comment was referring to a quote from Jamie Zawinski (developer on Mozilla/Netscape, XEmacs, and XScreenSaver) who said:

    "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."

  18. I've heard someone say that Regex is really a write-only language – a sufficiently complicated one is too hard to be read back and understood(!)

  19. Also, isn't Raymond ignoring the first 2 rules of the internet?

    Do not talk about \b

    Do NOT talk about \b

  20. Gabe says:

    Michael Kaplan covered this same topic some time ago. blogs.msdn.com/…/9056364.aspx

  21. ender says:

    @Joseph Koss: didn't Zawinski actually say awk and not regular expressions?
