Regex 101 Answer S5 – Strip out any non-letter, non-digit characters

Remove any characters that are not alphanumeric.


To remove these characters, we first need to match them. We know that to match all alphanumeric characters, we could write:

[a-zA-Z0-9]

To match all characters except these, we can negate the character class:

[^a-zA-Z0-9]

It’s then simple to use Regex.Replace():

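string data = …;
// Match every character that is not an ASCII letter or digit.
Regex regex = new Regex("[^a-zA-Z0-9]");
// Call Replace() on the instance so the pattern above is the one applied.
data = regex.Replace(data, "");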
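As a fuller illustration of the replacement step, here is a minimal self-contained sketch. The class name StripDemo, the helper name StripNonAlphanumeric, and the sample strings are made up for this example and are not part of the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripDemo
{
    // Matches any single character that is NOT an ASCII letter or digit.
    private static readonly Regex NonAlphanumeric = new Regex("[^a-zA-Z0-9]");

    // Hypothetical helper: replaces every match with the empty string.
    public static string StripNonAlphanumeric(string data)
    {
        return NonAlphanumeric.Replace(data, "");
    }

    public static void Main()
    {
        Console.WriteLine(StripNonAlphanumeric("Hello, World! (v2.0)")); // HelloWorldv20
        Console.WriteLine(StripNonAlphanumeric("555-867-5309"));         // 5558675309
    }
}
```

Keeping the Regex in a static field means the pattern is parsed once and reused on every call.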

Another way of doing this would be to use the pattern:

[^a-z0-9]

and then create the regex using RegexOptions.IgnoreCase.
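The sketch below compares the two approaches on ASCII input. In .NET the case-insensitive option is spelled RegexOptions.IgnoreCase. Adding RegexOptions.CultureInvariant is my own assumption, included so that case folding does not depend on the current culture. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaseInsensitiveDemo
{
    // Both cases are spelled out in the character class.
    private static readonly Regex ExplicitCase = new Regex("[^a-zA-Z0-9]");

    // Lowercase only; IgnoreCase makes the class cover uppercase letters too.
    // CultureInvariant is an assumption added here, not part of the original post.
    private static readonly Regex IgnoreCase = new Regex(
        "[^a-z0-9]",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static string StripExplicit(string data)
    {
        return ExplicitCase.Replace(data, "");
    }

    public static string StripIgnoreCase(string data)
    {
        return IgnoreCase.Replace(data, "");
    }

    public static void Main()
    {
        string sample = "Mixed CASE, 123!";
        Console.WriteLine(StripExplicit(sample));   // MixedCASE123
        Console.WriteLine(StripIgnoreCase(sample)); // MixedCASE123
    }
}
```

For plain ASCII text like this sample, both patterns produce the same result, so the choice is mostly a matter of which reads more clearly.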

Note: I’ve seen a few comments referring to Unicode and international characters. I haven’t delved into that because I don’t want to complicate the discussion, and, frankly, Unicode scares me. If you want the details, you can find them in the docs. For example, you can find out that \W is really equivalent to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
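For readers who do want to see the difference, here is a small hedged sketch comparing the ASCII-only class with two Unicode-aware alternatives. The pattern [^\p{L}\p{Nd}] (any letter, any decimal digit) is my own choice for illustration and is not from the post. The class name, method names, and sample string are hypothetical. The sample is written with \u escapes and assumes precomposed accented characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeDemo
{
    // ASCII letters and digits only.
    public static string StripAscii(string s)
    {
        return Regex.Replace(s, "[^a-zA-Z0-9]", "");
    }

    // Keeps any Unicode letter (\p{L}) and any decimal digit (\p{Nd}).
    public static string StripUnicode(string s)
    {
        return Regex.Replace(s, @"[^\p{L}\p{Nd}]", "");
    }

    // \W removes non-word characters; note that the underscore survives,
    // because connector punctuation (\p{Pc}) counts as a word character.
    public static string StripNonWord(string s)
    {
        return Regex.Replace(s, @"\W", "");
    }

    public static void Main()
    {
        // "Crème_brûlée ٣!" where \u0663 is ARABIC-INDIC DIGIT THREE.
        string sample = "Cr\u00E8me_br\u00FBl\u00E9e \u0663!";

        // Comparisons are printed instead of the raw strings to sidestep
        // console encoding issues; each line prints True.
        Console.WriteLine(StripAscii(sample) == "Crmebrle");
        Console.WriteLine(StripUnicode(sample) == "Cr\u00E8mebr\u00FBl\u00E9e\u0663");
        Console.WriteLine(StripNonWord(sample) == "Cr\u00E8me_br\u00FBl\u00E9e\u0663");
    }
}
```

The ASCII class silently drops accented letters and non-Latin digits, which may or may not be what you want. Also note that \W alone is not a drop-in replacement, since it leaves underscores in place.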

Comments (5)

  1. Just curious, couldn’t you condense the pattern further to this:

    [^a-z\d]

    Since \d is the same as 0-9?

  2. ericgu says:


    Yes, you could, though strictly \d is not equal to [0-9] but to [\p{Nd}], the Unicode equivalent. Probably fine in most cases.

  3. MaherJ says:

    Do you know a regular expression to remove html tags from a string?

  4. Maurits says:

    MaherJ, what I usually do is create a CDO Message object, set the .HTMLBody property to the HTML string, and read the text equivalent from .TextBody.

  5. michkap says:

    Of course if you do not use Unicode categories then there are (e.g.) many digits not being included. 🙂