Performance: The Fastest Way to Use Regular Expressions in Microsoft .NET 7

Regular expressions are a powerful tool in the programming world, allowing developers to efficiently search, parse, and manipulate text. However, they can also be a source of frustration, as the syntax and patterns can be complex and difficult to remember. In this article, we will explore the various ways to use regular expressions in .NET and provide practical examples to help demystify this topic. Additionally, we will share tips for optimizing regular expression performance based on the latest best practices. If you want to improve your skills in this area, check out my book “Rock Your Code: Code & App Performance for Microsoft .NET”, available on Amazon.com.

If you’ve never used a regular expression, here is a description:

A regular expression, often abbreviated as “regex” or “regexp”, is a sequence of characters that defines a search pattern. Regular expressions are used to search for and match patterns in text and to manipulate text based on those patterns. They can be used in a variety of programming languages, tools, and applications to perform tasks such as validation, data extraction, and text manipulation. Regular expressions are made up of a combination of characters, including literal characters, metacharacters, and quantifiers, which define the rules for the pattern to be matched.
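As a small illustration of those building blocks, here is a minimal sketch. The pattern, the class name RegexBasics, and the method name HasId are hypothetical and are not used elsewhere in this article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexBasics
{
    // Hypothetical pattern for illustration only:
    //   ID-   literal characters that must appear exactly as written
    //   \d    metacharacter that matches a decimal digit
    //   {3}   quantifier: exactly three of the preceding element
    private const string Pattern = @"ID-\d{3}";

    public static bool HasId(string input) => Regex.IsMatch(input, Pattern);

    public static void Main()
    {
        Console.WriteLine(HasId("Order ID-123 shipped")); // True
        Console.WriteLine(HasId("Order ID-12 shipped"));  // False
    }
}
```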

The concept of regular expressions can be traced back to the 1950s, when mathematician Stephen Kleene introduced the concept of regular sets and regular languages. In the 1960s, Ken Thompson, a computer scientist at Bell Labs, developed the first implementation of regular expressions as part of the QED text editor. The regular expression syntax was later standardized and popularized in the Unix world by tools such as grep, sed, and awk.

In the Beginning

Regular expressions have been part of .NET since the first version. The first step in using one is to come up with the pattern that will be used for matching or replacing strings. For example, this is the pattern used by the ContainsWord method in my sample code. Despite the method’s name, the pattern matches text shaped like an email address: word characters, an @ sign, and then a dotted domain.

\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*

This is how we use that pattern in code.

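using System.Text.RegularExpressions;

public static bool ContainsWord(string input)
{
    // A new Regex object is constructed, and the pattern parsed, on every call.
    var expression = new Regex(
        pattern: @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
        options: RegexOptions.CultureInvariant);

    return expression.IsMatch(input);
}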
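To see what this pattern actually accepts, here is a quick sketch that wraps the same method in a class and calls it. The class name PatternCheck and the sample inputs are my own assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    public static bool ContainsWord(string input)
    {
        var expression = new Regex(
            pattern: @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
            options: RegexOptions.CultureInvariant);

        return expression.IsMatch(input);
    }

    public static void Main()
    {
        // The pattern requires an @ sign followed by a dotted name.
        Console.WriteLine(ContainsWord("Contact: someone@example.com")); // True
        Console.WriteLine(ContainsWord("no address here"));              // False
    }
}
```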

Then Came a Better Way

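Regular expressions are fast, but creating a new Regex object on every call is wasteful because the pattern has to be processed each time. A better way to increase performance is to create the Regex object once and reuse it. The “magic” is that we now use it from a static field in a class. First, we move the construction to a field like this.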

private static readonly Regex _containsWordRegEx =
    new(pattern: @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
        options: RegexOptions.CultureInvariant);

Then we use the field instead of creating a new Regex object whenever needed.

public static bool ContainsWord(string input) =>
    _containsWordRegEx.IsMatch(input);

As described in my code performance book, this dramatically increases performance (see the benchmark results below).

Using the Regex Source Generator in .NET 7!

Source generators were introduced in .NET 5. With the release of .NET 7, the team added a source generator that increases regular expression performance even more. To use it, declare a partial method that returns Regex inside a partial class and decorate the method with the GeneratedRegex attribute.

public static partial class RegexExamples
{
    [GeneratedRegex(@"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
        RegexOptions.CultureInvariant)]
    private static partial Regex ContainsWordRegex();
}

This pattern, along with the generator, produces almost 1,000 lines of code! I can’t show it here, but it’s all code you could have written manually. I doubt any manager would give you the time to write and test this code! Since I like good documentation, I appreciate that it even adds helpful info for IntelliSense.

[Image: Regex source generator IntelliSense]

Then I created a method to use the source generator code.

public static bool ContainsWord(string input) =>
    ContainsWordRegex().IsMatch(input);

Benchmark Tests

But is using a source generator faster? Well, let’s look at the performance for all three of these ways to use a regular expression.

[Chart: Regex source generator benchmark results, ContainsWord]

As you can see, using the source generator is 7 times faster than using a field and over 35 times faster than the normal way of coding regular expressions, which creates a new Regex object on every call! Also, using the generator or the field allocates zero bytes in memory, while the normal way allocates 6,696 bytes.
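If you want a rough feel for the difference on your own machine, the three approaches can be timed side by side. This is only a minimal Stopwatch sketch, not the benchmark behind the chart above; the class name RegexTiming, the method names, the sample input, and the iteration count are all assumptions for illustration. A proper benchmarking tool will give far more reliable numbers.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static partial class RegexTiming
{
    private const string Pattern = @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";

    // Approach 2: one Regex instance, created once and reused.
    private static readonly Regex _field = new(Pattern, RegexOptions.CultureInvariant);

    // Approach 3: implementation supplied by the .NET 7 regex source generator.
    [GeneratedRegex(Pattern, RegexOptions.CultureInvariant)]
    private static partial Regex Generated();

    // Approach 1: a new Regex object on every call.
    public static bool ViaNew(string input) =>
        new Regex(Pattern, RegexOptions.CultureInvariant).IsMatch(input);

    public static bool ViaField(string input) => _field.IsMatch(input);

    public static bool ViaGenerator(string input) => Generated().IsMatch(input);

    // Runs the test the given number of times and returns elapsed milliseconds.
    public static long Time(Func<string, bool> test, string input, int iterations)
    {
        test(input); // warm-up call so first-use costs are not measured

        var watch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            test(input);
        }
        watch.Stop();

        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const string input = "Contact: someone@example.com"; // hypothetical sample
        const int iterations = 100_000;                       // arbitrary count

        Console.WriteLine($"New Regex per call: {Time(ViaNew, input, iterations)} ms");
        Console.WriteLine($"Static field:       {Time(ViaField, input, iterations)} ms");
        Console.WriteLine($"Source generator:   {Time(ViaGenerator, input, iterations)} ms");
    }
}
```

The timings will vary by machine and runtime, so treat the output as a sanity check only.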

Now let’s look at the performance using a regular expression to find spaces in a string using “\s+” as the pattern so the spaces can be replaced.

[Chart: Regex source generator benchmark results, replace spaces]

This shows that using the source generator is 1.63 times faster than the field and 1.85 times faster than using the normal way. The generator and field allocate 1,960 bytes in memory while the normal way allocates 4,536 bytes.
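For reference, a replace-spaces method built on the source generator might look like the sketch below. The class name WhitespaceExamples, the method names, and the replacement string are assumptions for illustration; only the “\s+” pattern comes from the test described above.

```csharp
using System;
using System.Text.RegularExpressions;

public static partial class WhitespaceExamples
{
    // \s+ matches one or more consecutive whitespace characters.
    [GeneratedRegex(@"\s+", RegexOptions.CultureInvariant)]
    private static partial Regex SpacesRegex();

    // Replaces every run of whitespace with the given replacement text.
    public static string ReplaceSpaces(string input, string replacement) =>
        SpacesRegex().Replace(input, replacement);

    public static void Main()
    {
        Console.WriteLine(ReplaceSpaces("a  b \t c", "-")); // a-b-c
    }
}
```

Because “\s+” matches a whole run of whitespace at once, each run is replaced by a single copy of the replacement text.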

After reading this, are you going to refactor all your code that uses regular expressions?

Caution

I’d like to share a few quirks I found when using this generator. First, I have found that using the RegexOptions.Compiled option wipes out almost all of the performance gain from the generator. Judging from the code it generates, the generator already produces the equivalent of a compiled regular expression, so the option is not needed.

Second, while working on the source generator method, I kept seeing messages like this one that state, “Partial method must have an implementation part because it has accessibility modifiers.”

[Image: Regex source generator warning]

I kept thinking something was wrong, but the code just needs to be regenerated. Choose Build – Clean, then Build, and that will clear it up. I’m not sure if all generators do this, but this one does.

Finding Regular Expression Patterns

[Image: Regular Expression Twitter Poll]

Regular expressions can be a powerful tool for text manipulation, but some software engineers find them difficult to use due to the complex patterns involved. However, there are resources available to make the process easier, such as websites like regex101.com. Another option is using AI-powered tools like ChatGPT, which can quickly generate regular expressions based on input. With these resources at your disposal, you can make the most of regular expressions and streamline your text-processing tasks. For example, I asked it this question:
“Can you write a regular expression that finds all the http and https links in a document that ends in .com? Provide C# code as the example of how to use it.”

Below is the answer:

[Image: Regex source generator, ChatGPT answer]

I have to say, I’m impressed with the answer. I will be using ChatGPT to create regular expression patterns in the future since it’s easier and faster!
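Whatever tool produces a pattern, it is worth checking it against sample text before relying on it. The sketch below is my own illustration of that kind of check and is not the answer ChatGPT gave; the pattern, the class name LinkFinder, and the sample text are all assumptions, and the pattern is deliberately simple (it only looks at the host part of the link).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkFinder
{
    // Hypothetical pattern: http or https, "://", a host name made of
    // word characters, dots, or hyphens, ending in ".com" at a word boundary.
    private static readonly Regex _comLinks =
        new(@"https?://[\w.-]+\.com\b", RegexOptions.CultureInvariant);

    // Returns every link in the text whose host ends in .com.
    public static List<string> FindComLinks(string text)
    {
        var links = new List<string>();
        foreach (Match match in _comLinks.Matches(text))
        {
            links.Add(match.Value);
        }
        return links;
    }

    public static void Main()
    {
        var text = "See https://example.com, http://www.test.com/page and https://example.org.";
        foreach (var link in FindComLinks(text))
        {
            Console.WriteLine(link);
        }
        // Prints:
        // https://example.com
        // http://www.test.com
    }
}
```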

Summary

To summarize, optimizing regular expressions in .NET 7 can significantly improve the performance of your code. By following the tips mentioned in this article, you can ensure that your regular expressions are processed efficiently and avoid potential performance bottlenecks. Remember to benchmark your code to ensure you are getting the most out of your optimizations.

Do you have any experiences or questions related to optimizing performance in .NET? Please share in the comments below. I’d love to hear from you.

Pick up any books by David McCarter by going to Amazon.com: http://bit.ly/RockYourCodeBooks


If you liked this article, please buy David a cup of Coffee by going here: https://www.buymeacoffee.com/dotnetdave

© The information in this article is copyrighted and cannot be reproduced in any way without express permission from David McCarter.
