regex: untracked capture groups

2020-09-30


When I write regular expressions, I often find myself grouping logic between parentheses. The issue is that these groups are remembered and returned as part of the match. Sometimes I don’t care about them: the group exists only for the logic of the match, and I want to focus on just a few pieces of the overall matched string.

Named capture groups are a great step in that direction, but they can be finicky and require specifying the name of the group. What if you just want to ignore a group?

Let’s return to our pattern for matching a Markdown link to demonstrate this.

First, let’s look at the “basic” regular expression and build up from there[1]:

const pattern =
  /^(\[([^\]]*)?\]\(((https?)?[A-Za-z0-9\:\/\.\- ]+)(\"(.+)\")?\))/
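To make the clutter concrete, here is a small sketch of what this pattern returns. The sample link string is my own assumption for illustration, and the variable names are hypothetical:

```javascript
// Hypothetical sample input, chosen only to exercise every group.
const basicPattern =
  /^(\[([^\]]*)?\]\(((https?)?[A-Za-z0-9\:\/\.\- ]+)(\"(.+)\")?\))/
const basicMatch = '[Example](https://example.com "Home")'.match(basicPattern)

// Index 0 is the full match; indexes 1-6 are the six capture groups.
console.log(basicMatch.length) // 7
console.log(basicMatch[2]) // "Example" (the text between the brackets)
console.log(basicMatch[6]) // "Home" (the text between the quotes)
```

Six groups come back even though only one or two of them are interesting, and you have to count parentheses to know which index is which.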

One of the first things we want is to extract the text that’s going to live inside the brackets ([]) at the front. One approach is to use a named capture group so that we know exactly what we’re looking at (we’ll call it title):

const pattern =
  /^(\[(?<title>[^\]]*)?\]\(((https?)?[A-Za-z0-9\:\/\.\- ]+)(\"(.+)\")?\))/
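As a rough sketch of how the named group is read back (again, the sample string and variable names are my own assumptions), the title shows up on the match’s groups object:

```javascript
// Hypothetical sample input; requires an engine with ES2018 named groups.
const namedPattern =
  /^(\[(?<title>[^\]]*)?\]\(((https?)?[A-Za-z0-9\:\/\.\- ]+)(\"(.+)\")?\))/
const namedMatch = '[Example](https://example.com "Home")'.match(namedPattern)

// Guard against a failed match before reading the named group.
const namedTitle = namedMatch ? namedMatch.groups.title : undefined
console.log(namedTitle) // "Example"

// A named group is still a numbered group, so nothing has been hidden yet.
console.log(namedMatch.length) // 7
```

Naming the group makes the title easy to find, but the other groups are all still returned alongside it.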

Named capture groups are still pretty new (ES2018), though browser support seems to be mostly there at this point.

An alternative, and frankly a less desirable one since it’s much less declarative, is to hide the other capture groups.

For example, using “non-capturing groups”, indicated by a ?: at the beginning of a group, we can ignore all of the other groups and return just the title (assuming we have a match at all):

const pattern =
  /^(?:\[([^\]]*)?\]\((?:(?:https?)?[A-Za-z0-9\:\/\.\- ]+)(?:\"(?:.+)\")?\))/

In the above example, the only capture group that is not ignored is ([^\]]*). Every other group, including the nested (?:https?) and (?:.+), starts with ?:. So if there’s a match and a group is returned, it’s going to be the title (i.e. the characters that live between the brackets).
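Here is a quick check of that claim, as a sketch with every group other than the title marked ?: (nested ones included). The sample string and variable names are assumptions of mine:

```javascript
// Hypothetical sample input; only the title group lacks the ?: prefix.
const titleOnlyPattern =
  /^(?:\[([^\]]*)?\]\((?:(?:https?)?[A-Za-z0-9\:\/\.\- ]+)(?:\"(?:.+)\")?\))/
const titleOnlyMatch =
  '[Example](https://example.com "Home")'.match(titleOnlyPattern)

// Index 0 is the full match and index 1 is the single remaining group.
console.log(titleOnlyMatch.length) // 2

// Skip the full match and pull out the title by position.
const [, linkTitle] = titleOnlyMatch
console.log(linkTitle) // "Example"
```

With only one capturing group left, there is no ambiguity about which index holds the title.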

I don’t know that you always need non-capturing groups, but I find them useful. Groups are so helpful for breaking a pattern’s logic into bite-sized pieces, but that comes at the cost of cluttering the returned matches. Non-capturing groups solve that by stripping away the baggage.

Footnotes

  • 1 I’m calling it basic only because I’ve now stared at it extensively for several days. My other RegEx posts will help break down how it works.


Hi there and thanks for reading! My name's Stephen. I live in Chicago with my wife, Kate, and dog, Finn.