Beginner's Guide to Regular Expressions

Why learn regular expressions? Endless amounts of data and sometimes repetitive activities are the daily bread of an SEO specialist. Regular expressions offer an effective way to simplify work and save valuable time.

Regular expressions, or regex, let you search text efficiently using special characters. These characters (often loosely called wildcards) stand in for one or more characters in the text, so a single pattern can find or replace every variant you are interested in.

Regex can help you distinguish branded terms from non-branded ones and view performance data for selected keywords only. It helps you find specific keywords or their variants on your website. And if your site structure is not ideal, it lets you track traffic and other metrics for just the segment you define. In short, regular expressions save you time and give you access to data that would otherwise be hard to sort in Excel, for example.

List of basic characters:

  • Dot - (.) - matches any single character.

  • Vertical line - (x|y) - means "or". The expression matches either x or y.

  • Interval - ({n,k}) - matches n to k repetitions of the previous character. If k is omitted, {n,} matches at least n repetitions, and {n} matches exactly n repetitions. For example, Ho{2,6}lka matches variants of the word Holka that contain 2 to 6 "o" characters: Hoolka, Hooolka, Hoooolka, etc.

  • Asterisk - (*) - matches any number of repetitions of the previous character (0 to ∞). For example, the expression eVisio*ns matches "eVisins", "eVisions" and "eVisiooooons".

  • Round brackets - () - group characters so that an operator such as | or + applies to the whole group.

  • Plus - (+) - matches one or more occurrences of the previous character (1 to ∞). To find all variants of the word eVisions that contain at least one "o", we would write "eVisio+ns", which matches "eVisions", "eVisioons" and "eVisioooooons".

  • Question mark - (?) - matches zero or one occurrence of the preceding character (0 or 1): evisio?ns matches evisins and evisions.

  • Backslash - (\) - cancels the special meaning of the character that follows it, allowing you to search for punctuation marks, for example a literal dot (\.).

  • Square brackets - ([]) - match any one of the characters written inside the brackets. This is useful when searching for spelling errors in text: své[szš]t matches svést, svézt and svéšt. A hyphen (-) works as a range operator, as in [a-z]. A caret (^) at the start of the brackets makes it a negated list that matches any character except those typed inside the brackets.
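The characters in the list can be tried out in a few lines of Python. This is only a minimal sketch using Python's built-in re module. The helper name matches and the example.cz test string are illustrative, and other tools may use a slightly different regex dialect.

```python
import re

def matches(pattern, text):
    """Return True only if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# (pattern, text) pairs covering the basic characters from the list.
checks = [
    (r"Ho{2,6}lka", "Hooolka"),    # interval: 3 x "o" is within 2 to 6
    (r"Ho{2,6}lka", "Holka"),      # interval: a single "o" is too few
    (r"eVisio*ns", "eVisins"),     # asterisk: zero "o" is allowed
    (r"eVisio+ns", "eVisins"),     # plus: at least one "o" is required
    (r"evisio?ns", "evisioons"),   # question mark: at most one "o"
    (r"své[szš]t", "svézt"),       # square brackets: one of s, z, š
    (r"sv[^é]st", "svést"),        # negated list: anything except "é"
    (r"example\.cz", "exampleXcz"),  # escaped dot matches only a literal dot
    (r"example.cz", "exampleXcz"),   # unescaped dot matches any character
]

for pattern, text in checks:
    print(f"{pattern!r:18} {text!r:14} {matches(pattern, text)}")
```

Comparing the last two rows shows why the backslash matters whenever a pattern contains a domain name.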

> TIP: Combining characters correctly yields a large number of search patterns. For example, by combining a dot and an asterisk, we can find all words containing certain letters in a given order. The expression ".*o.*a" matches all words that contain the letter "o" followed, anywhere later in the word, by the letter "a".

> To try out and learn regular expressions, the regex tester at https://regex101.com/ is useful.
>
> For a more extensive list of regular expressions, see - HERE
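The dot-and-asterisk tip can be checked on a small word list. This sketch assumes partial matching (re.search), which is how most filter boxes behave. The word list is made up for illustration.

```python
import re

# "o" followed, anywhere later in the word, by "a".
PATTERN = re.compile(r".*o.*a")

words = ["okna", "volkswagen", "auto", "dvere"]

# re.search looks for the pattern anywhere in the word.
matching = [word for word in words if PATTERN.search(word)]
print(matching)  # ['okna', 'volkswagen']
```

"auto" is not matched because its "a" comes before the "o", which shows that the order of the letters in the pattern matters.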

Why it is good to use regex in SEO

Let's now look at how an SEO consultant can use regular expressions in the tools they work with every day. Segmenting your data is important, and if your client has more than 1000 URLs, analyzing the data manually is almost impossible. A more detailed view helps you find search patterns and optimization opportunities, and generally gives you more insight into your project. Regular expressions work in the most important tools of an SEO specialist, from Google Analytics to Screaming Frog to OpenRefine, so we can easily segment our data according to our needs. We can filter brand vs. non-brand traffic, segment the site by landing pages and, last but not least, use regex to analyze keywords in OpenRefine.

Examples of regular expressions

General regex examples

The simplest way to get started with regular expressions is to find the proportion of branded and non-branded queries. Brand terms are often spelled in many different ways, and users can be remarkably inventive, especially with foreign-language brands. In such cases it can be really difficult to filter out all the variations and arrive at specific data. The examples below show how regular expressions can help.

Take three differently written regular expressions for the brand name Hoegaarden. Hoeg(a)+rden matches every spelling in which one or more "a" appears at that position. However, it will not match names where the user has not typed any "a".

The asterisk (*) helps here, because it stands for any number of repetitions of the character written before it, including none. The most forgiving expression is then H.*n, which assumes that the user knows the first and last letters, but does not know the order and number of characters in the middle of the word.

Another example is the Volkswagen brand, where a misspelled brand name can be found in two different ways. With the second one, (v|w).*gen, we catch multiple misspelled names: it assumes that the user types "v" or "w" at the beginning, then arbitrary characters in the middle, and then "gen" at the end.
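To see how strict or forgiving these brand patterns are, they can be run against a handful of queries. This is a rough sketch: the query list is invented, and case-insensitive whole-string matching is an assumption about how the filter is applied.

```python
import re

def matches(pattern, query):
    """Case-insensitive match of the whole query against the pattern."""
    return re.fullmatch(pattern, query, flags=re.IGNORECASE) is not None

queries = ["hoegaarden", "hoegarden", "hoegrden", "heineken",
           "volkswagen", "wolksvagen", "skoda"]

patterns = [r"Hoeg(a)+rden", r"H.*n", r"(v|w).*gen"]

for pattern in patterns:
    hits = [q for q in queries if matches(pattern, q)]
    print(pattern, "->", hits)
```

Note that H.*n also catches "heineken" in this sample, so a very forgiving pattern is worth checking against real queries before relying on it.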

Google Analytics

Regular expressions can be used here to segment the most important pages such as product pages, categories, blog, etc. Ideally, your site has a logical structure in the format example.cz/categories/... If it does not, you need to group a certain sample of URLs into one segment. By creating your own segment, you can then access specific metrics such as bounce rate or conversion rate for just that segment. How to approach this?

  • Determine the URL from which we want to create the segment.
  • Find a suitable name for the segment.
  • Create a regular expression.
  • Create a new segment in GA using a regular expression.
  • Verify correctness through the landing page.

Note - For GA, regular expressions can only be used for landing pages, not keywords.

In Google Analytics, we can segment data by product (landing pages) to analyze it efficiently as needed. Here we can use the "|" (or) operator to list the main products offered by your website and compare their performance. An example of such a regular expression is "example.cz/plastova-okna/|/plastove-dvere/|/hlinikove-dvere/", which keeps only the landing pages mentioned.
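Before pasting an expression into a segment, it can be checked offline against a few URLs. The sketch below uses Python's re module as an approximation of the Google Analytics regex engine, and the landing pages are hypothetical.

```python
import re

# The segment expression from the text, applied as a partial match.
SEGMENT = re.compile(r"example.cz/plastova-okna/|/plastove-dvere/|/hlinikove-dvere/")

# Hypothetical landing pages used only for this check.
landing_pages = [
    "example.cz/plastova-okna/trojskla",
    "example.cz/plastove-dvere/vchodove",
    "example.cz/blog/jak-vybrat-okna",
    "example.cz/hlinikove-dvere/",
]

segment = [url for url in landing_pages if SEGMENT.search(url)]
for url in segment:
    print(url)
```

The blog URL is left out of the segment, which is the kind of quick check the "verify correctness" step is meant to provide.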

Google Search Console

GSC provides you with data on how your site performs in search results. For example, if you're expanding a certain category on your site and want to look at the performance of specific keywords over time, regular expressions will help again. In GSC, we can use regular expressions for both landing pages and queries. We can segment important pages, products, categories, the blog, etc. and get metrics like clicks, impressions, click-through rate, or average position for specific datasets.

How to approach this?

  • Determine the keywords you want to create a segment from.
  • Create a regular expression.
  • Insert the regular expression into the Queries - Custom (regular expression) section.
  • Verify correctness through the query dimension.

We can use them to filter branded traffic, for example.

In Google Search Console, we can narrow the report to only the queries we want (a specific dataset) and then view the performance of that dataset (clicks, impressions, CTR, visibility), the performance of individual queries within it, the performance by device, and the distribution across URLs.

Another great use of regex in GSC is filtering branded vs. non-branded traffic. When filtering this way, we can see the ratio and trend of this traffic.
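The branded vs. non-branded ratio can also be computed from an exported query report. This is a minimal sketch under stated assumptions: the rows are made-up (query, clicks) pairs, and the brand pattern reuses the evisio?ns example from earlier in this guide.

```python
import re

# Brand pattern with an optional "o", matched case-insensitively.
BRAND = re.compile(r"evisio?ns", re.IGNORECASE)

def split_clicks(rows):
    """Return (branded_clicks, non_branded_clicks) for (query, clicks) rows."""
    branded = sum(clicks for query, clicks in rows if BRAND.search(query))
    non_branded = sum(clicks for query, clicks in rows if not BRAND.search(query))
    return branded, non_branded

# Hypothetical export rows: (query, clicks).
rows = [
    ("evisions", 120),
    ("eVisins seo", 30),
    ("seo agentura", 200),
    ("regex seo", 50),
]

branded, non_branded = split_clicks(rows)
share = branded / (branded + non_branded)
print(f"branded: {branded}, non-branded: {non_branded}, brand share: {share:.1%}")
```

Running the same split on exports from different periods would show the trend of the ratio over time.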

OpenRefine

The foundation of understanding any SEO project is keyword analysis. It gives you a comprehensive idea of which keywords and phrases are relevant to your site. In OpenRefine, we use regex to bulk categorize keywords into given segments, such as different inflections, spelling errors, and certain search patterns. For example, we can use the expression \b(1[0-9]|[0-9])\b to match numbers from 0 to 19. (A character class such as [10-19] would not work, because square brackets always match a single character.) This can be useful when filtering keywords by year of manufacture, by serial number and so on. We can also use the "or" operator, as in sv(é|e)(z|s)t, when looking for spelling errors. This expression matches the variants svézt, svést, svezt and svest.
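Both kinds of keyword filter can be tried on a short list before running them on a full keyword analysis. The sketch below uses Python's re module rather than OpenRefine itself, and the keywords are invented for illustration.

```python
import re

# Whole numbers from 0 to 19. \b keeps "95" or "a4" from matching by a single digit.
NUMBER_0_19 = re.compile(r"\b(1[0-9]|[0-9])\b")

# Spelling variants with é/e and z/s.
VARIANTS = re.compile(r"sv(é|e)(z|s)t")

keywords = ["iphone 13", "iphone 7", "windows 95", "golf 19 tdi", "audi a4"]
with_small_number = [k for k in keywords if NUMBER_0_19.search(k)]
print(with_small_number)

words = ["svézt", "svést", "svezt", "svest", "svět"]
variants = [w for w in words if VARIANTS.fullmatch(w)]
print(variants)
```

The alternation 1[0-9]|[0-9] spells the range out digit by digit, which is the usual way to express a numeric range in a regular expression.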

There could be many examples, it's up to you how you approach the filtering.

Another great example of using regex in OpenRefine is filtering spam domains.

If your domain is invaded by spam links, it sometimes takes hours to manually filter out these domains. This method saves up to 2 hours of time.

How to proceed?

  • Download the list of referring domains from Ahrefs.
  • Paste this list into OpenRefine.
  • Create a column to mark unwanted domains.
  • Put your list of spam domains into the filter and check the regular expression box.
    • \.cn|\.top|\.blogspot|\.xyz|\.asia|\.icu|\.online|\.io|\.cool|\.id|\.site|\.bd|\.website|\.in|\.link|\.host|\.au|\.group|\.fun|\.sa|[0-9]|\.store|\.biz - we don't want these domains in our link portfolio 95% of the time. Every dot is escaped with a backslash so that it matches a literal dot rather than any character. I recommend manually checking SK, EU, COM and CZ domains with a domain rating lower than 20. (\b([1-9]|1[0-9]|20)\b) is a regex for the range 1-20, where "\b" marks a word boundary, so the number is matched only as a whole. For more information on number range regular expressions, see here.
  • Mark the filtered data for subsequent filtering/elimination.
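The marking step can be sketched outside OpenRefine as well. The example below is an illustration under assumptions, not the exact filter from the list: it uses only a subset of the endings, the domains and ratings are made up, and the "$" anchor is an added design choice so that ".in" does not also flag ".info".

```python
import re

# Subset of the endings listed above; extend as needed.
SPAM_TLDS = ["cn", "top", "xyz", "icu", "online", "in"]

# Assumption: anchor the match to the end of the domain with "$".
SPAM_PATTERN = re.compile(
    r"\.(?:" + "|".join(re.escape(tld) for tld in SPAM_TLDS) + r")$"
)

# Domain rating in the range 1-20, matched as a whole number.
LOW_DR = re.compile(r"\b([1-9]|1[0-9]|20)\b")

def is_suspicious(domain):
    """True if the domain ends with one of the unwanted endings."""
    return SPAM_PATTERN.search(domain.lower()) is not None

def has_low_rating(domain_rating):
    """True if the rating is a whole number between 1 and 20."""
    return LOW_DR.fullmatch(str(domain_rating)) is not None

# Hypothetical (domain, domain rating) rows from a referring-domains export.
referring = [("cheap-links.xyz", 3), ("blog.topic.com", 45),
             ("data.info", 12), ("shop.in", 8)]

for domain, rating in referring:
    print(domain, is_suspicious(domain), has_low_rating(rating))
```

Without the "$" anchor, \.top would also match "blog.topic.com" and \.in would match "data.info", so whether to anchor depends on how broad the first pass should be.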

Don't forget to manually review the resulting list before disavowing the domains. Some domains look dubious at first glance, but after a thorough check we may find valuable links. (These are mainly domains with the endings .info, .biz and .blogspot.com.)

Screaming Frog

In this tool we can use regular expressions for several activities, such as:

  • finding opportunities for internal linking
  • crawling only certain segments
  • finding specific elements in the code, etc.

One example would be finding internal linking opportunities on the web. The easiest way to do this is to create a regular expression of the keywords you want to find on the site and use custom search to find specific phrases.
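A keyword expression of this kind can be generated from a list instead of being typed by hand. The following is a sketch only: build_keyword_regex is a hypothetical helper, the keywords and page texts are made up, and the pattern is tested here with Python's re module, so check it in the crawler's own custom search before relying on it.

```python
import re

def build_keyword_regex(keywords):
    """Join keywords into one alternation, matched as whole phrases."""
    # Escape special characters and put longer phrases first.
    escaped = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    return r"\b(?:" + "|".join(escaped) + r")\b"

keywords = ["plastova okna", "hlinikove dvere"]
pattern = re.compile(build_keyword_regex(keywords), re.IGNORECASE)

# Hypothetical page texts keyed by URL.
pages = {
    "example.cz/blog/zatepleni": "Nova plastova okna snizi naklady na topeni.",
    "example.cz/blog/zahrada": "Tipy pro jarni udrzbu zahrady.",
}

# Pages that mention a keyword are candidates for an internal link.
opportunities = {
    url: pattern.findall(text) for url, text in pages.items() if pattern.search(text)
}
print(opportunities)
```

The word boundaries keep a keyword from matching inside a longer word, which reduces false positives when the list contains short phrases.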


At first glance, learning regular expressions seems complicated, but once you discover their power and try them out in practice, you'll find that you can't do without them.
