Understanding Regex

A very simple introduction to Regex.

In this short article, I will take you through the key concepts behind Regex: what it is and how to start writing it in no time ;)

What is Regex?

So what in heaven's name is Regex? Regex, short for Regular Expression, is a pattern used for matching text in strings. We use Regex for things like email validation, text manipulation, and password validation.
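To make those uses concrete, here is a minimal sketch of the three most common things you do with a regex in JavaScript: check whether a pattern occurs, pull out the matching text, and rewrite it. The sample string and variable names are made up for illustration, and the pattern syntax itself is explained in the sections that follow.

```javascript
// Hypothetical sample text, used only for illustration.
const sampleText = "Order 66 shipped on 2024-01-15";

// test(): does the string contain at least one digit?
const hasDigits = /[0-9]/.test(sampleText);

// match(): extract the first run of one or more digits.
const firstNumber = sampleText.match(/[0-9]+/)[0];

// replace() with the g (global) flag: rewrite every digit, not just the first.
const masked = sampleText.replace(/[0-9]/g, "#");

console.log(hasDigits);   // true
console.log(firstNumber); // 66
console.log(masked);      // Order ## shipped on ####-##-##
```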

Basic Syntax of Regex

[abc] - matches a, b, or c

[^abc] - matches any character except a, b, or c

[a-z] - matches any lowercase letter from a to z

[A-Z] - matches any uppercase letter from A to Z

[a-zA-Z] - matches any letter, lowercase or uppercase

[0-9] - matches any digit from 0 to 9
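The character classes above can be tried out directly with test(), which returns true when the string contains at least one character the class accepts. This is only a small sketch with made-up inputs; note the last two lines, which show that a space typed inside the brackets becomes part of the class.

```javascript
const matchesB = /[abc]/.test("b");          // true: b is in the set
const onlyExcluded = /[^abc]/.test("cab");   // false: every character is a, b, or c
const hasOther = /[^abc]/.test("cabd");      // true: d is outside the set
const lowerOnly = /[a-z]/.test("Q");         // false: Q is uppercase
const anyLetter = /[a-zA-Z]/.test("Q");      // true
const hasDigit = /[0-9]/.test("room 7");     // true

// A space inside the brackets is a literal member of the class.
const spaceCounts = /[a-z A-Z]/.test(" ");   // true
const spaceExcluded = /[a-zA-Z]/.test(" ");  // false

console.log(matchesB, onlyExcluded, hasOther);
console.log(lowerOnly, anyLetter, hasDigit);
console.log(spaceCounts, spaceExcluded);
```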

Regex Quantifiers

Regex quantifiers specify how many instances of a character, group, or character class must be present in the input for a match to be found. There are two main types of quantifiers, greedy and lazy, and by default all regex quantifiers are greedy.

Greedy Quantifiers    Lazy Quantifiers    Description
[....]+               [....]+?            Matches 1 or more times
[....]?               [....]??            Matches 0 or 1 times
[....]*               [....]*?            Matches 0 or more times
[....]{n}             [....]{n}?          Matches exactly n times
[....]{n,}            [....]{n,}?         Matches n or more times
[....]{a,b}           [....]{a,b}?        Matches at least a times and at most b times
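Here is a rough sketch of the counting quantifiers in action. The patterns are wrapped in ^ and $ (covered under metacharacters later in this article) so that the whole string has to fit the count, and the variable names are just for illustration.

```javascript
const exactlyThree = /^[0-9]{3}$/;  // exactly 3 digits
const threeOrMore = /^[0-9]{3,}$/;  // 3 digits or more
const twoToFour = /^[0-9]{2,4}$/;   // between 2 and 4 digits, both ends included
const optionalSign = /^-?[0-9]+$/;  // an optional minus sign, then 1 or more digits

console.log(exactlyThree.test("123"));   // true
console.log(exactlyThree.test("1234"));  // false
console.log(threeOrMore.test("12"));     // false
console.log(threeOrMore.test("12345"));  // true
console.log(twoToFour.test("1234"));     // true
console.log(twoToFour.test("12345"));    // false
console.log(optionalSign.test("-42"));   // true
console.log(optionalSign.test("--42"));  // false
```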

Greedy Quantifiers

Greedy quantifiers try to match an element as many times as possible.

Lazy Quantifiers

Lazy quantifiers try to match an element as few times as possible. To turn a greedy quantifier into a lazy one, add a ? right after it.
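The difference is easiest to see side by side. In this small sketch (the sample string is made up), the same pattern is run once greedy and once lazy; the . used here means "any single character" and is covered in the metacharacters section.

```javascript
// Hypothetical sample markup, used only for illustration.
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ grabs as much as it can, so the match runs to the LAST >.
const greedyMatch = html.match(/<.+>/)[0];

// Lazy: .+? grabs as little as it can, so the match stops at the FIRST >.
const lazyMatch = html.match(/<.+?>/)[0];

console.log(greedyMatch); // <b>bold</b> and <i>italic</i>
console.log(lazyMatch);   // <b>

// The same idea with a counted quantifier.
console.log("aaaa".match(/a{2,3}/)[0]);  // aaa (greedy takes the maximum)
console.log("aaaa".match(/a{2,3}?/)[0]); // aa  (lazy takes the minimum)
```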


Regex Metacharacters

Metacharacters are special characters that have specific meanings which can be used to build complex patterns that can match a wide range of combinations. Metacharacters are used in regular expressions to define search criteria and allow us developers to do text manipulations.

. - Matches any single character

^ - Matches the beginning of a line

$ - Matches the end of a line

a|b - Matches either a or b

\d - Matches any digit

\D - Matches any non-digit character

\w - Matches any word character

\W - Matches any non-word character

\s - Matches any whitespace character

\S - Matches any non-whitespace character

\b - Matches a word boundary

\B - Matches a position that is not a \b word boundary

[\b] - Matches a backspace character

\xYY - Matches the hex character YY

\ddd - Matches the octal character ddd

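Always remember that the backslash \ (not the forward slash /) tells the regex engine to treat the next character specially: placed before a metacharacter such as . or $, it makes the engine match that character literally, and placed before certain letters it forms the special sequences listed above, such as \d.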
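A quick sketch of a few of these metacharacters at work, including the backslash escape. The sample sentence and variable names are made up for illustration.

```javascript
// Hypothetical sample sentence, used only for illustration.
const note = "Call 555-0199 at 9am, room B2.";

const digitRuns = note.match(/\d+/g);        // every run of digits
const words = note.match(/\w+/g);            // every run of word characters
const pieces = note.split(/\s+/);            // split on whitespace
const startsWithCall = /^Call/.test(note);   // true: the string begins with Call
const endsWithPeriod = /\.$/.test(note);     // true: \. is a literal period
const wholeWordAt = /\bat\b/.test(note);     // true: "at" stands alone as a word
const atInsideCat = /\bat\b/.test("cat");    // false: no boundary before the a
const roomOrSuite = /room|suite/.test(note); // true: the first alternative matches

// Escaping matters: an unescaped . accepts any character.
const unescapedDot = /B2./.test("B2x");      // true
const escapedDot = /B2\./.test("B2x");       // false

console.log(digitRuns); // [ '555', '0199', '9', '2' ]
console.log(words);     // [ 'Call', '555', '0199', 'at', '9am', 'room', 'B2' ]
console.log(pieces.length); // 6
console.log(unescapedDot, escapedDot); // true false
```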

Example 1

Write a regex to check for a student ID that has a total of 10 digits and starts with either a 5 or a 6.

The answer to this is /^[56]\d{9}$/

Here is the explanation for the answer:

  • ^: Asserts the start of the string.

  • [56]: Matches either 5 or 6.

  • \d{9}: Matches exactly 9 more digits (0-9), which brings the total to 10.

  • $: Asserts the end of the string.

const studentIdRegex = /^[56]\d{9}$/;

function checkStudentId(studentId) {
  return studentIdRegex.test(studentId);
}

// Test cases
console.log(checkStudentId("5123456789")); // true
console.log(checkStudentId("6123456789")); // true
console.log(checkStudentId("4123456789")); // false (doesn't start with 5 or 6)
console.log(checkStudentId("51234567"));   // false (length is less than 10 digits)
console.log(checkStudentId("61234567891"));// false (length is more than 10 digits)

Example 2

As a developer, you have built a signup form for your website, and you want your users to sign up with an email address that has a properly formed domain (for example, example.com). Write a regex to match the input the user provides.

The answer to the question is ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Explanation of the regex:

  • ^: Denotes the start of the string.

  • [a-zA-Z0-9._%+-]+: Matches one or more occurrences of letters (both uppercase and lowercase), digits, and special characters ._%+- that are allowed in the local part of the email address.

  • @: Matches the '@' symbol.

  • [a-zA-Z0-9.-]+: Matches one or more occurrences of letters (both uppercase and lowercase), digits, and the period '.' and hyphen '-' characters that are allowed in the domain name.

  • \.: Escapes the period '.' character to match it literally.

  • [a-zA-Z]{2,}: Matches two or more letters (both uppercase and lowercase) for the top-level domain (TLD). This checks that the domain ends with something shaped like a TLD, such as ".com", ".org", or ".net"; it does not verify that the TLD actually exists.

  • $: Denotes the end of the string.

const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Example usage:
const email1 = "user@example.com"; // Valid
const email2 = "user@subdomain.example.co.uk"; // Valid
const email3 = "invalid_email"; // Invalid
const email4 = "user@invalid"; // Invalid

console.log(emailRegex.test(email1)); // true
console.log(emailRegex.test(email2)); // true
console.log(emailRegex.test(email3)); // false
console.log(emailRegex.test(email4)); // false
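To see why the backslash before the period matters, here is a small comparison sketch. The names looseEmailRegex and strictEmailRegex are made up for this illustration; the only difference between them is whether the period before the TLD is escaped.

```javascript
// Unescaped . before the TLD: it matches ANY character, not just a period.
const looseEmailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,}$/;

// Escaped \. before the TLD: a literal period is required.
const strictEmailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

console.log(looseEmailRegex.test("user@examplecom"));  // true (a false positive)
console.log(strictEmailRegex.test("user@examplecom")); // false
console.log(looseEmailRegex.test("user@invalid"));     // true (a false positive)
console.log(strictEmailRegex.test("user@invalid"));    // false
console.log(strictEmailRegex.test("user@example.com")); // true
```

Even the escaped version only checks the shape of the address, so treat it as a first filter rather than proof that the mailbox exists.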

Conclusion

Thank you for reading :) I really hope you learned something new today. Please don't hesitate to reach out if you want me to add something, and feel free to keep this article as your personal reference.

Resources

https://www.regextutorial.org/regular-expression-metacharacters.php

https://learn.microsoft.com/en-us/dotnet/standard/base-types/quantifiers-in-regular-expressions