Understanding Regular Expressions

Regular expressions (regex) are patterns used to match and validate text. Instead of listing every acceptable value individually, a regex defines the format that acceptable strings must follow, which simplifies data validation.

To compose a regular expression, follow these steps:

1. Define the pattern to match.

  • Understand the data: Before writing the regex, know the exact format of the text expected by the field that the rule will validate or match.
    Example: email, phone number, date.

  • Identify the rules: Determine the structure of the pattern. Is it an email, phone number, date, or username? Each of these has its own unique structure, so understanding the expected format is key.
    Example:
    • Email: username@domain.com
    • Phone number: (123) 456-7890 or 123-456-7890
    • Date: MM/DD/YYYY or YYYY-MM-DD

2. Use Character Classes, Quantifiers, Anchors, Special Characters and Escaping to compose the regex pattern.

3. Put it all together: combine the elements to create a regex pattern that matches the desired text.
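
As an illustration of the three steps, here is a minimal Python sketch for the YYYY-MM-DD date format listed above. The names DATE_PATTERN and looks_like_iso_date are hypothetical, chosen only for this example, and the pattern checks the shape of the text only, not whether the month or day is in a valid range.

```python
import re

# Step 1: the expected format is YYYY-MM-DD (the target chosen for this sketch).
# Step 2: \d{4} is four digits, - is a literal hyphen, \d{2} is two digits.
# Step 3: combine the pieces and anchor them so the whole string must match.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def looks_like_iso_date(text: str) -> bool:
    """Return True if text has the shape YYYY-MM-DD (values are not range-checked)."""
    return DATE_PATTERN.search(text) is not None


print(looks_like_iso_date("2024-03-15"))  # True
print(looks_like_iso_date("03/15/2024"))  # False
```

Because only the shape is checked, a value such as 2024-99-99 would also pass; range checks would need additional logic.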

Character Classes

Character classes define which characters can match at a given position:

  • [abc] Matches any one of the characters a, b, or c.
  • \d Matches any digit (0-9).
  • \w Matches any word character (letters, digits, and underscore).
  • \s Matches any whitespace character (spaces, tabs, newlines).
  • [^abc] Matches any character except a, b, or c (negated character class).
  • [a-z] Matches any lowercase letter from a to z.
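
The character classes above can be tried out with a short Python sketch. It uses the standard re module, and the sample string is made up for illustration. re.findall returns every character in the sample that the class matches.

```python
import re

# A made-up sample containing letters, a space, digits, an underscore and "!".
sample = "cab 42_x!"

print(re.findall(r"[abc]", sample))   # ['c', 'a', 'b']
print(re.findall(r"\d", sample))      # ['4', '2']
print(re.findall(r"\w", sample))      # ['c', 'a', 'b', '4', '2', '_', 'x']
print(re.findall(r"\s", sample))      # [' ']
print(re.findall(r"[^abc]", sample))  # [' ', '4', '2', '_', 'x', '!']
print(re.findall(r"[a-z]", sample))   # ['c', 'a', 'b', 'x']
```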

Quantifiers

  • * Matches 0 or more occurrences of the preceding element (e.g., \d* matches zero or more digits).
  • + Matches 1 or more occurrences of the preceding element (e.g., [a-z]+ matches one or more lowercase letters).
  • ? Matches 0 or 1 occurrence of the preceding element (e.g., \d? matches an optional digit).
  • {n,m} Matches between n and m occurrences of the preceding element (e.g., \d{2,4} matches 2 to 4 digits).
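
A small Python sketch can show how each quantifier behaves. The helper name matches is hypothetical. It uses re.fullmatch so that the entire input must fit the pattern.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text fits the pattern."""
    return re.fullmatch(pattern, text) is not None


print(matches(r"\d*", ""))            # True: zero digits is acceptable
print(matches(r"[a-z]+", ""))         # False: at least one letter is required
print(matches(r"[a-z]+", "abc"))      # True
print(matches(r"\d?", "7"))           # True: one optional digit
print(matches(r"\d?", "77"))          # False: more than one digit
print(matches(r"\d{2,4}", "123"))     # True: between 2 and 4 digits
print(matches(r"\d{2,4}", "12345"))   # False: 5 digits is too many
```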

Anchors

  • ^ Asserts the start of the string.
  • $ Asserts the end of the string.
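
The effect of the anchors can be seen in the following Python sketch, which uses a made-up sample string. re.search looks for a match anywhere in the text, so the anchors are what restrict where the match may begin or end.

```python
import re

text = "order 12345"

# Without anchors, the digits can be found anywhere in the string.
print(bool(re.search(r"\d+", text)))       # True
# ^ requires the match to begin at the start of the string.
print(bool(re.search(r"^\d+", text)))      # False
# $ requires the match to finish at the end of the string.
print(bool(re.search(r"\d+$", text)))      # True
# Both anchors together require the whole string to be digits.
print(bool(re.search(r"^\d+$", text)))     # False
print(bool(re.search(r"^\d+$", "12345")))  # True
```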

Special Characters and Escaping

  • . Matches any character except a newline.
  • [] Denotes a character class (e.g., [a-z] matches any lowercase letter).
  • () Groups parts of the expression together (e.g., (abc) matches "abc").
  • | Acts as an OR operator (e.g., a|b matches either a or b).
  • \ Escapes a special character so that it is matched literally (e.g., \. matches a period).
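
The following Python sketch illustrates these special characters and the effect of escaping one of them with a backslash. The helper name full is hypothetical and the sample strings are made up for illustration.

```python
import re


def full(pattern: str, text: str) -> bool:
    """Return True if the whole text fits the pattern."""
    return re.fullmatch(pattern, text) is not None


# . matches any single character except a newline.
print(full(r"c.t", "cat"))      # True
print(full(r"c.t", "c\nt"))     # False

# () groups part of the expression; the grouped text can be read back.
print(re.fullmatch(r"(abc)", "abc").group(1))  # abc

# | matches either alternative.
print(full(r"a|b", "b"))        # True
print(full(r"a|b", "c"))        # False

# An unescaped . matches any character; \. matches only a literal period.
print(full(r"3.14", "3x14"))    # True
print(full(r"3\.14", "3x14"))   # False
print(full(r"3\.14", "3.14"))   # True
```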

Example Regular Expressions

Email

The regular expression for an email with the structure username@domain.com is: 

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

This regex matches an email with a valid username, domain, and extension.

  • Valid Input: If the input is userName1.2%3@Gmail.com, it matches the regular expression because the username contains only valid characters (a-zA-Z0-9._%+-), the domain is valid (Gmail), and the domain extension is valid (.com).
  • Invalid Input: If the input is username%$?@/gmail…, it does not match the regular expression because the username contains invalid characters ($ and ?; the % is allowed) and the domain contains an invalid character (/). The regex allows only letters, digits, and the symbols . _ % + - in the username, and only letters, digits, periods, and hyphens in the domain.

Elements in the regex pattern and their descriptions:

  • ^ Denotes the start of the string.
  • [a-zA-Z0-9._%+-] Matches the username part of the email. This part allows the letters a to z (both uppercase and lowercase), digits, and the special characters . _ % + -
  • + Means that the username part must have one or more of these characters.
  • @ This is the literal "@" symbol that separates the username from the domain.
  • [a-zA-Z0-9.-] Matches the domain name part. This part allows the letters a to z (both uppercase and lowercase), digits, periods (.), and hyphens (-).
  • + The domain part must have one or more of these characters.
  • \. This is the literal period (.) that separates the domain name from the domain extension.
  • [a-zA-Z] Matches the domain extension (like .com, .org). This part allows the letters a to z (both uppercase and lowercase).
  • {2,} The domain extension must have at least 2 letters (e.g., .uk, .com, .org).
  • $ Denotes the end of the string.
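
The email pattern, including the ^ and $ anchors described above, can be tested with a minimal Python sketch. The names EMAIL_PATTERN and is_valid_email are hypothetical, and the invalid sample addresses are made up for illustration. This pattern is a simplified check of the username@domain.com structure, not a complete validator for every address that mail systems accept.

```python
import re

# The email pattern from this article, anchored at both ends.
EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def is_valid_email(text: str) -> bool:
    """Return True if text has the username@domain.extension structure."""
    return EMAIL_PATTERN.search(text) is not None


print(is_valid_email("userName1.2%3@Gmail.com"))  # True
print(is_valid_email("user$name@gmail.com"))      # False: $ is not allowed
print(is_valid_email("username@gmail"))           # False: no domain extension
```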

 

Phone Number

The regular expression for a phone number in the format (123) 456-7890 is:

\(\d{3}\) \d{3}-\d{4}

  • Valid Input: If the input is (123) 456-7890 it matches the regular expression because:
    • The area code is enclosed in parentheses and contains exactly 3 digits.
    • A space follows the closing parenthesis, separating the area code from the main number.
    • The main number contains 3 digits, followed by a hyphen, and then 4 digits, which is the correct format for a phone number.
  • Invalid Input: If the input is 123-456-7890, it does not match the regular expression because:
    • The area code is not enclosed in parentheses, which is required by the regex.
    • A hyphen, rather than the required space, follows the area code. (The hyphen between the two parts of the main number is present, so that part of the input is correct.)

Elements in the regex pattern and their descriptions:

  • \( Matches the literal opening parenthesis ( (escaped with a backslash because parentheses are special characters in regex).
  • \d{3} Matches exactly 3 digits (\d represents any digit, and {3} specifies exactly three digits).
  • \) Matches the literal closing parenthesis ) (escaped with a backslash).
  • (space) Matches the literal space character between the area code and the main phone number.
  • \d{3} Matches exactly 3 digits (the first part of the main number).
  • - Matches the literal hyphen (-) separating the two parts of the main number.
  • \d{4} Matches exactly 4 digits (the second part of the main number).
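
The phone number pattern can be checked with a short Python sketch. The names PHONE_PATTERN and is_valid_phone are hypothetical. Because the pattern as written has no ^ or $ anchors, this sketch uses fullmatch so that the entire input must fit the format.

```python
import re

# The phone number pattern from this article.
PHONE_PATTERN = re.compile(r"\(\d{3}\) \d{3}-\d{4}")


def is_valid_phone(text: str) -> bool:
    """Return True if the whole text has the format (123) 456-7890."""
    return PHONE_PATTERN.fullmatch(text) is not None


print(is_valid_phone("(123) 456-7890"))  # True
print(is_valid_phone("123-456-7890"))    # False: no parentheses, no space
```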

 

Additional Support

For more information on regular expressions, please visit:
