What is a Regular Expression and how do you write one?

Definition of Regular Expression

A Regular Expression (also called RegEx or Rational Expression) is a sequence of characters that describes a search pattern used to match character combinations in strings.

Regular Expression in simple words

Suppose you have a string or a sentence and you want to check whether a specific piece of text appears in it. You can do that by writing a Regular Expression pattern that describes the text you are looking for.

A Regular Expression is not a programming language. It is a pattern-matching notation that most programming languages support.

Why use patterns to match a specific text?

With a RegEx pattern, a single expression can match many different texts (for example, every number in a document rather than one specific number), and complex matching rules can be written concisely.

How to write Regular Expression Patterns?

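A RegEx pattern is usually written between two forward slashes: /RegEx-Pattern/. This is the literal syntax used by JavaScript and by most online RegEx testers; many other languages take the pattern as a string instead.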

Pattern    Test string    Match?
/Hello/    Hello World    Yes
/Hi/       Hello World    No
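
The two rows above can be checked in code. This is a minimal sketch in JavaScript, chosen because it uses the same /pattern/ syntax; the variable names are illustrative only.

```javascript
// test() returns true when the pattern is found anywhere in the string.
const hasHello = /Hello/.test("Hello World");
const hasHi = /Hi/.test("Hello World");

console.log(hasHello); // true
console.log(hasHi);    // false
```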

Flags of RegEx

  • RegEx has flags that change how the search is performed.
  • Flags are single letters (g, i, m, s, u, etc.), and each flag has a different effect on the search.
  • You specify flags after the closing forward slash, and you can combine several flags, for example /hello/gi.

Example of Flags

The examples here cover g and i, the two most commonly used flags.

g – With this flag, the search returns all matches. Without it, only the first match is returned.

[Figure: regex global flag example]

i – By default, RegEx matches are case-sensitive, so an uppercase A does not match a lowercase a. The i flag makes the match case-insensitive.

[Figure: regex case insensitive example]
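
As a rough illustration of the g and i flags, the sketch below uses JavaScript's String.prototype.match; the sample text and variable names are made up for this example.

```javascript
const text = "apple Apple apple";

// Without g, match() stops at the first match.
const first = text.match(/apple/);
// With g, match() returns every case-sensitive match.
const all = text.match(/apple/g);
// g and i combined: every match, ignoring case.
const allAnyCase = text.match(/apple/gi);

console.log(first[0]);   // "apple"
console.log(all);        // ["apple", "apple"]
console.log(allAnyCase); // ["apple", "Apple", "apple"]
```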

Metacharacters in RegEx

Metacharacters are characters that have a special meaning in a pattern. The following three each match a specific type of character:

  • \d – Matches any single digit from 0 to 9.
  • \w – Matches any single word character: a-z, A-Z, 0-9, or the underscore (_).
  • \s – Matches any single whitespace character (space, tab, or line break).
[Figure: regex d metacharacter example]
[Figure: regex w metacharacter example]
[Figure: regex s metacharacter example]
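
The three metacharacters can be tried on one sample string. This is only a sketch in JavaScript; the string "Room 42_b" is an invented example. Note that \w also picks up the underscore.

```javascript
const sample = "Room 42_b";

const digits = sample.match(/\d/g);    // every digit
const wordChars = sample.match(/\w/g); // letters, digits, and underscore
const spaces = sample.match(/\s/g);    // every whitespace character

console.log(digits);             // ["4", "2"]
console.log(wordChars.join("")); // "Room42_b"
console.log(spaces.length);      // 1
```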

Capital \D and \W

  • \D – Matches any character that is not a digit character (0-9).
  • \W – Matches any character that is not a word character (alphanumeric & underscore).
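
The uppercase forms can be checked the same way. A short JavaScript sketch, assuming an invented sample string:

```javascript
const code = "a1-b2_c3";

// \D keeps everything that is not a digit.
const nonDigits = code.match(/\D/g);
// \W keeps everything that is not a letter, digit, or underscore.
const nonWord = code.match(/\W/g);

console.log(nonDigits); // ["a", "-", "b", "_", "c"]
console.log(nonWord);   // ["-"]
```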

The start and end of the string

To check that a string starts or ends with specific characters, use these two symbols:

  • ^ – Matches the beginning of the string.
  • $ – Matches the end of the string.

^ Symbol example

Pattern      Test string          Match?
/Hello/g     Hello Hello Hello    Yes (all three).
/^Hello/g    Hi Hello Hi Hello    No (Hello must be at the beginning).
/^Hello/g    Hello Hello Hello    Yes (only the first Hello).
/^\wet/g     get set go           Yes (only get).

$ Symbol example

Add the $ symbol after the last character of the pattern.

Pattern    Test string     Match?
/you$/g    How are you?    No (there is a ? at the end of the string).
/you$/g    How are you     Yes.
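
The ^ and $ examples above can be reproduced with a small JavaScript sketch; the variable names are illustrative.

```javascript
// ^ anchors the pattern to the start of the string.
const startsWithHello = /^Hello/.test("Hello Hello Hello");
const helloLater = /^Hello/.test("Hi Hello Hi Hello");
// Even with g, only the word at the very start can match.
const startMatches = "get set go".match(/^\wet/g);

// $ anchors the pattern to the end of the string.
const endsWithYou = /you$/.test("How are you");
const endsWithQuestionMark = /you$/.test("How are you?");

console.log(startsWithHello, helloLater);       // true false
console.log(startMatches);                      // ["get"]
console.log(endsWithYou, endsWithQuestionMark); // true false
```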

Match any repeating character

The following two symbols match a repeated character.

  • * – The preceding character occurs zero or more times.
  • + – The preceding character occurs one or more times.

How do you use these symbols? Add the symbol right after the character it applies to.

Pattern    Test string    Match?
/ab+c/     ac             No (b must appear at least once).
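
To compare + and * side by side, here is a small JavaScript sketch. The test strings other than "ac" are added for illustration.

```javascript
const plus = /ab+c/; // b one or more times
const star = /ab*c/; // b zero or more times

const plusResults = ["ac", "abc", "abbbc"].map((s) => plus.test(s));
const starResults = ["ac", "abc", "abbbc"].map((s) => star.test(s));

console.log(plusResults); // [false, true, true]
console.log(starResults); // [true, true, true]
```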

Match any single character in RegEx

The dot (.) matches any single character (a letter, digit, space, and so on). By default it does not match a line break; the s flag changes that. The dot is also known as the wildcard symbol in RegEx.

Pattern    Test string    Match?
/a.c/      a c            Yes
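
A quick JavaScript sketch of the dot. The extra test strings are invented, and the last two lines assume a runtime that supports the s (dotAll) flag.

```javascript
const withSpace = /a.c/.test("a c");      // the dot matches the space
const withLetter = /a.c/.test("abc");     // the dot matches b
const nothingBetween = /a.c/.test("ac");  // the dot needs exactly one character
const lineBreak = /a.c/.test("a\nc");     // no match: the dot skips line breaks
const lineBreakS = /a.c/s.test("a\nc");   // with s, the dot matches line breaks too

console.log(withSpace, withLetter, nothingBetween); // true true false
console.log(lineBreak, lineBreakS);                 // false true
```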

Escape the metacharacters in RegEx

Metacharacters have a special meaning in RegEx. To treat a metacharacter as a normal character, you have to escape it.

Put a backslash \ in front of the metacharacter (for example . + * $) to escape it.
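
A minimal JavaScript sketch of escaping, using invented price strings, shows the difference between an escaped and an unescaped metacharacter.

```javascript
// Unescaped, the dot is a wildcard, so "3x50" matches by accident.
const unescaped = /3.50/.test("3x50");
// Escaped, \. matches only a literal dot.
const escaped = /3\.50/.test("3x50");
const literalDot = /3\.50/.test("3.50");
// $ and + are escaped the same way.
const priceMatch = "Total: $5+".match(/\$5\+/);

console.log(unescaped, escaped, literalDot); // true false true
console.log(priceMatch[0]);                  // "$5+"
```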


Make a character optional

The ? mark makes the preceding character optional, so it may appear zero times or once. The * also allows the character to be absent, but unlike ? it accepts any number of repetitions.

Pattern    Test string    Match?
/ab?c/     ac             Yes (the b is optional).
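
The difference between ? and * shows up once the character repeats. A short JavaScript sketch, with "abc" and "abbc" added as extra test strings:

```javascript
const optional = /ab?c/; // b zero times or once
const anyCount = /ab*c/; // b any number of times

const optionalResults = ["ac", "abc", "abbc"].map((s) => optional.test(s));
const anyCountResults = ["ac", "abc", "abbc"].map((s) => anyCount.test(s));

console.log(optionalResults); // [true, true, false]
console.log(anyCountResults); // [true, true, true]
```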

Specify the repetition length of a character

Use curly braces {} to specify how many times a character must be repeated.

{min,max} – This quantifier takes a minimum and a maximum count, written without spaces. The maximum is optional: {min,} means at least min times, and {n} means exactly n times.

Pattern        Test string    Match?
/xy{2}z/       xyz            No (y must be repeated exactly 2 times).
/xy{1,5}z/     xyyyz          Yes (y must be repeated at least 1 time, but not more than 5).
/xy{3,}z/      xyyyyyyyz      Yes (y must be repeated at least 3 times; there is no maximum).
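
The rows above, plus a few counter-examples, can be checked with this JavaScript sketch. The helper ys is a hypothetical convenience for building the test strings.

```javascript
// Builds "x" + n times "y" + "z", e.g. ys(2) === "xyyz".
const ys = (n) => "x" + "y".repeat(n) + "z";

const exactlyTwo = [ys(1), ys(2), ys(3)].map((s) => /xy{2}z/.test(s));
const oneToFive = [ys(3), ys(6)].map((s) => /xy{1,5}z/.test(s));
const threeOrMore = [ys(2), ys(7)].map((s) => /xy{3,}z/.test(s));

console.log(exactlyTwo);  // [false, true, false]
console.log(oneToFive);   // [true, false]
console.log(threeOrMore); // [false, true]
```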

Match a set of characters

To match any one character from a set, list the allowed characters inside square brackets [...].

Pattern       Test string    Match?
/x[123]+y/    x21232231y     Yes (you can combine the set with a repetition symbol).

Negated character set in RegEx

The caret ^ at the start of a character set, [^ ], is the negation operator. The set then matches any single character except the ones listed inside the square brackets.
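
A small JavaScript sketch of a character set and its negated form. Apart from x21232231y, the test strings are invented for illustration.

```javascript
// [123]+ accepts one or more characters from the set 1, 2, 3.
const inSet = /x[123]+y/.test("x21232231y");
const outsideSet = /x[123]+y/.test("x214y"); // 4 is not in the set

// [^123] accepts any single character except 1, 2, or 3.
const negatedOk = /x[^123]y/.test("xay");
const negatedFail = /x[^123]y/.test("x2y");
const notDigits123 = "a1b2c3".match(/[^123]/g);

console.log(inSet, outsideSet);      // true false
console.log(negatedOk, negatedFail); // true false
console.log(notDigits123);           // ["a", "b", "c"]
```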


Match Character ranges

Inside a character set, you can specify a range of characters with a hyphen: [ - ].

  • 1-5 = 1 to 5
  • a-z = a to z

Pattern              Test string    Match?
/x[a-g]+y/           xdeBfy         No (the range covers only lowercase a to lowercase g).
/x[a-dF-P6-9]+y/     xcJ6Nb7y       Yes (you can specify multiple ranges at once).
/x[^g-p5-7]+y/       xa2y           Yes (using the negation operator with ranges).
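
The range examples translate directly to JavaScript. In this sketch, the i-flag line and the "xh2y" string are additions for contrast.

```javascript
const lowercaseOnly = /x[a-g]+y/.test("xdeBfy");   // B is outside a-g
const ignoreCase = /x[a-g]+y/i.test("xdeBfy");     // the i flag lets B match
const multiRange = /x[a-dF-P6-9]+y/.test("xcJ6Nb7y");
const negatedRange = /x[^g-p5-7]+y/.test("xa2y");
const negatedRangeFail = /x[^g-p5-7]+y/.test("xh2y"); // h falls inside g-p

console.log(lowercaseOnly, ignoreCase);      // false true
console.log(multiRange);                     // true
console.log(negatedRange, negatedRangeFail); // true false
```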

Capture groups in RegEx

A capturing group treats multiple characters as a single unit and remembers the text it matched, so you can read it back later. Use parentheses () to create a capturing group.

[Figure: regex capture group]

(?:) – Matches the expression but does not capture it into a group.

[Figure: regex do not capture group]

(?<name_the_group>) – Gives the group a name of your choice.

[Figure: regex give the name group]
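
The three kinds of group can be sketched in JavaScript as follows. The date string and the group names year and month are invented for this example, and named groups assume a reasonably modern JavaScript runtime.

```javascript
// A group is repeated as one unit: (ab)+ means "ab" one or more times.
const groupRepeats = /^(ab)+$/.test("ababab"); // true
const noGroup = /^ab+$/.test("ababab");        // false: + applies to b only

// Capturing groups are numbered from 1 in the match result.
const captured = "2024-05-17".match(/(\d{4})-(\d{2})-(\d{2})/);
// A non-capturing group is skipped in the numbering.
const nonCapturing = "2024-05-17".match(/(?:\d{4})-(\d{2})/);
// Named groups are read from the groups property.
const named = "2024-05-17".match(/(?<year>\d{4})-(?<month>\d{2})/);

console.log(groupRepeats, noGroup);                  // true false
console.log(captured[1], captured[2], captured[3]);  // "2024" "05" "17"
console.log(nonCapturing[1]);                        // "05"
console.log(named.groups.year, named.groups.month);  // "2024" "05"
```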

Look behind and Look Ahead (collectively called look around)

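  • (?<=...)search_character – Positive look behind: matches search_character only when the text in the parentheses comes immediately before it.
  • search_character(?=...) – Positive look ahead: matches search_character only when the text in the parentheses comes immediately after it.

Pattern      Test string          Result
(?<=@).      hi.john@gmail.com    Matches g, the single character right after the @.
(?<=@).*     hi.john@gmail.com    Matches gmail.com, all the characters after the @.
.(?=@)       hi.john@gmail.com    Matches n, the single character right before the @.
.*(?=@)      hi.john@gmail.com    Matches hi.john, all the characters before the @.

In every case the @ itself is not part of the match.

  • (?<!...) – Negative look behind: matches only when the text in the parentheses does not come immediately before.
  • (?!...) – Negative look ahead: matches only when the text in the parentheses does not come immediately after.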
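
The four positive look around patterns applied to hi.john@gmail.com can be run as a JavaScript sketch. It assumes a runtime with look behind support; the variable names are illustrative.

```javascript
const email = "hi.john@gmail.com";

// Look behind: the @ must come right before the match.
const oneAfterAt = email.match(/(?<=@)./)[0];
const allAfterAt = email.match(/(?<=@).*/)[0];
// Look ahead: the @ must come right after the match.
const oneBeforeAt = email.match(/.(?=@)/)[0];
const allBeforeAt = email.match(/.*(?=@)/)[0];

console.log(oneAfterAt);  // "g"
console.log(allAfterAt);  // "gmail.com"
console.log(oneBeforeAt); // "n"
console.log(allBeforeAt); // "hi.john"
```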

Example of Negative look behind

Pattern           Test string       Result
(?<![a-z])123     hello123          Not matched, because a character between a and z must not come right before the 123.
(?<![a-z])123     hello 123         Matched, because there is a space before the 123.
(?<![a-z])123     hello5123again    Matched, because there is a digit before the 123.
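
The negative look behind rows can be verified with a short JavaScript sketch, assuming a runtime with look behind support.

```javascript
// 123 must not be preceded by a lowercase letter.
const notAfterLetter = /(?<![a-z])123/;

const behindResults = ["hello123", "hello 123", "hello5123again"].map((s) =>
  notAfterLetter.test(s)
);

console.log(behindResults); // [false, true, true]
```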

Example of Negative look ahead

Pattern            Test string      Result
hello(?![1-5])     hello123         Not matched, because a digit between 1 and 5 must not come right after the hello.
hello(?![1-5])     hello 123        Matched, because there is a space after the hello.
hello(?![1-5])     hello712again    Matched, because the 7 after the hello is outside the range 1 to 5.
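
The negative look ahead rows can be checked the same way; a minimal JavaScript sketch with an illustrative variable name:

```javascript
// hello must not be followed by a digit from 1 to 5.
const notBeforeLowDigit = /hello(?![1-5])/;

const aheadResults = ["hello123", "hello 123", "hello712again"].map((s) =>
  notBeforeLowDigit.test(s)
);

console.log(aheadResults); // [false, true, true]
```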