Regex cheat sheet

regex reserved characters = must be escaped to be taken literally

^.[$()|*+?{\
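When the literal text comes from somewhere else, PHP's preg_quote() can do the escaping. A minimal sketch; the sample strings and variable names are made up for illustration:

```php
<?php
// Text that contains several reserved characters.
$literal = 'price (USD) is $1.50+';

// preg_quote() escapes regex metacharacters; the second argument
// additionally escapes the chosen delimiter ('/').
$pattern = '/' . preg_quote($literal, '/') . '/';

var_dump(preg_match($pattern, 'The price (USD) is $1.50+ tax')); // int(1)
var_dump(preg_match('/1.50/', '1x50'));   // int(1): an unescaped . matches any character
var_dump(preg_match('/1\.50/', '1x50'));  // int(0): an escaped . matches only a dot
```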

escape character

\

how to represent a backslash?

A literal \ is created using \\

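NOTE: In PHP, the pattern is written inside a string literal, so a backslash is escaped twice: once for the string and once for the regex. A single literal backslash therefore takes four backslashes in the source (three also work in a single-quoted string).

To match Tests\ComProtocolTest, write the pattern as '/Tests\\\\ComProtocolTest/'.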
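A small sketch of the backslash rule, assuming single-quoted PHP strings; the variable names are illustrative:

```php
<?php
// In single quotes, \C is not an escape sequence, so this holds one backslash.
$subject = 'Tests\ComProtocolTest';

// Four backslashes in the source become two in the pattern string,
// which the regex engine reads as one literal backslash.
$four = '/Tests\\\\ComProtocolTest/';

// Three also work: '\\' becomes one backslash and '\C' is left untouched.
$three = '/Tests\\\ComProtocolTest/';

var_dump($four === $three);             // bool(true): both hold the same pattern string
var_dump(preg_match($four, $subject));  // int(1)
```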

quantifiers

?          zero or one
*          zero or more
+          one or more
{2}        exactly two
{2,}       two or more
{2,5}      two to five

examples

abc?       ab followed by zero or one c
abc+       ab followed by one or more c
(phone){2} exactly two copies of phone (phonephone)
a(bc){2,5} a followed by 2 up to 5 copies of the sequence bc
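The quantifier examples above can be checked with preg_match. A sketch; the ^ and $ anchors are added here so that the whole subject must match, and the subjects are made up:

```php
<?php
// [pattern, subject] pairs; ^ and $ force the whole subject to match.
$cases = [
    ['/^abc?$/', 'ab'],             // zero c
    ['/^abc?$/', 'abcc'],           // two c: one too many for ?
    ['/^abc+$/', 'abccc'],          // three c
    ['/^(phone){2}$/', 'phonephone'],
    ['/^a(bc){2,5}$/', 'abc'],      // only one bc
    ['/^a(bc){2,5}$/', 'abcbcbc'],  // three bc
];

$results = [];
foreach ($cases as [$pattern, $subject]) {
    $matched = preg_match($pattern, $subject);
    $results[] = $matched;
    printf("%-18s %-12s %s\n", $pattern, $subject, $matched ? 'match' : 'no match');
}
// $results is [1, 0, 1, 1, 0, 1]
```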

anchors

^          beginning of string
$          end of string

examples

roar       matches any string that has the text roar in it
^The       matches any string that starts with The
end$       matches a string that ends with end
^The end$  exact string match (starts and ends with The end)
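The same anchor examples as preg_match calls (a sketch; the subject strings are illustrative):

```php
<?php
var_dump(preg_match('/roar/', 'uproarious'));     // int(1): roar anywhere in the string
var_dump(preg_match('/^The/', 'The end'));        // int(1)
var_dump(preg_match('/^The/', 'At The end'));     // int(0): The is not at the start
var_dump(preg_match('/end$/', 'The end'));        // int(1)
var_dump(preg_match('/^The end$/', 'The end!'));  // int(0): extra character after end
```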

OR operator

a[bcd]       a followed by b, c or d
a(?:one|two) a followed by one or two without capture
a(one|two)   a followed by one or two with capture
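The difference between the two group forms shows up in the matches array. A sketch with made-up variable names:

```php
<?php
preg_match('/a(one|two)/', 'atwo', $withCapture);
preg_match('/a(?:one|two)/', 'atwo', $withoutCapture);

print_r($withCapture);     // [0] => atwo, [1] => two
print_r($withoutCapture);  // [0] => atwo

var_dump(preg_match('/a[bcd]/', 'ac'));  // int(1)
var_dump(preg_match('/a[bcd]/', 'ae'));  // int(0): e is not in the set
```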

range

[a-z]        any lowercase ASCII letter (a to z)

characters

.          match any character except for line terminators
\S         match non-whitespace character
\s         match whitespace character (includes tabs and newline). \s is short for [ \t\r\n\f].
\d         match digit character. short for [0-9].
\w         match word character. short for [a-zA-Z0-9_].
\D         match non-digit character
\W         match non-word character
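\A         match beginning of the whole string (as opposed to ^, which in multi-line mode also matches the beginning of each line)
\Z         match end of the whole string, or just before a final newline (as opposed to $, which in multi-line mode also matches the end of each line); \z matches only at the very end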

\p{L}      any letter character from any language
\p{M}      marks (a character that is to be combined with another: accent, etc.)
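A short sketch of the ASCII shorthand classes and of \A versus ^; the sample strings are made up:

```php
<?php
// \d+ pulls out runs of digits, \w+ runs of word characters.
preg_match_all('/\d+/', 'order 66, room 101', $digits);
preg_match_all('/\w+/', 'snake_case, kebab-case', $words);
print_r($digits[0]);  // 66, 101
print_r($words[0]);   // snake_case, kebab, case

// Splitting on \s+ breaks the string on any run of whitespace.
$parts = preg_split('/\s+/', "a b\tc\nd");
print_r($parts);      // a, b, c, d

// With the m flag, ^ matches at every line start; \A still matches only once.
$text = "one\ntwo";
var_dump(preg_match_all('/^\w+/m', $text));   // int(2)
var_dump(preg_match_all('/\A\w+/m', $text));  // int(1)
```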

match shortest string .*?

(.*?)
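Greedy versus shortest match, side by side. A sketch on a made-up HTML fragment:

```php
<?php
$html = '<b>one</b> and <b>two</b>';

preg_match('/<b>(.*)<\/b>/', $html, $greedy);  // .* takes as much as possible
preg_match('/<b>(.*?)<\/b>/', $html, $lazy);   // .*? takes as little as possible

echo $greedy[1], "\n";  // one</b> and <b>two
echo $lazy[1], "\n";    // one
```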

flags

g  global: does not stop after the first match; each subsequent search restarts from the end of the previous match (PHP has no g flag: use preg_match_all; preg_replace already replaces every match)
m  multi-line: when enabled, ^ and $ match the start and end of each line instead of the whole string
i  case-insensitive

/roar/i
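How the i and m flags change the match count, with preg_match_all standing in for a global search. A sketch; the text is made up:

```php
<?php
var_dump(preg_match('/roar/i', 'ROAR'));         // int(1): case is ignored

$text = "roar\nRoar\nmeow";
var_dump(preg_match_all('/^roar$/', $text));     // int(0): ^ and $ refer to the whole string
var_dump(preg_match_all('/^roar$/m', $text));    // int(1): the first line matches
var_dump(preg_match_all('/^roar$/mi', $text));   // int(2): the first and second lines match
```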

grouping and capturing

everything between () is captured and can be reused in the replacement as $1, $2, and so on
to avoid capturing use (?:notCaptured)

preg_replace('/(078)(652)(27)(36)/', '$4 $3 $2 $1', '0786522736');
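What the call above returns, plus a variant with a non-capturing group. A sketch; the date string and variable names are made up:

```php
<?php
$reordered = preg_replace('/(078)(652)(27)(36)/', '$4 $3 $2 $1', '0786522736');
echo $reordered, "\n";  // 36 27 652 078

// (?:...) does not take a group number, so the month is $1 and the day is $2.
$dayMonth = preg_replace('/(?:\d{4})-(\d{2})-(\d{2})/', '$2/$1', '2024-03-09');
echo $dayMonth, "\n";   // 09/03
```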

look-ahead and look-behind

d(?=r)       matches a d only if it is followed by an r, but the r will not be part of the overall regex match
(?<=r)d      matches a d only if it is preceded by an r, but the r will not be part of the overall regex match

both can also be negated

d(?!r)       matches a d only if it is not followed by an r
(?<!r)d      matches a d only if it is not preceded by an r
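To see which d each look-around picks, the match offsets can be listed. A sketch; offsets() is a hypothetical helper written for this example, and drdad is a made-up subject:

```php
<?php
// Hypothetical helper: returns the start offset of every match.
function offsets(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $m, PREG_OFFSET_CAPTURE);
    return array_column($m[0], 1);  // each entry of $m[0] is [text, offset]
}

$subject = 'drdad';  // d at offsets 0, 2 and 4; r at offset 1

print_r(offsets('/d(?=r)/', $subject));   // 0
print_r(offsets('/(?<=r)d/', $subject));  // 2
print_r(offsets('/d(?!r)/', $subject));   // 2, 4
print_r(offsets('/(?<!r)d/', $subject));  // 0, 4

// The look-around itself consumes nothing: the match is just the d.
preg_match('/d(?=r)/', $subject, $m);
echo $m[0], "\n";  // d
```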

negation

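A ^ (circumflex or caret) placed first inside square brackets negates the character class.
To find a "foo" preceded by any character other than a ".":

[^.]foo

Note that the preceding character becomes part of the match, and a foo at the very start of the string is not found; the look-behind (?<!\.)foo avoids both.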
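A sketch comparing the negated class with the negative look-behind from the look-around section; the subjects are made up:

```php
<?php
// The negated class consumes the character before foo.
preg_match('/[^.]foo/', 'a foo', $m);
var_dump($m[0]);                               // string(4) " foo"

var_dump(preg_match('/[^.]foo/', 'foo'));      // int(0): no character before foo
var_dump(preg_match('/[^.]foo/', '.foo'));     // int(0)

// The look-behind only checks the previous character without consuming it.
var_dump(preg_match('/(?<!\.)foo/', 'foo'));   // int(1)
var_dump(preg_match('/(?<!\.)foo/', '.foo'));  // int(0)
```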

negate whole expression

(?!pattern)

take only lines not starting with url

^(?!url).*$

lines not containing a word (here drivers)

^((?!drivers).)*$
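Both patterns applied to sample input. A sketch: the data is made up, the first pattern needs the m flag to work line by line, and preg_grep is used for the second as one possible way to filter an array of lines:

```php
<?php
$config = "url=http://example.com\nname=demo\nurlencode=no\ndrivers=2";

// With the m flag, ^ and $ work per line; lines starting with url are skipped.
preg_match_all('/^(?!url).*$/m', $config, $notUrl);
print_r($notUrl[0]);  // name=demo, drivers=2

// preg_grep keeps the array entries that match, here those without "drivers".
$lines = ['load drivers', 'boot ok', 'drivers missing', 'done'];
$kept = array_values(preg_grep('/^((?!drivers).)*$/', $lines));
print_r($kept);       // boot ok, done
```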

unicode

https://www.regular-expressions.info/unicode.html

The PHP preg functions, which are based on PCRE, support Unicode when the /u option is appended to the regular expression.
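A sketch of what /u changes, assuming PHP 7 or later for the "\u{...}" string escapes (used here so the example does not depend on the file encoding); the sample names are made up:

```php
<?php
$e = "\u{E9}";  // é, stored as two bytes in UTF-8

// Without /u the dot matches bytes; with /u it matches whole code points.
var_dump(preg_match_all('/./', $e));   // int(2)
var_dump(preg_match_all('/./u', $e));  // int(1)

// \p{L} accepts letters from any script when /u is set.
var_dump(preg_match('/^\p{L}+$/u', "\u{C9}lodie"));     // int(1)

// A base letter followed by a combining acute accent: \p{L} then \p{M}.
var_dump(preg_match('/^\p{L}\p{M}*$/u', "e\u{0301}"));  // int(1)
```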

reference

https://medium.com/factory-mind/regex-tutorial-a-simple-cheatsheet-by-examples-649dc1c3f285