Regular Expressions: Difference between revisions
From Jedisaber Wiki
Revision as of 22:31, 10 January 2017
A regular expression (or regex for short) is a standard way of writing a search pattern as text.
Just as you can type *.jpg into a search box to find all JPEG files, you can use a regular expression (along with a tool such as grep) to match much more complex patterns.
For example, you could use:
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b
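to search for e-mail addresses in a file. As written, the pattern lists uppercase letters only, so run it as a case-insensitive search (with grep, add the -i option).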
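As a rough illustration, here is how the pattern above could be tried out in Python rather than grep. This is only a sketch: the helper name find_emails and the sample text are made up for the example, and re.IGNORECASE is added so that [A-Z] also matches lowercase letters.

```python
import re

# The pattern from the text. IGNORECASE lets [A-Z] match lowercase letters too.
EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.IGNORECASE)


def find_emails(text):
    """Return every substring of text that matches the e-mail pattern."""
    # Hypothetical helper for this example, not part of any library.
    return EMAIL.findall(text)


sample = "Contact alice@example.com or BOB@EXAMPLE.ORG, not bob@localhost."
print(find_emails(sample))  # ['alice@example.com', 'BOB@EXAMPLE.ORG']
```

The last address is skipped because the pattern requires a dot followed by at least two letters after the @ sign.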
Cheat Sheet
a - a literal character, such as the letter "a". Every character is literal except these twelve: \ ^ $ . | ? * + ( ) [ {
. (dot) - any single character (in most regex flavors, any character except a newline).
? - the preceding character matches 0 or 1 times only.
* - the preceding character matches 0 or more times.
+ - the preceding character matches 1 or more times.
{n} - the preceding character matches exactly n times.
{n,m} - the preceding character matches at least n times and not more than m times. Example: a{2,4} matches the character "a" at least twice, but not more than four times.
[agd] - the character is one of those included within the square brackets.
[^agd] - the character is not one of those included within the square brackets.
[c-f] - the dash within the square brackets operates as a range. In this case it means any one of the letters c, d, e or f. You can specify a range of digits the same way, for example [0-9].
() - allows us to group several characters to behave as one.
| (pipe symbol) - the logical OR operation.
^ - matches the beginning of the line.
$ - matches the end of the line.
\ - escapes a special character. For example, if you want to see if a file has a question mark in it, you can't use the question mark symbol on its own because it has a special meaning. So we escape it (tell regex to ignore its special meaning and treat it as a literal character) by putting a backslash in front of it, like this: \?
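The cheat sheet entries can be checked one at a time with a small script. The sketch below uses Python's re module, which is an assumption (the page itself only mentions grep), and the sample patterns and strings are invented for illustration. It uses re.fullmatch so that the whole string has to match, which makes upper limits such as {2,4} visible.

```python
import re

# Each entry: (pattern, a string it should match in full, a string it should not).
CASES = [
    (r"h.t",     "hot",    "ht"),       # .     any single character
    (r"colou?r", "color",  "colouur"),  # ?     0 or 1 times
    (r"ab*c",    "ac",     "abd"),      # *     0 or more times
    (r"ab+c",    "abbc",   "ac"),       # +     1 or more times
    (r"a{3}",    "aaa",    "aa"),       # {n}   exactly n times
    (r"a{2,4}",  "aaaa",   "aaaaa"),    # {n,m} between n and m times
    (r"[agd]",   "g",      "b"),        # one of the listed characters
    (r"[^agd]",  "b",      "g"),        # none of the listed characters
    (r"[c-f]",   "e",      "g"),        # a range of characters
    (r"(ab)+",   "ababab", "aba"),      # a group treated as one unit
    (r"cat|dog", "dog",    "cow"),      # logical OR
    (r"what\?",  "what?",  "what"),     # escaped special character
]


def fully_matches(pattern, text):
    """Return True if the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None


for pattern, good, bad in CASES:
    print(pattern, good, fully_matches(pattern, good), bad, fully_matches(pattern, bad))
```

Each printed row should show True for the first string and False for the second.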
Anchor Characters
Line-oriented tools such as grep apply a regular expression to one line at a time, that is, to the text between line separators. If you want to search for a pattern that is at one end of the line or the other, you use anchors. The character ^ is the starting anchor, and the character $ is the end anchor.
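Note that ^ and $ are only anchors if they are used at the start (^) or end ($) of a pattern. This describes the basic regular expressions used by tools such as grep; other regex flavors may treat ^ and $ as anchors wherever they appear, so a literal ^ or $ has to be escaped there.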
Examples:
Pattern Matches
^A "A" at the beginning of a line
A$ "A" at the end of a line
A^ "A^" anywhere on a line
$A "$A" anywhere on a line
^^ "^" at the beginning of a line
$$ "$" at the end of a line
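To try the anchor examples, here is a minimal sketch in Python. Note the assumption: Python's re module is not the grep-style flavor the table describes. In Python, ^ and $ are anchors wherever they appear, so the literal forms from the table have to be written with a backslash (A\^ and \$A). The helper grep_lines and the sample lines are hypothetical.

```python
import re


def grep_lines(pattern, lines):
    """Return the lines that contain a match, roughly like grep does."""
    # Hypothetical helper for this example; it tests each line separately.
    return [line for line in lines if re.search(pattern, line)]


lines = ["Apple", "banana A", "cost: $A or A^ here"]

print(grep_lines(r"^A", lines))    # ['Apple']
print(grep_lines(r"A$", lines))    # ['banana A']
print(grep_lines(r"A\^", lines))   # ['cost: $A or A^ here']
print(grep_lines(r"\$A", lines))   # ['cost: $A or A^ here']
```

Testing one line at a time mirrors how line-oriented tools work, so ^ and $ here always refer to the start and end of a single line.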
