grep and regular expressions

The grep command searches the given files and prints every line that matches the search pattern.

Syntax

grep -[acinv] 'pattern' filename

Common Options

-c, --count Output the number of matching lines
-i, --ignore-case Ignore case
-n, --line-number Prefix each line of output with the 1-based line number within its input file
-q, --quiet Do not write anything to standard output. Exit immediately with zero status if any match is found
-s, --no-messages Suppress error messages about nonexistent or unreadable files.
-h, --no-filename Suppress the filename for each match

-o, --only-matching Print only the matched part of a matching line
-l, --files-with-matches Print only the names of files that contain a match
-L, --files-without-match Print only the names of files that contain no match
-v, --invert-match Output non-matching lines
-r, --recursive Read all files under each directory, recursively.
--color[=WHEN] Display the matched string in color. WHEN is never, always, or auto.

Examples:

$ grep -c pattern filename                   Count how many lines contain the pattern
$ grep -n pattern filename                   Display each matched line prefixed with its line number, roughly like nl -ba filename | grep 'pattern'
$ grep -v pattern filename                   List lines that do not contain the pattern
$ grep -i pattern filename                   Search all lines for pattern, ignoring case
$ grep -l pattern filename                   List only the names of files that contain a match, e.g. grep -l 'main' *.c lists the .c files whose contents include main
$ grep -L pattern filename                   List filenames that do not have matched pattern
$ grep -w pattern filename                   Match the whole word instead of only a part of string
$ grep -C number pattern filename            Display `number` lines of context above and below each matched line
$ grep -E 'pattern1|pattern2' filename       Display lines that match pattern1 or pattern2
$ grep pattern1 filename | grep pattern2     Display lines that match both pattern1 and pattern2
$ grep 'pattern' d*                          Search for pattern in all files whose names begin with d
$ grep 'pattern' file1 file2 file3           Search for pattern in file1, file2 and file3
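
The options above can be tried on a small throwaway file. The following is only a minimal sketch: the file contents and the variable names (sample, count, numbered, and so on) are made up for illustration.

```shell
# Create a temporary sample file (contents are illustrative only).
sample=$(mktemp)
printf '%s\n' 'apple pie' 'Apple tart' 'banana split' 'pineapple' > "$sample"

# -c: count the lines that contain "apple" (case-sensitive).
count=$(grep -c 'apple' "$sample")

# -i -n: ignore case and prefix each match with its line number.
numbered=$(grep -in 'apple' "$sample")

# -w: match "apple" only as a whole word, so "pineapple" is skipped.
whole=$(grep -w 'apple' "$sample")

# -E with |: lines that contain "tart" OR "split".
either=$(grep -E 'tart|split' "$sample")

# Two greps in a pipeline: lines that contain "apple" AND "pie".
both=$(grep 'apple' "$sample" | grep 'pie')

echo "$count"
echo "$numbered"
echo "$whole"
echo "$either"
echo "$both"
rm -f "$sample"
```

With this sample, -c reports 2 because "Apple tart" differs in case, while -i also picks up that line.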

More options

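-d ACTION, --directories=ACTION Specify how grep handles an input file that is a directory. ACTION can have 3 values: read, skip, recurse

read the default; directories are read as if they were ordinary files
skip directories are skipped
recurse reads all files under each directory recursively, same as -r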

--exclude=GLOB Skip files whose base name matches GLOB (using wildcard matching).
--exclude-dir=DIR Exclude directories matching the pattern DIR from recursive searches.
--include=GLOB Search only files whose base name matches GLOB
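
As a rough illustration of how the include and exclude options combine with -r, the sketch below builds a throwaway directory tree and searches only its .c files while skipping one directory. It assumes GNU grep; the directory and file names (src, build, app.c, notes.txt) are hypothetical.

```shell
# Build a throwaway directory tree (names are illustrative only).
root=$(mktemp -d)
mkdir -p "$root/src" "$root/build"
printf 'int main(void) { return 0; }\n' > "$root/src/app.c"
printf 'main entry point\n' > "$root/src/notes.txt"
printf 'int main(void) { return 0; }\n' > "$root/build/app.c"

# Search recursively, but only in *.c files, and skip the build directory.
# -l prints just the names of the files that match.
matches=$(cd "$root" && grep -rl --include='*.c' --exclude-dir=build 'main' .)

echo "$matches"
rm -rf "$root"
```

Only ./src/app.c is listed: notes.txt is filtered out by --include and build/app.c by --exclude-dir. Quoting the glob keeps the shell from expanding it before grep sees it.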

How to search a pattern in a directory recursively?

Because the shell expands * to the names of all entries in the current directory, a directory that contains no subdirectories can be searched simply with:

$ grep 'pattern' *

However, if the directory contains subdirectories, grep prints a message such as 'grep: bin: Is a directory' for each of them and does not search inside them. Use the -r option or -d recurse to search subdirectories recursively:

$ grep -r 'pattern' * 
$ grep -d recurse 'pattern' *

Alternatively, combine the find command with grep. The -print0 and -0 options keep file names that contain spaces intact:

$ find . -type f -print0 | xargs -0 grep -i 'pattern'
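
The find-based approach can be checked on a throwaway tree. This is a sketch under the assumption that find and xargs support NUL-separated names (GNU and BSD versions do); the directory and file names are hypothetical.

```shell
# Throwaway tree with a file name that contains a space (illustrative only).
root=$(mktemp -d)
mkdir -p "$root/sub dir"
printf 'Pattern here\n' > "$root/sub dir/my file.txt"
printf 'nothing to see\n' > "$root/other.txt"

# find emits NUL-separated names and xargs -0 reads them back unchanged.
via_xargs=$(cd "$root" && find . -type f -print0 | xargs -0 grep -il 'pattern')

# find can also run grep itself; "{} +" batches many files per grep call.
via_exec=$(cd "$root" && find . -type f -exec grep -il 'pattern' {} +)

echo "$via_xargs"
echo "$via_exec"
rm -rf "$root"
```

Both variants report the same file. The -exec form avoids the extra process and needs no special handling of separators.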

Regular expressions

Although the -E option (or the egrep command) enables extended regular expressions in grep, it is recommended to use grep -P where it is available, as Perl-compatible regular expressions provide even more features. Note that -P is a GNU grep option and requires a build with PCRE support.

For example, \d is supported by grep -P but not by grep -E. To match a digit with -E we have to use [0-9] or [[:digit:]].

Below is a list of POSIX character classes. Each name is only valid inside a bracket expression, e.g. [[:digit:]]:

[:alnum:]
[:alpha:]
[:cntrl:]
[:digit:]
[:graph:]
[:lower:]
[:print:]
[:punct:]
[:space:]
[:upper:]
[:xdigit:]

Example:

$ echo -e "One\n123\n\t" | sed -n '/[[:alpha:]]/ p'
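
The same classes work in grep. A minimal sketch follows, using made-up input lines; the variable names are only for illustration.

```shell
# Sample input, one item per line (illustrative only).
input=$(printf '%s\n' 'One' '123' 'a1b2' '---')

# Lines that contain at least one digit.
has_digit=$(printf '%s\n' "$input" | grep '[[:digit:]]')

# Lines made up entirely of letters (anchored at both ends).
only_alpha=$(printf '%s\n' "$input" | grep -E '^[[:alpha:]]+$')

# A class can be negated inside the bracket expression:
# lines that contain no letter or digit at all.
no_alnum=$(printf '%s\n' "$input" | grep -E '^[^[:alnum:]]+$')

echo "$has_digit"
echo "$only_alpha"
echo "$no_alnum"
```

Note the doubled brackets: the outer pair is the bracket expression and the inner [:name:] selects the class.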

Standard Regular Expression Symbols


^           Match the beginning of a line
$           Match the end of a line
[]          Match a set of characters
[^]         Match characters that are not in this set
[-]         Character range
.           Match any single character
*           Match zero or more occurrences of the preceding item
?           Match zero or one occurrence of the preceding item
+           Match one or more occurrences of the preceding item
{n}         Match exactly n occurrences of the preceding item
{n,}        Match at least n occurrences of the preceding item
{m,n}       Match at least m and at most n occurrences of the preceding item
|           Alternation: logical `OR` of the expressions on either side
\(..\)      Group the matching pattern
\<          Anchor the beginning of a word
\>          Anchor the end of a word
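
A few of these symbols in action, as a small sketch with made-up input. The \< and \> anchors are assumed to be available, which is the case for GNU grep.

```shell
# Sample input (illustrative only).
input=$(printf '%s\n' 'color' 'colour' 'colouur' 'cat hat' 'concat')

# ? makes the preceding item optional: matches "color" and "colour".
optional=$(printf '%s\n' "$input" | grep -E '^colou?r$')

# {2,} requires at least two repetitions of the preceding item.
repeated=$(printf '%s\n' "$input" | grep -E 'u{2,}')

# \< and \> anchor the match to the start and end of a word,
# so "cat" inside "concat" does not count.
word=$(printf '%s\n' "$input" | grep '\<cat\>')

echo "$optional"
echo "$repeated"
echo "$word"
```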

Meta characters

\b          Word boundary. e.g. "\bthe\b" matches only the word "the", but not "there", "whether", etc.
\B          Non-word boundary. e.g. "\bthe\B" matches "these", "their", etc. but not "the"
\s          Single Whitespace
\S          Single Non-Whitespace
\w          Single Word Character, i.e. alphabetical characters, digits, and underscore _
\W          Single Non-Word Character.

Example:

$ echo -e "One\n123\n1_2\n&;#" | sed -n '/\W/ p'
&;#
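
These escapes behave the same way in grep, at least in GNU grep, where they are extensions to the POSIX syntax. The sketch below uses made-up words to show how \b and \B select where "the" may appear.

```shell
# Sample words, one per line (illustrative only).
input=$(printf '%s\n' 'the' 'there' 'bathe' 'other')

# \bthe\b: "the" as a whole word.
whole=$(printf '%s\n' "$input" | grep '\bthe\b')

# \bthe\B: "the" at the start of a longer word.
prefix=$(printf '%s\n' "$input" | grep '\bthe\B')

# \Bthe\b: "the" at the end of a longer word.
suffix=$(printf '%s\n' "$input" | grep '\Bthe\b')

# \w+ with -o: print each run of word characters on its own line.
words=$(printf 'foo_1 bar-2\n' | grep -oE '\w+')

echo "$whole"
echo "$prefix"
echo "$suffix"
echo "$words"
```

The underscore counts as a word character, so "foo_1" stays in one piece while "bar-2" is split at the hyphen.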

Extensions:

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; to get it back they must be escaped as \?, \+, \{, \|, \(, and \). In extended regular expressions (grep -E) they can be used directly without escaping.
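
The difference is easiest to see with the + character. This is a sketch with made-up input; it assumes GNU grep, where \+ in basic syntax is an extension.

```shell
# Sample input (illustrative only).
input=$(printf '%s\n' 'ab' 'aab' 'a+b')

# Basic syntax: a bare + is an ordinary character, so this finds "a+b".
bre_literal=$(printf '%s\n' "$input" | grep 'a+b')

# Basic syntax with a backslash: \+ means "one or more".
bre_repeat=$(printf '%s\n' "$input" | grep '^a\+b$')

# Extended syntax: + is the operator without any escaping.
ere_repeat=$(printf '%s\n' "$input" | grep -E '^a+b$')

# Extended syntax: escape + to match it literally.
ere_literal=$(printf '%s\n' "$input" | grep -E 'a\+b')

echo "$bre_literal"
echo "$bre_repeat"
echo "$ere_repeat"
echo "$ere_literal"
```

In other words, the backslash flips the meaning in both directions: it turns the operator on in basic syntax and off in extended syntax.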

Perl Extensions

grep -P

\d          Digit, same as [0-9]
\D          Non-digit, same as [^0-9]
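
Since -P is not available in every grep build, a script may want to probe for it first. The following sketch does that and falls back to a POSIX class; the input text and the variable name digits are made up for illustration.

```shell
# grep -P needs a grep built with PCRE support; check before relying on it.
if printf '1\n' | grep -qP '\d' 2>/dev/null; then
    # \d+ with -o prints every run of digits on its own line.
    digits=$(printf 'order 66, room 101\n' | grep -oP '\d+')
else
    # Portable fallback: POSIX class with extended syntax.
    digits=$(printf 'order 66, room 101\n' | grep -oE '[[:digit:]]+')
fi

echo "$digits"
```

Either branch prints 66 and 101 on separate lines for this input.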

grep -F Literal matching
Interpret PATTERN as a list of fixed strings (rather than regular expressions), separated by newlines, any of which is to be matched.

This is useful for searching for a raw string that contains characters (e.g. *, +, etc.) with a special meaning in regular expressions. Characters like * or ? are not interpreted as meta symbols by fgrep or grep -F.

e.g.

$ grep -F '*' /etc/profile

output:
for i in /etc/profile.d/*.sh; do
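
To see the contrast with regular matching without depending on /etc/profile, here is a small self-contained sketch. The input lines and variable names are made up for illustration.

```shell
# Sample input (illustrative only).
input=$(printf '%s\n' 'a.b' 'axb' '1*2')

# As a regular expression, . matches any character: both "a.b" and "axb".
as_regex=$(printf '%s\n' "$input" | grep 'a.b')

# With -F the dot is just a dot: only "a.b" matches.
as_fixed=$(printf '%s\n' "$input" | grep -F 'a.b')

# Several fixed strings, separated by newlines; any of them may match.
patterns=$(printf '%s\n' 'a.b' '1*2')
any_fixed=$(printf '%s\n' "$input" | grep -F "$patterns")

echo "$as_regex"
echo "$as_fixed"
echo "$any_fixed"
```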