A Detailed Guide to the grep Command in Shell Scripting

2015-08-16 · Ryan

Introduction to grep

grep (globally search for a regular expression and print; the name comes from the ed command g/re/p) is a powerful text search tool that uses regular expressions to search text and print the matching lines. The grep family on Unix/Linux systems traditionally includes grep, egrep, and fgrep.

egrep is an extended version of grep that supports more regex metacharacters. fgrep (fixed grep or fast grep) treats every character in the pattern literally, so regex metacharacters lose their special meaning. Most Linux distributions ship GNU grep, which is more powerful and selects the mode with a command-line option: -G (basic regex, the default), -E (extended regex, equivalent to egrep), and -F (fixed strings, equivalent to fgrep). Recent GNU grep releases treat egrep and fgrep as obsolescent, so prefer grep -E and grep -F in new scripts.
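As a rough illustration of how the three modes differ, the sketch below runs related patterns against a small sample file. The file contents and variable names are made up for this example.

```shell
#!/bin/sh
# Hypothetical sample data in a temporary file.
sample=$(mktemp)
printf '%s\n' 'a+b' 'aab' 'a.b' > "$sample"

# -G (default): basic regex, where '+' is an ordinary character.
basic=$(grep -G 'a+b' "$sample")

# -E: extended regex, where '+' means "one or more of the preceding item".
extended=$(grep -E 'a+b' "$sample")

# -F: fixed string, so '.' matches only a literal dot.
fixed=$(grep -F 'a.b' "$sample")

# Without -F, '.' matches any character, so all three lines match.
any_char=$(grep -c 'a.b' "$sample")

printf 'basic: %s\nextended: %s\nfixed: %s\nlines matching a.b as a regex: %s\n' \
    "$basic" "$extended" "$fixed" "$any_char"

rm -f "$sample"
```

The same pattern text selects different lines depending on the mode, which is why scripts that search for literal strings such as IP addresses or file names often use -F.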

grep searches one or more files for a given string pattern. If the pattern contains spaces or characters that are special to the shell, it must be quoted. All arguments after the pattern are treated as filenames. Results are printed to the screen without modifying the original files.

grep is useful in shell scripts because it returns an exit status indicating the search result:

0: Success, matches found.
1: No matches found.
2: An error occurred, for example a file does not exist or is unreadable.

Scripts can use these return values for conditional logic and automated text processing.
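A minimal sketch of branching on these exit codes follows. The helper name check_pattern and the sample data are assumptions made for this example, not part of grep.

```shell
#!/bin/sh
# check_pattern is a hypothetical helper that reports how a grep search ended.
check_pattern() {
    pattern=$1
    file=$2
    rc=0
    # -q suppresses normal output; 2>/dev/null hides the message for unreadable files.
    grep -q -- "$pattern" "$file" 2>/dev/null || rc=$?
    case $rc in
        0) echo "found" ;;
        1) echo "not found" ;;
        *) echo "error" ;;
    esac
}

# Hypothetical sample data in a temporary file.
sample=$(mktemp)
printf '%s\n' 'alpha' 'beta' > "$sample"

found=$(check_pattern 'alpha' "$sample")
missing=$(check_pattern 'gamma' "$sample")
broken=$(check_pattern 'alpha' "$sample.does-not-exist")

printf '%s / %s / %s\n' "$found" "$missing" "$broken"
rm -f "$sample"
```

Capturing the status with `|| rc=$?` keeps the helper usable in scripts that run under `set -e`, where a bare failing grep would end the script.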

Basic Regular Expression Metacharacters

These are the basic regex metacharacters supported by grep in default (-G) mode:

^: Anchors the start of a line. Example: '^grep' matches lines starting with grep.
$: Anchors the end of a line. Example: 'grep$' matches lines ending with grep.
.: Matches any single non-newline character. Example: 'gr.p' matches gr, any character, then p.
*: Matches the preceding character zero or more times. Example: ' *grep' matches grep preceded by zero or more spaces.
[]: Matches a single character from a specified set. Example: '[Gg]rep' matches Grep or grep.
[^]: Matches a single character NOT in the specified set. Example: '[^A-FH-Z]rep' matches rep preceded by any character other than A-F or H-Z, such as Grep or grep.
\(..\): Groups characters for backreferencing with \1, \2, etc. Example: '\(love\)' marks love as group 1.
\<: Anchors the start of a word. Example: '\<grep' matches lines containing a word starting with grep.
\>: Anchors the end of a word. Example: 'grep\>' matches lines containing a word ending with grep.
x\{m\}: Repeats character x exactly m times. Example: 'o\{5\}' matches lines containing a run of 5 consecutive 'o's.
x\{m,\}: Repeats character x at least m times. Example: 'o\{5,\}' matches lines with at least 5 consecutive 'o's.
x\{m,n\}: Repeats character x between m and n times. Example: 'o\{5,10\}' matches 5 to 10 consecutive 'o's.
\w: Matches a word character (alphanumeric or underscore). Equivalent to [A-Za-z0-9_].
\W: Matches a non-word character (complement of \w).
\b: Matches a word boundary. Example: '\bgrep\b' matches the standalone word grep.
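The sketch below tries several of these basic-mode patterns on invented sample lines. In basic mode the interval and word-anchor operators are written with a backslash (\{m\}, \<, \>). The word anchors are GNU extensions, so this assumes GNU grep or a compatible implementation.

```shell
#!/bin/sh
# Hypothetical sample data in a temporary file.
sample=$(mktemp)
printf '%s\n' \
    'grep is a tool' \
    'egrep extends grep' \
    'Grep with a capital' \
    '  indented grep' \
    'no match here' \
    'foooood' > "$sample"

# -c prints the number of matching lines instead of the lines themselves.
starts=$(grep -c '^grep' "$sample")          # line begins with grep
ends=$(grep -c 'grep$' "$sample")            # line ends with grep
either_case=$(grep -c '[Gg]rep' "$sample")   # Grep or grep anywhere in the line
whole_word=$(grep -c '\<grep\>' "$sample")   # grep as a standalone word
five_o=$(grep -c 'o\{5\}' "$sample")         # a run of five o's

printf 'starts=%s ends=%s either_case=%s whole_word=%s five_o=%s\n' \
    "$starts" "$ends" "$either_case" "$whole_word" "$five_o"

rm -f "$sample"
```

The pattern is always in single quotes so that the shell passes the backslashes, brackets, and dollar signs to grep unchanged.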

Extended Regular Expression Metacharacters

When using egrep or grep -E, the following extended metacharacters are available, offering a more concise syntax:

+: Matches the preceding character one or more times. Example: '[a-z]+able' matches one or more lowercase letters followed by able.
?: Matches the preceding character zero or one time. Example: 'gr?p' matches gp or grp.
|: Alternation (OR). Matches one of several patterns. Example: 'grep|sed' matches grep or sed.
(): Grouping (no backslash needed). Example: 'love(able|rs)ov+' matches loveable or lovers followed by o and one or more v's. To repeat the whole string ov, group it as (ov)+.
x{m}, x{m,}, x{m,n}: Same functionality as in basic regex but without escaping braces.
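To see the extended operators in action, here is a small sketch using grep -E on a made-up word list. The words and variable names are illustrative only.

```shell
#!/bin/sh
# Hypothetical word list, one word per line.
sample=$(mktemp)
printf '%s\n' grp gp grep sed awk loveable lovers readable stable able > "$sample"

optional=$(grep -E -c '^gr?p$' "$sample")            # gp and grp: the r is optional
alternation=$(grep -E -c 'grep|sed|awk' "$sample")   # any one of the three names
grouped=$(grep -E -c 'love(able|rs)' "$sample")      # loveable and lovers
one_or_more=$(grep -E -c '[a-z]+able' "$sample")     # needs a letter before able
interval=$(grep -E -c '^[a-z]{3}$' "$sample")        # exactly three letters

printf 'optional=%s alternation=%s grouped=%s one_or_more=%s interval=%s\n' \
    "$optional" "$alternation" "$grouped" "$one_or_more" "$interval"

rm -f "$sample"
```

Note that the bare word able is not counted by '[a-z]+able', because + requires at least one preceding lowercase letter.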

POSIX Character Classes

POSIX defines character classes so that patterns behave consistently across locales. In grep (except in -F mode) they are written inside a bracket expression, as in [[:alnum:]], which is equivalent to [A-Za-z0-9] in the C locale.

[:alnum:]: Alphanumeric characters.
[:alpha:]: Alphabetic characters.
[:digit:]: Digits.
[:graph:]: Non-space, printable characters.
[:lower:]: Lowercase letters.
[:cntrl:]: Control characters.
[:print:]: Printable characters (including space).
[:punct:]: Punctuation characters.
[:space:]: Whitespace characters (space, tab, newline, etc.).
[:upper:]: Uppercase letters.
[:xdigit:]: Hexadecimal digits (0-9, a-f, A-F).
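The following sketch shows a few of these classes inside bracket expressions. It sets LC_ALL=C so the results do not depend on the reader's locale; that setting and the sample lines are assumptions of this example.

```shell
#!/bin/sh
# Fix the locale so the classes cover plain ASCII only (an assumption of this sketch).
LC_ALL=C
export LC_ALL

# Hypothetical sample data in a temporary file.
sample=$(mktemp)
printf '%s\n' 'abc123' 'ABC' '   ' 'hello, world' '0x1F' > "$sample"

with_digit=$(grep -c '[[:digit:]]' "$sample")                      # abc123 and 0x1F
all_upper=$(grep -c '^[[:upper:]][[:upper:]]*$' "$sample")         # ABC
only_space=$(grep -c '^[[:space:]][[:space:]]*$' "$sample")        # the line of spaces
with_punct=$(grep -c '[[:punct:]]' "$sample")                      # hello, world
hex_literal=$(grep -c '^0x[[:xdigit:]][[:xdigit:]]*$' "$sample")   # 0x1F

printf 'with_digit=%s all_upper=%s only_space=%s with_punct=%s hex_literal=%s\n' \
    "$with_digit" "$all_upper" "$only_space" "$with_punct" "$hex_literal"

rm -f "$sample"
```

The class name goes inside its own pair of brackets within the bracket expression, so '[[:digit:]]' is correct while '[:digit:]' would be read as a set of the individual characters.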

Common grep Command Options

-A NUM: Print NUM lines of context After the match.
-B NUM: Print NUM lines of context Before the match.
-C NUM: Print NUM lines of context Before and After the match.
-c, --count: Print only a count of matching lines.
-f FILE: Read patterns from FILE.
-h: Suppress filenames in multi-file output.
-i: Ignore case distinctions.
-l: Print only names of files containing matches.
-L: Print only names of files with no matches.
-n: Prefix each matching line with its line number.
-q: Quiet mode; suppress output, use exit status only.
-v: Invert match; select non-matching lines.
-w: Match only whole words.
-E: Use extended regex (like egrep).
-F: Treat pattern as fixed strings (like fgrep).
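A short sketch of several of these options working on two throwaway files follows. The file names a.txt and b.txt and their contents are invented for the example.

```shell
#!/bin/sh
# Work in a temporary directory with two hypothetical files.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'Error: disk full' 'all good' 'error: retry' 'undone work' 'done' > a.txt
printf '%s\n' 'nothing to see' > b.txt

count=$(grep -c -i 'error' a.txt)           # matching lines, ignoring case
numbered=$(grep -n 'good' a.txt)            # prefix the match with its line number
with_match=$(grep -l 'good' a.txt b.txt)    # names of files that contain a match
without=$(grep -L 'good' a.txt b.txt)       # names of files that contain none
inverted=$(grep -v -i -c 'error' a.txt)     # lines that do NOT match
whole=$(grep -w 'done' a.txt)               # whole word only, so "undone" is skipped
after=$(grep -A 1 'disk' a.txt)             # the match plus one line after it

printf 'count=%s numbered=%s with_match=%s without=%s inverted=%s whole=%s\n' \
    "$count" "$numbered" "$with_match" "$without" "$inverted" "$whole"
printf '%s\n' "$after"

cd / && rm -rf "$dir"
```

Single-letter options can also be combined, so `grep -c -i` and `grep -ci` behave the same.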

Examples

Mastering grep involves practice with regular expressions. Here are some common examples:

$ ls -l | grep '^a'

Filter ls -l output to show only lines starting with 'a'.

$ grep 'test' d*

Show lines containing 'test' in all files starting with 'd'.

$ grep '[a-z]\{5\}' aa

Show lines in file 'aa' containing strings of at least 5 consecutive lowercase letters.

$ grep 'w\(es\)t.*\1' aa

If 'west' matches, 'es' is stored as group 1 (\1). The pattern then matches any characters (.*) followed by another 'es'. With grep -E, write it as 'w(es)t.*\1'.
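The backreference example can be tried with the sketch below, which uses invented sample lines in place of the file aa. Backreferences under -E are an extension rather than part of POSIX extended regex, so the second command assumes GNU grep or a compatible implementation.

```shell
#!/bin/sh
# Hypothetical sample data standing in for the file 'aa'.
sample=$(mktemp)
printf '%s\n' 'west of the estuary' 'west coast' 'northwest express' > "$sample"

# Basic mode: escaped parentheses capture 'es', and \1 requires it to appear again.
bre_lines=$(grep 'w\(es\)t.*\1' "$sample")
bre_count=$(grep -c 'w\(es\)t.*\1' "$sample")

# Extended mode: same pattern with unescaped parentheses (assumes -E supports \1).
ere_count=$(grep -E -c 'w(es)t.*\1' "$sample")

printf '%s\n' "$bre_lines"
printf 'bre_count=%s ere_count=%s\n' "$bre_count" "$ere_count"

rm -f "$sample"
```

The line 'west coast' is not selected because nothing after 'west' contains a second 'es'.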