Awk Programming
Awk is a powerful tool for processing delimiter-separated text on Linux.
Typical Uses of AWK
Awk can be used in many different ways, such as:
- Text processing,
- Producing formatted text reports,
- Performing arithmetic operations,
- Performing string operations, and many more.
Program Structure
Syntax
awk [option] 'BEGIN{} { … body … } END{}'
When a pattern is given without an action, awk's default action is to print the matching line.
BEGIN block
The syntax of the BEGIN block is as follows:
BEGIN {commands}
The BEGIN block is executed once, at program start-up, before any input is read.
This is a good place to initialize variables. BEGIN is an AWK keyword, so it must be written in upper case. This block is optional.
Body Block
The syntax of the body block is as follows:
/pattern/ {commands}
The body block applies AWK commands to the input lines. By default, the commands run on every line.
We can restrict this by providing a pattern, in which case the commands run only on the lines that match. Note that there is no keyword for the body block.
END Block
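The syntax of the END block is as follows:
END {commands}
The END block is executed once, after all input has been read. Like BEGIN, END is an AWK keyword that must be written in upper case, and the block is optional.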
Work flow
Awk first executes the commands in the BEGIN block. It then reads a line from the input stream and executes the body-block commands whose patterns match that line.
When it has finished with a line, it reads the next one, and so on until the input is exhausted. Finally, it executes the commands in the END block.
An awk program is normally enclosed in single quotes ('') so that the shell passes it to awk unchanged.
Awk uses $0 to refer to the line that has just been read; use $1, $2, … to refer to the individual fields of that line.
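The work flow described above can be sketched with a short, self-contained run. The sample input and the names sample, total and result are made up for illustration; they are not part of awk.

```shell
# Hypothetical sample input: a name and a number on each line.
sample=$(mktemp)
printf 'apple 3\nbanana 5\ncherry 7\n' > "$sample"

# BEGIN runs once before any input is read, the body runs once per line,
# and END runs once after the last line.
result=$(awk '
BEGIN { print "start"; total = 0 }
      { print "line:", $0, "| first field:", $1; total += $2 }
END   { print "total =", total }
' "$sample")

echo "$result"
rm -f "$sample"
```

The first line printed comes from BEGIN, the next three from the body (one per input line), and the last from END.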
Example
(1) Print all lines
$ awk '{print $0}' file
(2) Print matching lines
$ awk '/pattern/' file
The quotes can be omitted when the pattern contains no characters that are special to the shell:
$ awk /pattern/ file
(3) Count the number of matching lines
$ awk '/pattern/{++cnt} END{print "Count=", cnt}' file
(4) Print the lines that have length greater than 18
$ awk 'length($0)>18' file
(5) Access shell variable in awk
By default, awk can only access its own variables. We can use -v to pass the value of a shell variable into awk:
$ myvar="something"
$ awk -v var="$myvar" '{print var}' file
Internal variables
Awk provides a number of built-in variables. With GNU awk, running gawk -d together with a program writes a file named awkvars.out to the current directory.
This file lists the program's global variables, including the built-in ones, together with their final values.
These internal variables are separated into two categories:
1. Variables that control awk
FS Input field separator
OFS Output field separator
RS Input record separator
ORS Output record separator
2. Variables that convey information
ARGC Number of command-line arguments
ARGV Array that stores the command-line arguments
ENVIRON Associative array of environment variables
FILENAME Name of the current input file
FNR Record number within the current file; it is incremented each time a new record is read and restarts for each file
NF Number of fields in the current input record
NR Total number of input records awk has read so far
(1) ENVIRON
$ awk 'BEGIN {print ENVIRON["USER"]}'
(2) FILENAME
$ awk 'END {print FILENAME}' marks.txt
(3) FS
The field separator of the input stream. The default is a single space, which makes awk split fields on runs of spaces and tabs. It can be changed with the -F option.
$ awk 'BEGIN {print "FS = " FS}' | cat -vte
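As a minimal sketch of changing the field separator, assume colon-separated input in the style of /etc/passwd. The two records below are invented, and the variable names are only for illustration.

```shell
# Invented colon-separated records.
records='alice:1001:/home/alice
bob:1002:/home/bob'

# -F ':' sets FS to a colon before any input is read.
with_flag=$(printf '%s\n' "$records" | awk -F ':' '{ print $1, $3 }')

# Assigning FS in the BEGIN block has the same effect.
with_begin=$(printf '%s\n' "$records" | awk 'BEGIN { FS = ":" } { print $1, $3 }')

echo "$with_flag"
echo "$with_begin"
```

Both commands print the first and third field of each record, so the two results should be identical.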
(4) NF
The number of fields in the current line,
e.g. print the lines that have more than 2 fields:
$ awk 'NF > 2' <<< $'One Two\nOne Two Three\nOne Two Three Four'
(5) NR
The number of the current line, counted across all input read so far
$ awk 'NR < 3' <<< $'One Two\nOne Two Three\nOne Two Three Four'
(6) FNR
The number of the current line within the current file.
This is useful when awk is processing multiple files.
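The difference between NR and FNR only shows up with more than one input file. The following sketch uses two throwaway files whose names and contents are invented for the example.

```shell
# Two throwaway files with invented contents.
dir=$(mktemp -d)
printf 'a1\na2\n' > "$dir/a.txt"
printf 'b1\nb2\nb3\n' > "$dir/b.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(awk '{ print NR, FNR, $0 }' "$dir/a.txt" "$dir/b.txt")
echo "$counts"

# FNR == 1 is therefore true on the first line of every file.
first_lines=$(awk 'FNR == 1 { print $0 }' "$dir/a.txt" "$dir/b.txt")
echo "$first_lines"

rm -rf "$dir"
```

A pattern such as FNR == 1 is a common way to skip or handle a header line in each of several files.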
(7) OFS
The field separator of the output stream. The default is a single space.
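A small sketch of how OFS takes effect. Note that it is only used between comma-separated print arguments, or when awk rebuilds $0 after a field has been assigned. The input string is arbitrary.

```shell
# OFS is inserted between the comma-separated arguments of print.
joined=$(echo 'One Two Three' | awk 'BEGIN { OFS = "-" } { print $1, $2, $3 }')
echo "$joined"

# $0 is only rebuilt with OFS after a field is assigned; $1 = $1 forces that.
rebuilt=$(echo 'One Two Three' | awk 'BEGIN { OFS = "," } { $1 = $1; print }')
echo "$rebuilt"
```

Without the $1 = $1 assignment, the second command would print the line unchanged, because a plain print outputs $0 exactly as it was read.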
(8) RS
The record separator of the input stream. The default is the newline character \n.
(9) ORS
The record separator of the output stream. The default is the newline character \n.
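RS and ORS can be tried out in the same way. This is only a sketch: the semicolon-separated input and the " | " output separator are arbitrary choices made for the example.

```shell
# RS = ";" makes every semicolon-separated chunk a record.
# printf is used so that the input has no trailing newline.
records=$(printf 'a 1;b 2;c 3' | awk 'BEGIN { RS = ";" } { print NR ": " $1 }')
echo "$records"

# ORS replaces the newline that print normally appends to each record.
joined=$(printf 'x\ny\nz\n' | awk 'BEGIN { ORS = " | " } { print }')
echo "$joined"
```

Because ORS is written after every record, the second result also ends with the separator.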
(10) RLENGTH
The length of the string matched by the match() function
$ awk 'BEGIN { if (match("One Two Three", "re")) { print RLENGTH } }'
2
(11) RSTART
The position at which the string matched by match() starts
$ awk 'BEGIN { if (match("One Two Three", "Thre")) { print RSTART } }'
9
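RSTART and RLENGTH are usually used together with substr() to cut the matched text out of a string. A minimal sketch, with an invented sample string:

```shell
# match() sets RSTART and RLENGTH; substr() then extracts the matched text.
extracted=$(awk 'BEGIN {
    text = "order id 12345 shipped"     # invented sample string
    if (match(text, /[0-9]+/)) {
        print RSTART, RLENGTH, substr(text, RSTART, RLENGTH)
    }
}')
echo "$extracted"
```

If nothing matches, match() returns 0, so the if block is skipped.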
(12) SUBSEP
The separator character for array subscripts. The default value is \034.
$ awk 'BEGIN { print "SUBSEP = " SUBSEP }' | cat -vte
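To show where SUBSEP matters: awk simulates multi-dimensional arrays by joining the subscripts with SUBSEP into a single key. The array name grid and its contents are made up for this sketch.

```shell
# grid["x", "y"] is stored under the single key "x" SUBSEP "y";
# split() on SUBSEP recovers the individual subscripts.
parts=$(awk 'BEGIN {
    grid["x", "y"] = 42                  # invented example data
    for (key in grid) {
        n = split(key, idx, SUBSEP)
        print n, idx[1], idx[2], grid[key]
    }
    if (("x", "y") in grid) print "found"
}')
echo "$parts"
```

The default value \034 is a control character that rarely appears in text, which makes it unlikely that a subscript contains the separator itself.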
Regular Expression Operators
'~' denotes a match
'!~' denotes no match
$ awk '$0 ~ 9' marks.txt
2) Rahul Maths 90
5) Hari History 89
$ awk '$0 !~ 9' marks.txt
1) Amit Physics 80
3) Shyam Biology 87
4) Kedar English 85
Note: to match regular-expression metacharacters such as [, ] and . literally, escape them with a backslash.
$ tail -n 40 /var/log/nginx/access.log | awk '$0 ~ /ip\[127\.0\.0\.1\]/'
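The same operators can be applied to a single field instead of the whole line. The sketch below assumes that marks.txt consists of the five lines shown in the output above; the file's actual contents are not given in this article.

```shell
# Assumed contents of marks.txt, reconstructed from the output shown above.
marks=$(mktemp)
printf '%s\n' '1) Amit Physics 80' '2) Rahul Maths 90' '3) Shyam Biology 87' \
              '4) Kedar English 85' '5) Hari History 89' > "$marks"

# Match only the 4th field (the score) against an anchored pattern.
nineties=$(awk '$4 ~ /^9/ { print $2 }' "$marks")
echo "$nineties"

# Negated match on the 3rd field (the subject).
others=$(awk '$3 !~ /^(Maths|History)$/ { print $2 }' "$marks")
echo "$others"

rm -f "$marks"
```

Restricting the match to one field avoids accidental hits: $0 ~ 9 also matches 89, whereas $4 ~ /^9/ only matches scores that begin with 9.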
Reference
https://www.chemie.fu-berlin.de/chemnet/use/info/gawk/gawk_11.html