
This article introduces the usage of the Linux commands cut, tr, cat, and several other text-processing utilities.

cut

cut is used to divide each line of a file into several parts (columns).

cut writes selected parts of each line of each input stream (from a file or stdin) to standard output.

Syntax

cut [OPTION]… [FILE]…

Common OPTIONS

  • -f FIELD-LIST, --fields=FIELD-LIST
    Print only the fields listed in FIELD-LIST. Fields are separated by a TAB character by default.
  • -d INPUT_DELIM_BYTE, --delimiter=INPUT_DELIM_BYTE
    For '-f', fields are separated in the input by the first character of INPUT_DELIM_BYTE (the default is TAB).
  • -s, --only-delimited
    For '-f', do not print lines that do not contain the field separator character.
  • --output-delimiter=OUTPUT_DELIM_STRING
    For '-f', output fields are separated by OUTPUT_DELIM_STRING. The default is to use the input delimiter.

Example

$ cat me.txt
1;2;3;4;5;6;7;8;9

(1) Print field 2

$ cat me.txt | cut -d \; -f 2
2

(2) Specify output format

$ cat me.txt | cut -d \; -f 1-5 --output-delimiter="-"
1-2-3-4-5

Note: ; needs to be escaped (or quoted) because it is a command separator character in the shell.

cat

The cat command concatenates files, or standard input, to standard output. When no file is specified, or when the file is specified as -, cat reads standard input.

Syntax

cat [OPTION]… [FILE]…

Common Options

  • -v, --show-nonprinting use ^ and M- notation, except for LFD and TAB
  • -T, --show-tabs display TAB characters as ^I
  • -E, --show-ends display $ at end of each line
  • -n, --number number all output lines
  • -s, --squeeze-blank squeeze repeated blank lines into one
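
For example, a minimal sketch (assuming a small text file with the hypothetical name notes.txt) that numbers all lines, squeezes repeated blank lines, and marks each line end with $:

$ cat -n -s -E notes.txt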

tr

tr (translate) is used to translate, squeeze, and/or delete characters from standard input, writing to standard output.

tr can accomplish some of the basic functionality of sed: it translates characters listed in SET1 into the corresponding characters of SET2, or deletes them. In that sense, tr can be used as a mini sed for simple jobs.

tr acts as a filter: it reads characters from standard input and writes the result to standard output; every character that appears in SET1 is translated into the corresponding character of SET2.

Syntax

tr [OPTION]… SET1 [SET2]

Common Options

  • -c, -C, --complement use the complement of SET1; usually combined with -d or -s
  • -d, --delete delete characters in SET1, do not translate
  • -s, --squeeze-repeats replace each input sequence of a repeated character that is listed in SET1 with a single occurrence of that character
  • -t, --truncate-set1 first truncate SET1 to the length of SET2

Examples

(1) Squeeze all consecutively repeated lowercase letters in me.txt

$ tr -s "[a-z]" < me.txt

(2) Remove all blank lines

$ tr -s "[\012]" < me.txt

or

$ tr -s ["\n"] < me.txt

(3) Replace all ^M (carriage return) characters with \n

$ tr -s "[\015]" "[\n]" < me.txt

or

$ tr -s "[\r]" "[\n]" < me.txt

(4) Lowercase to uppercase conversion

$ cat a.txt |tr "[a-z]" "[A-Z]" > b.txt

(5) Delete specified characters

e.g. in a diary file containing dates (written with uppercase and lowercase letters) and digits, we want to keep the dates but remove all of the digits

$ tr -cs "[a-z][A-Z]" "[\012*]" < diary.txt

PS: -c takes the complement of the letter sets, so every character that is not a letter is translated to a newline (\012); -s then squeezes the resulting runs of newlines into a single newline.

(6) Convert control characters

tr is often used to convert control characters between DOS and Unix text files; use cat -v filename to display the control characters.

$ cat -v stat.txt
box aa^^^^^12^M
apple bbas^^^^23^M
^Z

Note: press Ctrl+V and then Ctrl+M to type a literal ^M on the Unix command line.
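
To actually convert the file shown above to Unix format, a minimal sketch (unix_stat.txt is a hypothetical output name): delete every carriage return (\015) and the Ctrl+Z end-of-file marker (\032):

$ tr -d "\015\032" < stat.txt > unix_stat.txt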

(7) Translate all colons (:) to TABs to improve readability

$ tr -s "[:]" "[\011]" < /etc/passwd 或 tr -s "[:]" "[\t]" < /etc/passwd

(8) Separate $PATH to lines to display path more clearly

$ echo $PATH | tr ":" "\n" 

(9) We can use ! to run tr from within vim, for example to delete all tabs in the buffer:

1,$!tr -d '\t'

Note

tr cannot replace one character with two characters; we should use awk or sed in that case. For example, awk can convert a Unix file to DOS format by appending \r to each line:

$ awk '{ print $0 "\r" }' < unixfile > dosfile

sort

Syntax

sort [OPTION]… [FILE]…
sort [OPTION]… --files0-from=F

Common Options

-b, --ignore-leading-blanks Ignore leading blanks.
-d, --dictionary-order Consider only blanks and alphanumeric characters.
-f, --ignore-case Fold lower case to upper case characters.
-g, --general-numeric-sort Compare according to general numerical value.
-i, --ignore-nonprinting Consider only printable characters.
-M, --month-sort Compare (unknown) < 'JAN' < ... < 'DEC'.
-h, --human-numeric-sort Compare human readable numbers (e.g., "2K", "1G").
-n, --numeric-sort Compare according to string numerical value.
-R, --random-sort Sort by random hash of keys.
--random-source=FILE Get random bytes from FILE.
-r, --reverse Reverse the result of comparisons.
--sort=WORD Sort according to WORD: general-numeric -g, human-numeric -h, month -M, numeric -n, random -R, version -V.
-V, --version-sort Natural sort of (version) numbers within text.

Other Options

--batch-size=NMERGE Merge at most NMERGE inputs at once; for more use temp files.
-c, --check, --check=diagnose-first Check for sorted input; do not sort.
-C, --check=quiet, --check=silent Like -c, but do not report first bad line.
--compress-program=PROG Compress temporaries with PROG; decompress them with PROG -d.
--debug Annotate the part of the line used to sort, and warn about questionable usage to stderr.
--files0-from=F Read input from the files specified by NUL-terminated names in file F; if F is - then read names from standard input.
-k, --key=POS1[,POS2] Start a key at POS1 (origin 1), end it at POS2 (default end of line). See POS syntax below.
-m, --merge Merge already sorted files; do not sort.
-o, --output=FILE Write result to FILE instead of standard output.
-s, --stable Stabilize sort by disabling last-resort comparison.
-t, --field-separator=SEP Use SEP instead of non-blank to blank transition.
-T, --temporary-directory=DIR Use DIR for temporaries, not $TMPDIR or /tmp; multiple options specify multiple directories.
--parallel=N Change the number of sorts run concurrently to N.
-u, --unique With -c, check for strict ordering; without -c, output only the first of an equal run.
-z, --zero-terminated End lines with 0 byte, not newline.
--help Display a help message, and exit.
--version Display version information, and exit.

Examples

1. Sort In Reverse Order

$ sort -r data.txt

2. Check whether a file is in order

$ sort -c data.txt

3. Sort by field

Fields in sort use a 1-based index and are separated by whitespace by default; use the -t option to specify a different delimiter.

Syntax:

-k POS1 or -k POS1,POS2

$ sort -k 2 data.txt
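
As an additional sketch, on a typical Linux system /etc/passwd uses : as its field separator, so the file can be sorted numerically by its third field (the UID) like this:

$ sort -t ':' -k 3 -n /etc/passwd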

4. Sort by a specific character position within a field

Syntax:

-k POS1.C, where POS1 is the field number and C is the character position within that field
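
A minimal example (the contents of data.txt are not shown above, so this assumes its second field has several characters): start the sort key at the second character of field 2:

$ sort -k 2.2 data.txt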

5. Sort and merge multiple files

Use find to produce a NUL-terminated file list and pipe that output into sort with the --files0-from option.

$ find -name "data?.txt" -print0 | sort --files0-from=-

Explanation: By default, find outputs one filename per line; in other words, it inserts a line break after each filename it outputs.
It would be nice if we could use this output to tell the sort command, “sort the data in any files found by find as if they were all one big file.” The problem with the standard find output is, even though it’s easy for humans to read, it can cause problems for other programs that need to read it in. This is because filenames can include non-standard characters, so in some cases, this format will be read incorrectly by another program.
The correct way to format find’s output to be used as a file list for another program is to use the -print0 option when running find. This terminates each filename with the NUL character (ASCII character number zero), which is universally illegal to use in filenames. This makes things easier for the program reading the file list, since it knows that any time it sees the NUL character, it can be sure it’s at the end of a filename.

Once we execute find -name "data?.txt" -print0, it produces output like ./data1.txt./data3.txt./data2.txt; although we cannot see the NUL characters because they are non-printable, one is actually appended after each filename by -print0.

Now we specify the --files0-from option in the sort command, giving the file as a dash (-), so that sort reads the list of filenames from standard input.

The final effect is that the sort command sorts the data of all the files located by find, as if they were all one big file, and writes the merged, sorted data to standard output.

wc

Syntax

wc [option] filename

-l print line counts
-w print word counts
-m print character counts
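
For example, counting the lines in /etc/passwd:

$ wc -l /etc/passwd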

rev

rev reverses each line of its input character by character.

Syntax

rev [option] [file…]
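
For example, reversing a single word:

$ echo "hello" | rev
olleh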

paste

The paste command displays the corresponding lines of multiple files side-by-side.

paste writes lines consisting of the sequentially corresponding lines from each FILE, separated by tabs, to the standard output.
With no FILE, or when FILE is a dash (“-“), paste reads from standard input.

Syntax

paste [OPTION]… [FILE]…

Common Options

-d, --delimiters=LIST reuse characters from LIST instead of tabs.
-s, --serial paste one file at a time instead of in parallel.

Example

Display the contents of file1.txt and file2.txt, side-by-side, with the corresponding lines of each file separated by a tab.

$ paste file1.txt file2.txt
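
As a further sketch with the same (hypothetical) files, the -d option joins the corresponding lines with a comma instead of a tab:

$ paste -d ',' file1.txt file2.txt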

split

split and join commands are very helpful when manipulating large files.

split outputs fixed-size pieces of input INPUT to files named PREFIXaa, PREFIXab, …
The default size for each split file is 1000 lines, and default PREFIX is “x”. With no INPUT, or when INPUT is a dash (“-“), read from standard input.

Syntax

split [OPTION]… [INPUT [PREFIX]]

Common Options

-a N, --suffix-length=N Use suffixes of length N (default 2).
-b SIZE, --bytes=SIZE Write SIZE bytes per output file.
-C SIZE, --line-bytes=SIZE Write at most SIZE bytes of lines per output file.
-d, --numeric-suffixes Use numeric suffixes instead of alphabetic.
-e, --elide-empty-files Do not generate empty output files with "-n".
--filter=COMMAND Write to shell command COMMAND; file name is $FILE.
-l NUMBER, --lines=NUMBER Put NUMBER lines per output file.
-n CHUNKS, --number=CHUNKS Generate CHUNKS output files. (See below.)
-u, --unbuffered Immediately copy input to output with "-n r/…".
--verbose Print a verbose diagnostic just before each output file is opened.
--help Display a help message and exit.
--version Output version information and exit.

SIZE may be one of the following suffixes on its own, or an integer optionally followed by one of the following multipliers:

suffix   multiplier
KB       1000
K        1024
MB       1000 x 1000
M        1024 x 1024

… so on for G (gigabytes), T (terabytes), P (petabytes), E (exabytes), Z (zettabytes), Y (yottabytes).

CHUNKS may be:

  • N: split into N files based on size of input
  • K/N: output Kth of N to standard output
  • l/N: split into N files without splitting lines
  • l/K/N: output Kth of N to standard output without splitting lines
  • r/N: like "l" but use round robin distribution
  • r/K/N: likewise, but only output the Kth of N to standard output

Examples

(1) Basic Split

$ split split.zip
$ ls
xab  xad  xaf  xah  xaj  xal  xan  xap  xar  xat  xav  xax  xaz  xbb  xbd  xbf  xbh  xbj  xbl  xbn
xaa  xac  xae  xag  xai  xak  xam  xao  xaq  xas  xau  xaw  xay  xba  xbc  xbe  xbg  xbi  xbk  xbm

(2) Customize Split File Size using -b option

$ split -b200000 split.zip

(3) Customize output filenames with Numeric Suffix using -d option

$ split -d split.zip
$ ls
x01  x03  x05  x07  x09  x11  x13  x15  x17  x19  x21  x23  x25  x27  x29  x31  x33  x35  x37  x39
x00  x02  x04  x06  x08  x10  x12  x14  x16  x18  x20  x22  x24  x26  x28  x30  x32  x34  x36  x38

(4) Split the file newfile.txt into files beginning with the name new, each containing 300 lines of text.

$ split -l 300 newfile.txt new

(5) Customize the Number of Split Chunks using -n option (e.g. create 50 chunks of split files)

$ split -n50 split.zip

(6) Avoid Zero Sized Chunks using -e option

$ split -n50 -e testfile

(7) Split file based on Number of Lines using -l option (e.g. split a file every 200 lines)

$ split -l200 split.zip
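
A related note, assuming the default x prefix was used: because the shell expands x* in sorted order, the pieces can be concatenated back into a single file with cat (rejoined.zip is a hypothetical name):

$ cat x* > rejoined.zip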

join

Joins the lines of two files which share a common field of data.

Syntax

join [OPTION]… FILE1 FILE2

Examples

(1) Suppose we have file1 and file2 with the following contents:

$ cat file1
1 India
2 US
3 Ireland
4 UK
5 Canada

$ cat file2
1 NewDelhi
2 Washington
3 Dublin
4 London
5 Toronto

After joining them, we get a result like this:

$ join file1 file2
1 India NewDelhi
2 US Washington
3 Ireland Dublin
4 UK London
5 Canada Toronto

Note that both files must be sorted on the join field; lines whose join fields are out of order will not be joined.
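
If the inputs are not already sorted, one common sketch (using bash process substitution) sorts them on the fly before joining:

$ join <(sort file1) <(sort file2)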

(2) Print Only Unpaired Lines using -v option (this example assumes file1 contains an extra line, "f Australia", whose key has no match in file2)

$ join -v1 file1 file2
f Australia

(3) Join Based on Different Columns from Both Files using -1 and -2 option

By default, the first column of each file is used for the comparison before joining. You can change this behavior with the -1 and -2 options.

$ cat testfile1
a India
b US
c Ireland
d UK
e Canada

$ cat testfile2
NewDelhi a
Washington b
Dublin c
London d
Toronto e

$ join -1 1 -2 2 testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto

csplit

Split a file into sections determined by context lines.

Syntax

csplit [OPTION]… FILE PATTERN…

Common Options

-b, --suffix-format=FORMAT use sprintf FORMAT instead of %02d.
-f, --prefix=PREFIX use PREFIX instead of 'xx'.
-k, --keep-files do not remove output files on errors.
-n, --digits=DIGITS use specified number of digits instead of 2.
-s, --quiet, --silent do not print counts of output file sizes.
-z, --elide-empty-files remove empty output files.
--help display a help message and exit.
--version output version information and exit.

csplit reads standard input if FILE is specified as a dash (“-“). Each PATTERN may be:

INTEGER copy up to but not including specified line number.
/REGEXP/[OFFSET] copy up to but not including a matching line.
%REGEXP%[OFFSET] skip to, but not including a matching line.
{INTEGER} repeat the previous pattern specified number of times.
{*} repeat the previous pattern as many times as possible.

A line OFFSET is a required '+' or '-' followed by a positive integer.
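
As a closing sketch (server.log and the DATE pattern are hypothetical), the following splits a log file into a new section at every line beginning with DATE, repeating the pattern as many times as possible; the sections are written to xx00, xx01, xx02, and so on:

$ csplit server.log '/^DATE/' '{*}'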