
The cut and paste commands in Linux are very helpful for processing text that is separated into fields, so if we need to process data of that kind, it is worth mastering both commands.

cut

Assume we have a plain text file cut.txt:

1|2|3|4|5|6|7|8|九|0
1|2|3|4|5|6|7|8|九|0
1|2|3|4|5|6|7|8|九|0
1|2|3|4|5|6|7|8|九|0
1|2|3|4|5|6|7|8|九|0
1|2|3|4|5|6|7|8|九|0

cut basics and byte mode

The basic functionality of the cut command is to extract columns from a text file. More precisely, cut reads lines from its input stream and prints the selected columns of each line.
A column can be defined in one of three modes.

  • -b # Each column is denoted by a byte
  • -c # Each column is denoted by a character
  • -f # Each column is denoted by a field (each field is separated by some delimiters)
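
To make the three modes concrete, here is a small sketch on a made-up ASCII line. The sample text and the variable names are only for illustration; on pure ASCII input, -b and -c select the same positions.

```shell
# Hypothetical sample line; every character here is a single ASCII byte.
line='alpha,beta,gamma'

by_byte=$(printf '%s\n' "$line" | cut -b 1-5)         # bytes 1 to 5
by_char=$(printf '%s\n' "$line" | cut -c 7-10)        # characters 7 to 10
by_field=$(printf '%s\n' "$line" | cut -d ',' -f 3)   # 3rd comma-separated field

printf '%s\n' "$by_byte" "$by_char" "$by_field"
# alpha
# beta
# gamma
```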

As we know, each English character is a single byte in ASCII. Suppose we want to print the third byte of each line; the expected output is six 2s:

$ cut -b 3 cut.txt
2
2
2
2
2
2

Correct, the result meets our expectation.

Cut Syntax

cut -[b,c,f] LIST [FILE]...

cut has a very flexible style for specifying a column or a range of columns:

3       # 3rd column 
3,5,8   # 3rd, 5th, 8th columns
3-5,8   # 3rd to 5th columns, and the 8th column
-3,8    # 1st to 3rd columns, and the 8th column
1,3-    # 1st column, 3rd to the last columns
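
The list forms above can be tried on a short line. This is a minimal sketch that uses field mode with a colon delimiter; the sample line and the variable names are invented for the example.

```shell
# Hypothetical sample line with eight colon-separated fields.
line='a:b:c:d:e:f:g:h'

single=$(printf '%s\n' "$line" | cut -d ':' -f 3)          # c
several=$(printf '%s\n' "$line" | cut -d ':' -f 3,5,8)     # c:e:h
range=$(printf '%s\n' "$line" | cut -d ':' -f 3-5,8)       # c:d:e:h
from_start=$(printf '%s\n' "$line" | cut -d ':' -f -3,8)   # a:b:c:h
to_end=$(printf '%s\n' "$line" | cut -d ':' -f 1,3-)       # a:c:d:e:f:g:h
reordered=$(printf '%s\n' "$line" | cut -d ':' -f 8,3)     # c:h

printf '%s\n' "$single" "$several" "$range" "$from_start" "$to_end" "$reordered"
```

The last command shows that the order of the list does not matter: cut always writes the selected columns in the order they appear in the input, so it cannot be used to reorder columns by itself.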

Byte mode can cause problems in some cases. For example, if the file contains non-ASCII characters, which are not single-byte characters, -b will not work as expected.

$ cut -b 17 cut.txt

-b 17 prints only the 17th byte of each line rather than the 17th character; in other words, it prints only the first byte of 九. What actually appears depends on how the Chinese character is encoded.
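
The effect can be made visible by dumping the selected bytes in hex. The sketch below assumes UTF-8, where 九 is encoded as the three bytes e4 b9 9d; the shortened sample line and the variable names are invented for the example.

```shell
# Assumption: UTF-8 encoding, where 九 is the bytes e4 b9 9d (octal 344 271 235).
# Octal escapes keep the snippet independent of the encoding of the script itself.
sample='8|\344\271\235|0\n'

# The line looks like 5 characters plus a newline, but it is 8 bytes long.
byte_count=$(printf "$sample" | wc -c | tr -d ' ')

# Byte 3 is only the first byte of 九; od shows it followed by the newline.
third_byte=$(printf "$sample" | cut -b 3 | od -An -tx1 | tr -d ' \n')

echo "$byte_count"   # 8
echo "$third_byte"   # e40a
```

A lone e4 byte is not a valid UTF-8 sequence, which is why a terminal usually shows a replacement symbol instead of 九.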

Character mode and field mode

-c specifies character mode. To print 九, we can do this:

$ cut -c 17 cut.txt
九
九
九
九
九
九

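Apart from this, -c and -b behave exactly the same. Note that the result above assumes a multibyte-aware cut running in a UTF-8 locale; some versions of GNU cut treat -c the same as -b.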

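Similarly, there is field mode. The biggest difference from byte mode and character mode is that field mode lets us specify a single character as the delimiter and splits each line of the file into columns. For example, here we can use | as the delimiter and print the third to fifth columns and the ninth column. Note that in field mode the delimiter is also printed between the selected columns as needed.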

$ cut -d '|' -f 3-5,9 cut.txt
3|4|5|九
3|4|5|九
3|4|5|九
3|4|5|九
3|4|5|九
3|4|5|九
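
One detail of field mode is worth a quick sketch: a line that does not contain the delimiter at all is printed unchanged, unless -s is given to suppress such lines. The sample data below is invented for the example.

```shell
# Hypothetical input: one colon-separated line and one line without any colon.
data='root:x:0:0:root:/root:/bin/bash
this line has no colon'

# Without -s, the line that lacks the delimiter passes through untouched.
with_plain=$(printf '%s\n' "$data" | cut -d ':' -f 1,7)

# With -s, only lines that contain the delimiter are printed.
only_delimited=$(printf '%s\n' "$data" | cut -d ':' -s -f 1,7)

printf '%s\n' "$with_plain"
# root:/bin/bash
# this line has no colon
printf '%s\n' "$only_delimited"
# root:/bin/bash
```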

Complement

cut also provides the --complement option to invert the selection, that is, to print everything except the listed columns:

$ cut -d '|' -f 3-5,9 --complement cut.txt
1|2|6|7|8|0
1|2|6|7|8|0
1|2|6|7|8|0
1|2|6|7|8|0
1|2|6|7|8|0
1|2|6|7|8|0

With --complement, it is easy to delete a column from a text file.

$ cut -d '|' -f 4 --complement cut.txt
1|2|3|5|6|7|8|九|0
1|2|3|5|6|7|8|九|0
1|2|3|5|6|7|8|九|0
1|2|3|5|6|7|8|九|0
1|2|3|5|6|7|8|九|0
1|2|3|5|6|7|8|九|0
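
--complement is not part of the POSIX specification of cut, so it may be missing on some systems. A minimal sketch of a portable alternative is to list the columns to keep with two ranges; the ASCII-only sample line is invented for the example.

```shell
# Hypothetical ASCII-only sample line with ten fields.
line='1|2|3|4|5|6|7|8|9|0'

# Dropping field 4 is the same as keeping fields 1-3 and 5 to the end.
without_fourth=$(printf '%s\n' "$line" | cut -d '|' -f 1-3,5-)

echo "$without_fourth"   # 1|2|3|5|6|7|8|9|0
```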

Processing consecutive space characters

cut treats every single delimiter as a field boundary, so a run of consecutive delimiters produces a series of empty fields and the output becomes a mess. Fortunately, we can get help from another command, tr, with its -s (--squeeze-repeats) option.
The -s option squeezes each run of a repeated character into a single occurrence.

$ who
jack :0           2016-11-08 00:07
jack pts/0        2016-11-08 00:23 (:0.0)
jack pts/1        2016-11-08 00:15 (:0.0)

After using tr -s

$ who | tr -s ' '
jack :0 2016-11-08 00:07
jack pts/0 2016-11-08 00:23 (:0.0)
jack pts/1 2016-11-08 00:15 (:0.0)

By combining cut with tr -s ' ', we can get a cleaner result.

$ who | tr -s ' ' | cut -d ' ' -f 1,3,4
jack 2016-11-08 00:07
jack 2016-11-08 00:23
jack 2016-11-08 00:15
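
To see why the squeeze step matters, the sketch below runs cut on one line of who-style output with and without tr -s. The line is copied from the sample above; the variable names are invented for the example.

```shell
# One line of who-style output with a run of eight spaces after "pts/0".
line='jack pts/0        2016-11-08 00:23 (:0.0)'

# Without squeezing, field 3 is the empty string between two adjacent spaces.
raw_field=$(printf '%s\n' "$line" | cut -d ' ' -f 3)

# After squeezing, field 3 is the date as intended.
squeezed_field=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 3)

echo "[$raw_field]"        # []
echo "[$squeezed_field]"   # [2016-11-08]
```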

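Using TAB as the separator: cut does not accept a multi-character delimiter, so the two-character string '\t' will not work. TAB is the default delimiter of field mode, so -d can simply be omitted. To pass it explicitly, either use Bash's $'\t' quoting or type a literal TAB character by pressing Ctrl+v and then the TAB key.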

$ cut -f2 -d$'\t' file

or

$ cut -d '	' -f2 file
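
For scripts that should not depend on Bash-specific quoting or on typing a literal TAB, one possible approach is to let printf produce the TAB character. The sample row and the variable names are invented for the example.

```shell
# Store a single TAB character in a variable; printf expands the \t escape.
tab=$(printf '\t')

# TAB is the default delimiter of field mode, so -d can be left out.
default_delim=$(printf 'id\tname\tcity\n' | cut -f 2)

# The same delimiter passed explicitly through the variable.
explicit_delim=$(printf 'id\tname\tcity\n' | cut -d "$tab" -f 3)

echo "$default_delim"    # name
echo "$explicit_delim"   # city
```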

paste

Compared to cut, the paste command is simpler and more straightforward. Its main use is to merge files line by line.

Assume we have 3 files:

$ cat paste1.txt  | $ cat paste2.txt | $ cat paste3.txt
1                 | a                | A
2                 | b                | B
3                 | c                | C

Try with paste

$ paste paste1.txt paste2.txt
1    a
2    b
3    c
$ paste paste2.txt paste1.txt
a    1
b    2
c    3
$ paste paste2.txt paste1.txt paste3.txt
a    1    A
b    2    B
c    3    C
$ paste paste2.txt paste1.txt paste3.txt | sed -n l
a\t1\tA
b\t2\tB
c\t3\tC

paste accepts multiple files as input and joins them line by line, using a TAB (\t) as the delimiter by default. To customize the delimiter in the output, use the -d option.

$ paste -d '|' paste2.txt paste1.txt paste3.txt
a|1|A
b|2|B
c|3|C
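
Two further behaviours of -d can be sketched with throwaway files: the delimiter list may hold several characters, which are used in turn, and a file that runs out of lines contributes empty columns. The temporary directory and the file short.txt are invented for the example.

```shell
# Recreate the sample files in a temporary directory (hypothetical location).
tmp=$(mktemp -d)
printf '1\n2\n3\n' > "$tmp/paste1.txt"
printf 'a\nb\nc\n' > "$tmp/paste2.txt"
printf 'A\nB\nC\n' > "$tmp/paste3.txt"
printf 'a\nb\n'    > "$tmp/short.txt"    # one line shorter than the others

# Two delimiters: ':' joins columns 1 and 2, ',' joins columns 2 and 3.
cycled=$(paste -d ':,' "$tmp/paste1.txt" "$tmp/paste2.txt" "$tmp/paste3.txt")

# The shorter file yields an empty column on the last line.
uneven=$(paste -d '|' "$tmp/paste1.txt" "$tmp/short.txt")

printf '%s\n' "$cycled"
# 1:a,A
# 2:b,B
# 3:c,C
printf '%s\n' "$uneven"
# 1|a
# 2|b
# 3|

rm -rf "$tmp"
```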

Avoid temp file

If we want to join the outputs of several different commands, we may have to write them to temporary files and run paste on those files afterwards. Fortunately, we can avoid this by using Bash process substitution.

In short, use <(command) to simulate a temp file, and use it as the input of the paste command.

e.g.

$ paste -d '|' <(cat paste2.txt) <(cat paste1.txt) <(cat paste3.txt)
a|1|A
b|2|B
c|3|C
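
Process substitution also lets cut and paste work together, for instance to swap columns, which cut cannot do alone because it always keeps the input order. The sketch below calls bash explicitly, since <(command) is a Bash feature and not part of plain sh; the file data.txt and the variable names are invented for the example.

```shell
# Hypothetical three-column sample file in a temporary directory.
tmp=$(mktemp -d)
printf '1|a|A\n2|b|B\n3|c|C\n' > "$tmp/data.txt"

# Run through bash so that <(...) works even if the calling shell is plain sh.
# Column 3 is extracted first and column 1 second, then paste joins them.
swapped=$(bash -c 'paste -d "|" <(cut -d "|" -f 3 "$1") <(cut -d "|" -f 1 "$1")' _ "$tmp/data.txt")

printf '%s\n' "$swapped"
# A|1
# B|2
# C|3

rm -rf "$tmp"
```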
