find and xargs
find is one of the most frequently used commands in Linux.
The main purpose of find
is to locate files that match specified conditions, such as filename, file type, etc.
By default, find
searches the specified directory recursively.
- Classic usage pattern
- Common options
- find and xargs
Syntax
find [path...] [-options] [file_pattern] [-action]
Example
$ find /data -type f -print
Dissection of a find expression:
find start_dir
-options
matching_criteria
-action_to_perform_on_results
Examples
1. List all files in current and sub directories
$ find
.
./abc.txt
./subdir
./subdir/main.php
./test.php
Note: the command is the same as find .
or find . -print
; when .
is omitted, find
searches the current directory.
2. List all files in specific directories
$ find /data
3. Search files by name
$ find /data -name sql.dat
4. Search files by name, ignoring case
$ find /data -iname "*.ExE"
5. Invert match
$ find ./project -not -name "*.log"
or
$ find ./project ! -name "*.log"
6. Combine multiple search criteria
-a: and
-o: or
-not: negate (same as !)
- And
$ find ./project -name 'test*' ! -name '*.php'
- Or
$ find -name '*.php' -o -name '*.txt'
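Criteria can also be grouped with escaped parentheses; a minimal sketch (the file patterns here are just for illustration):
$ find . \( -name '*.txt' -o -name '*.md' \) -a -mtime -7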
7. Search hidden files:
$ find ~ -type f -name ".*"
8. List out the found files
$ find . -exec ls -ld {} \;
The command given to -exec
must be terminated; to keep the terminator from being interpreted by the shell, we need to type \; (or use + to terminate the command instead).
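As a quick illustration of the two terminators (the *.txt pattern is just an example): \; runs the command once per file, while + runs it once with all matched files as arguments.
$ find . -name '*.txt' -exec echo {} \;
$ find . -name '*.txt' -exec echo {} +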
9. Redirect error messages to /dev/null
This is useful when you don't want to see those error messages:
e.g. if you search the root directory / with a non-root account, you will get a lot of error messages telling you Permission denied
,
like this:
$ find / -name "*.conf"
/sbin/generate-modprobe.conf
find: /tmp/orbit-root: Permission denied
find: /tmp/ssh-gccBMp5019: Permission denied
find: /tmp/keyring-5iqiGo: Permission denied
find: /var/log/httpd: Permission denied
find: /var/log/ppp: Permission denied
We can avoid those messages by redirecting them to /dev/null:
$ find / -name "*.conf" 2>/dev/null
Or, we can use GNU’s -readable
option:
$ find / -type d ! -readable -prune -o -print
If you want to keep other errors but not 'Permission denied':
$ find . -name "openssl" 2>&1 | sed '/Permission denied/d'
or
$ find . 2>&1 > files_and_folders | grep -v 'Permission denied' >&2
Common Options
-name Find files by filename
(1) Find filename in /dir and its subdirectories
$ find /dir -name filename
(2) Find all files with the .c extension in the current directory and its subdirectories
$ find . -name "*.c"
(3) Limit find to specified directories
$ find /usr /home -name some.conf -type f
-perm
Find files with specific permissions
$ find . -perm 755 -print
Find readonly files
$ find /etc -maxdepth 1 -perm /u=r
/etc/opt
/etc/aliases
/etc/localtime
/etc/apparmor.d
/etc/cron.hourly
Find executable files
$ find /bin -maxdepth 2 -perm /a=x
/bin/ping
/bin/less
/bin/zcat
/bin/ps
/bin/chmod
-maxdepth
Limit depth of directory traversal
$ find ./test -maxdepth 2 -name "*.png"
./test/img/avatar.png
./test/main.png
$ find ./test -maxdepth 1 -name "*.png"
./test/main.png
-user
Find files belonging to a specific user
$ find ~ -user nick
-group
Find files belonging to a specific group
$ find /data -group gem
-mtime -n +n
Find files by modification time: -n
means the file was modified within the last n days, +n
means it was modified more than n days ago.
$ find / -mtime +50 -mtime -100
Find files whose status changed in the last 60 minutes
$ find /home/alex -cmin -60
Find files modified in the last hour
$ find /home/alex -mmin -60
Find files accessed in the last hour
$ find /home/alex -amin -60
-nogroup
Find files that have no group, i.e. the file's group does not exist in /etc/group.
find / -nogroup -print
-nouser
Find files that have no owner, i.e. the file's owner does not exist in /etc/passwd.
find /home -nouser -print
-newer file1 ! -newer file2
Find files that are newer than file1 but older than file2
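For example, to list files modified after backup_start.log but not after backup_end.log (the reference files here are hypothetical):
$ find /data -newer backup_start.log ! -newer backup_end.log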
-type
Find files that are a specific type, e.g.
b - block device
d - directory
c - character device
p - named pipe
l - symbolic link
f - regular file
s - socket
Examples:
(1) List all sub-directories in /etc
find /etc -type d -print
(2) List all files in the current directory, but do not list directories
find . ! -type d -print
(3) Find all symbolic links in /etc
find /etc -type l -print
(4) Search only files (not directories) or only directories (not files) with a matching name
e.g.
If we have a file named abc.txt
and a directory named abc
in the current directory, find -name "abc*"
will list both of them.
If we want only the file to be listed:
$ find -type f -name "abc*"
If we want only the directories to be listed:
$ find -type d -name "abc*"
-size
Find files by size (the unit defaults to 512-byte blocks; the suffix c means bytes, k kilobytes, M megabytes)
(1) Find files of given size
$ find / -size 50M
(2) Find files in a size range
$ find / -size +50M -size -100M
(3) Find largest and smallest files
# display the 5 largest files in the current directory and its subdirectories
$ find . -type f -exec ls -s {} \; | sort -n -r | head -5
# display the 5 smallest files
$ find . -type f -exec ls -s {} \; | sort -n | head -5
(4) Find empty files and directories
# empty files
$ find ~ -type f -empty
# empty directories
$ find ~ -type d -empty
-depth
When finding files, process each directory's contents before the directory itself
$ find / -name "CON.FILE" -depth -print
Actions
-delete
Find files named core in or below the directory /tmp and delete them. This is more efficient than find /tmp -name core -type f -print0 | xargs -0 /bin/rm -f
(because it avoids using fork(2) and exec(2) to launch rm and does not need the extra xargs process)
$ find /tmp -depth -name core -type f -delete
-exec command ;
-exec command {} +
-exec
is used to execute commands on the found files; the command must be terminated with \; (or with + to pass many files at once).
$ find * -exec sh -c 'echo "{}"' \;
$ find . -type f -name "*.jpg" -exec rm -f {} +
or
$ find . -type f -name "*.jpg" -exec rm -f {} \;
Rename all files that have spaces in their filename, replacing the spaces with _
$ find . -type f -iname "*.mp3" -exec rename "s/ /_/g" {} \;
-fls file
like -ls but write to file like -fprint.
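For example, to write the listing of matched files to a file (the paths and pattern here are illustrative):
$ find /var/log -type f -name '*.log' -fls /tmp/log_listing.txt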
-ls
list current file in ls -dils format on standard output
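For example (path and pattern are illustrative):
$ find /etc -maxdepth 1 -name '*.conf' -ls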
-print
print the full file name on the standard output, followed by a newline
-print0
print the full file name on the standard output, followed by a null character
-prune
if the file is a directory, do not descend into it. If -depth is given, false; no effect.
Since -delete implies -depth, -prune has no effect when -delete is specified.
e.g. Exclude a directory when searching (if -depth is specified, -prune is ignored):
$ find /apps -path "/apps/bin" -prune -o -print
$ find /usr/sam -path "/usr/sam/dir1" -prune -o -print
-quit
Exit immediately
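For example, to print only the first match and then stop (the filename is hypothetical):
$ find / -name 'network.conf' -print -quit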
find and exclude specific directories
<1> use "-prune -o -print"
$ find . -path "./tmp" -prune -o -type f -name "*.cpp" -print | xargs wc -l
<2> use "! -path"
$ find . -type f -name "*.cpp" ! -path "./tmp/*" ! -path "./test/*"
<3> use "grep -v"
$ find . -name "*.go" | egrep -v "_vendor" | xargs wc -l
or
$ find . -name "*.go" | egrep -v "_vendor|test" | xargs wc -l
find and xargs
When the find command is combined with -exec
... +, find passes all matched files together as arguments to the command following the -exec
.
This can cause a problem, because there is a limit on the total length of the argument list that a command can accept on some systems;
in that case, after find has run for a while, the argument list overflows. A common error for this is Argument list too long
.
Also, when -exec
is terminated with \;, find forks a new process for each matched file instead of passing all files together as arguments to the command,
which reduces system performance.
This is the situation where the xargs
command comes to the rescue.
xargs
is extremely powerful when combined with the find
command: it takes only a batch of the matched files from find at a time
(unlike -exec
... +, which takes them all at once), processes that batch, and then fetches another, until there are no files left.
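The batching behavior can be seen in isolation by feeding xargs a plain list and limiting each batch to 3 items with -n:
$ echo a b c d e f g | xargs -n 3 echo
a b c
d e f
g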
Examples
(1) Find all files and print the file info
find . -type f | xargs file
(2) Find core dump files and save the result to a file
find / -name "core" -print | xargs echo "" >/tmp/core.log
(3) Combine with grep
find . -type f -print | xargs grep "hostname"
(4) Delete all files modified more than 3 days ago
find ./ -mtime +3 -print | xargs rm -f -r
(5) Delete all files that have size 0
find ./ -size 0 | xargs rm -f &
More about xargs
xargs
can also work with other commands, e.g. ls
Delete all files whose names contain a digit
$ ls | grep -E '[0-9]' | xargs rm
Note this wouldn't work if there are too many files produced by ls
and the combined length of the filenames exceeds the limit (about 128 KiB);
use the find
command to get around the limit.
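A rough find-based equivalent that avoids piping through ls (the pattern matches names containing a digit; illustrative only):
$ find . -maxdepth 1 -type f -name '*[0-9]*' -exec rm {} +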
Common options:
-a file Read items from file instead of standard input
-0 Same as --null: input items are terminated by a null character instead of by whitespace, and quotes and backslashes are not special
(every character is taken literally).
-d delim Input items are terminated by the specified character. Quotes and backslash are not special; every character in the input is taken literally.
-I replace-str Replace occurrences of replace-str in the initial-arguments with names read from standard input.
-i Deprecated, use -I instead.
-x Exit if the size (see the -s option) is exceeded.
Examples:
-0
If a filename contains special characters (such as spaces or newlines), plain xargs may fail; in that case we should use the -0
option together with find -print0:
$ find -name '*.bak' -print0 | xargs -0 rm
Generates a compact listing of all the users on the system
$ cut -d: -f1 < /etc/passwd | sort | xargs echo
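A small sketch of the -I and -a options (the file names here are hypothetical):
# rename each matched log file to *.bak, one item at a time
$ find . -name '*.log' | xargs -I {} mv {} {}.bak
# read items from a file instead of standard input
$ xargs -a file_list.txt ls -ld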
When we use the -I
option, each line read from the input is buffered internally.
This means that there is an upper limit on the length of input line that xargs will accept when used with the -I
option.
To work around this limitation, we can use the -s
option to increase the amount of buffer space that xargs uses,
and we can also use an extra invocation of xargs to ensure that very long lines do not occur.
For example:
ls | xargs -s 50000 echo | xargs -I '{}' -s 100000 rm '{}'
Here, the first invocation of xargs has no input line length limit because it doesn’t use the -i option.
The second invocation of xargs does have such a limit, but we have ensured that it never encounters a line which is longer than it can handle.
This is not an ideal solution. Instead, the -i
option should not impose a line length limit, which is why this discussion appears in the BUGS section.
The problem doesn’t occur with the output of find(1) because it emits just one filename per line.
Invoking the shell from xargs
$ find . -type f | xargs bash -c 'exec vim "$@" < /dev/tty' vim
$ find /foo -maxdepth 1 -atime +366 -print0 | xargs -r0 sh -c 'mv "$@" /archive' move
Explanations:
Here, a shell is being invoked. There are two shell instances to think about.
The first is the shell which launches the xargs
command.
The second is the shell launched by xargs (in fact it will probably launch several, one after the other, depending on how many files need to be archived).
We'll refer to this second shell as a subshell.
We use the -c
option of sh; its argument is a shell command to be executed by the subshell.
Along with the rest of that command, the $@
is enclosed by single quotes to make sure it is passed to the subshell without being expanded
by the parent shell. It is also enclosed with double quotes so that the subshell will expand $@
correctly even if one of the file names contains a space or newline.
Another reason to use the sh -c construct could be to perform redirection:
$ find /usr/include -name '*.h' | xargs grep -wl mode_t | xargs -r sh -c 'exec vim "$@" < /dev/tty' vim
Notice that we use the shell builtin exec here. That’s simply because the subshell needs to do nothing once Vim
has been invoked.
Therefore instead of keeping a sh process around for no reason, we just arrange for the subshell to exec Vim
, saving an extra process creation.
Sometimes, though, it can be helpful to keep the shell process around:
$ find /foo -maxdepth 1 -atime +366 -print0 | xargs -r0 sh -c 'mv "$@" /archive || exit 255' move
Here, the shell will exit with status 255 if any mv failed. This causes xargs to stop immediately.
parallel
GNU parallel helps utilize multiple CPU cores to speed up processing when combined with the xargs
command.
Assume we want to compress files; we know that compressing a bunch of files can be a time-consuming operation.
You may want to compress several files at the same time, to make better use of the multiple CPU cores.
For this, you can use the GNU parallel
tool.
$ find -name '*.bak' -print0 | xargs -0 parallel gzip --
This example takes a bit of unravelling:
- find writes the pathnames of matching files, delimited by NUL bytes, to its stdout
- xargs reads files from its stdin, and assumes NUL delimiters
- the command to run is parallel gzip --
- the -- tells parallel that it should run gzip on any arguments following the --; in other words, the -- separates the command to be run from the filenames to give the command as arguments
- parallel starts an instance of the command for each CPU core, and gives each instance the next filename argument; when an instance terminates, it starts a new instance with the next filename argument, until the command has been run for each argument
This should be much more efficient than running one gzip at a time.
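If GNU parallel is installed, the NUL-delimited list can also be fed to it directly; a minimal sketch:
$ find . -name '*.bak' -print0 | parallel -0 gzip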