find and xargs
find is one of the most frequently used commands in Linux.
The main purpose of find
is to locate files that match specified conditions, such as filename, file type, etc.
By default, find
searches the specified directory recursively.
- Classic usage pattern
- Common options
- find and xargs
Syntax
find [path...] [-options] [file_pattern] [-action]
Example
$ find /data -type f -print
Dissection of a find expression:
find start_dir
-options
matching_criteria
-action_to_perform_on_results
Examples
1. List all files in current and sub directories
$ find
.
./abc.txt
./subdir
./subdir/main.php
./test.php
Note: the command is the same as find .
or find . -print
; when .
is omitted, find
searches the current directory.
2. List all files in specific directories
$ find /data
3. Search files by name
$ find /data -name sql.dat
4. Search files by name, ignoring case
$ find /data -iname "*.ExE"
5. Invert match
$ find ./project -not -name "*.log"
or
$ find ./project ! -name "*.log"
6. Combine multiple search criteria
-a: and
-o: or
-not: negate (same as !)
- And
$ find ./project -name 'test*' ! -name '*.php'
- Or
$ find -name '*.php' -o -name '*.txt'
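Criteria can also be grouped with escaped parentheses; a minimal sketch (the file patterns here are just for illustration):
$ find . \( -name '*.txt' -o -name '*.md' \) -a -mtime -7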
7. Search hidden files:
$ find ~ -type f -name ".*"
8. List out the found files
$ find . -exec ls -ld {} \;
The command given to -exec
must be terminated; to keep the terminator from being interpreted by the shell, we need to type \; (or use + to terminate the command instead).
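As a quick illustration of the two terminators (the *.txt pattern is just an example): \; runs the command once per file, while + runs it once with all matched files as arguments.
$ find . -name '*.txt' -exec echo {} \;
$ find . -name '*.txt' -exec echo {} +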
9. Redirect error messages to /dev/null
This is useful when you don't want to see those error messages:
e.g. if you search the root directory / with a non-root account, you will get a lot of error messages telling you Permission denied
,
like this:
$ find / -name "*.conf"
/sbin/generate-modprobe.conf
find: /tmp/orbit-root: Permission denied
find: /tmp/ssh-gccBMp5019: Permission denied
find: /tmp/keyring-5iqiGo: Permission denied
find: /var/log/httpd: Permission denied
find: /var/log/ppp: Permission denied
We can avoid those messages by redirecting them to /dev/null:
$ find / -name "*.conf" 2>/dev/null
Or, we can use GNU’s -readable
option:
$ find / -type d ! -readable -prune -o -print
If you want to keep other errors but not 'Permission denied':
$ find . -name "openssl" 2>&1 | sed '/Permission denied/d'
or
$ find . 2>&1 > files_and_folders | grep -v 'Permission denied' >&2
Common Options
-name Find files by filename
(1) Find filename in /dir and its subdirectories
$ find /dir -name filename
(2) Find all files with the .c extension in the current directory and its subdirectories
$ find . -name "*.c"
(3) Limit find to specified directories
$ find /usr /home -name some.conf -type f
-perm
Find files with specific permissions
$ find . -perm 755 -print
Find readonly files
$ find /etc -maxdepth 1 -perm /u=r
/etc/opt
/etc/aliases
/etc/localtime
/etc/apparmor.d
/etc/cron.hourly
Find executable files
$ find /bin -maxdepth 2 -perm /a=x
/bin/ping
/bin/less
/bin/zcat
/bin/ps
/bin/chmod
-maxdepth
Limit depth of directory traversal
$ find ./test -maxdepth 2 -name "*.png"
./test/img/avatar.png
./test/main.png
$ find ./test -maxdepth 1 -name "*.png"
./test/main.png
-user
Find files belonging to a specific user
$ find ~ -user nick
-group
Find files belonging to a specific group
$ find /data -group gem
-mtime -n +n
Find files by modification time: -n
means the file was modified within the last n days, +n
means it was modified more than n days ago.
$ find / -mtime +50 -mtime -100
Find files whose status changed in the last 60 minutes
$ find /home/alex -cmin -60
Find files modified in the last hour
$ find /home/alex -mmin -60
Find files accessed in the last hour
$ find /home/alex -amin -60
-nogroup
Find files that have no group, i.e. the file's group does not exist in /etc/group.
find / -nogroup -print
-nouser
Find files that have no owner, i.e. the file's owner does not exist in /etc/passwd.
find /home -nouser -print
-newer file1 ! -newer file2
Find files that are newer than file1 but older than file2
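For example, to list files modified after backup_start.log but not after backup_end.log (the reference files here are hypothetical):
$ find /data -newer backup_start.log ! -newer backup_end.log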
-type
Find files that are a specific type, e.g.
b - block device
d - directory
c - character device
p - named pipe
l - symbolic link
f - regular file
s - socket
Examples:
(1) List all sub-directories in /etc
find /etc -type d -print
(2) List all files in the current directory, but do not list directories
find . ! -type d -print
(3) Find all symbolic links in /etc
find /etc -type l -print
(4) Search only files (not directories) or only directories (not files) with a matching name
e.g.
If we have a file named abc.txt
and a directory named abc
in the current directory, find -name "abc*"
will list both of them.
If we want only the file to be listed:
$ find -type f -name "abc*"
If we want only the directories to be listed:
$ find -type d -name "abc*"
-size
Find files by size (the unit defaults to 512-byte blocks; the suffix c means bytes, k kilobytes, M megabytes)
(1) Find files of given size
$ find / -size 50M
(2) Find files in a size range
$ find / -size +50M -size -100M
(3) Find largest and smallest files
# display the 5 largest files in the current directory and its subdirectories
$ find . -type f -exec ls -s {} \; | sort -n -r | head -5
# display the 5 smallest files
$ find . -type f -exec ls -s {} \; | sort -n | head -5
(4) Find empty files and directories
# empty files
$ find ~ -type f -empty
# empty directories
$ find ~ -type d -empty
-depth
When finding files, process each directory's contents before the directory itself
$ find / -name "CON.FILE" -depth -print
Actions
-delete
Find files named core in or below the directory /tmp and delete them. This is more efficient than find /tmp -name core -type f -print0 | xargs -0 /bin/rm -f
(because it avoids using fork(2) and exec(2) to launch rm and does not need the extra xargs process)
$ find /tmp -depth -name core -type f -delete
-exec command ;
-exec command {} +
-exec
is used to execute commands on the found files; the command must be terminated with \; (or with + to pass many files at once).
$ find * -exec sh -c 'echo "{}"' \;
$ find . -type f -name "*.jpg" -exec rm -f {} +
or
$ find . -type f -name "*.jpg" -exec rm -f {} \;
Rename all files that have spaces in their filename, replacing the spaces with _
$ find . -type f -iname "*.mp3" -exec rename "s/ /_/g" {} \;
-fls file
like -ls but write to file like -fprint.
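For example, to write the listing of matched files to a file (the paths and pattern here are illustrative):
$ find /var/log -type f -name '*.log' -fls /tmp/log_listing.txt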
-ls
list current file in ls -dils format on standard output
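For example (path and pattern are illustrative):
$ find /etc -maxdepth 1 -name '*.conf' -ls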
-print
print the full file name on the standard output, followed by a newline
-print0
print the full file name on the standard output, followed by a null character
-prune
if the file is a directory, do not descend into it. If -depth is given, false; no effect.
Since -delete implies -depth, -prune has no effect when -delete is specified.
e.g. Exclude a directory when searching (if -depth is specified, -prune is ignored):
$ find /apps -path "/apps/bin" -prune -o -print
$ find /usr/sam -path "/usr/sam/dir1" -prune -o -print
-quit
Exit immediately
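For example, to print only the first match and then stop (the filename is hypothetical):
$ find / -name 'network.conf' -print -quit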
find and exclude specific directories
<1> use "-prune -o -print"
$ find . -path "./tmp" -prune -o -type f -name "*.cpp" -print | xargs wc -l
<2> use "! -path"
$ find . -type f -name "*.cpp" ! -path "./tmp/*" ! -path "./test/*"
<3> use "grep -v"
$ find . -name "*.go" | egrep -v "_vendor" | xargs wc -l
or
$ find . -name "*.go" | egrep -v "_vendor|test" | xargs wc -l
find and xargs
When the find command is combined with -exec
... +, find passes all matched files together as arguments to the command following the -exec
.
This can cause a problem, because there is a limit on the total length of the argument list that a command can accept on some systems;
in that case, after find has run for a while, the argument list overflows. A common error for this is Argument list too long
.
Also, when -exec
is terminated with \;, find forks a new process for each matched file instead of passing all files together as arguments to the command,
which reduces system performance.
This is the situation where the xargs
command comes to the rescue.
xargs
is extremely powerful when combined with the find
command: it takes only a batch of the matched files from find at a time
(unlike -exec
... +, which takes them all at once), processes that batch, and then fetches another, until there are no files left.
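The batching behavior can be seen in isolation by feeding xargs a plain list and limiting each batch to 3 items with -n:
$ echo a b c d e f g | xargs -n 3 echo
a b c
d e f
g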
Examples
(1) Find all files and print the file info
find . -type f | xargs file
(2) Find core dump files and save the result to a file
find / -name "core" -print | xargs echo "" >/tmp/core.log
(3) Combine with grep
find . -type f -print | xargs grep "hostname"
(4) Delete all files modified more than 3 days ago
find ./ -mtime +3 -print | xargs rm -f -r
(5) Delete all files that have size 0
find ./ -size 0 | xargs rm -f &
More about xargs
xargs
can also work with other commands, e.g. ls
Delete all files whose names contain a digit
$ ls | grep -E '[0-9]' | xargs rm
Note this wouldn't work if there are too many files produced by ls
and the combined length of the filenames exceeds the limit (about 128 KiB);
use the find
command to get around the limit.
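A rough find-based equivalent that avoids piping through ls (the pattern matches names containing a digit; illustrative only):
$ find . -maxdepth 1 -type f -name '*[0-9]*' -exec rm {} +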
Common options:
-a file Read items from file instead of standard input
-0 Same as --null: input items are terminated by a null character instead of by whitespace, and quotes and backslashes are not special
(every character is taken literally).
-d delim Input items are terminated by the specified character. Quotes and backslash are not special; every character in the input is taken literally.
-I replace-str Replace occurrences of replace-str in the initial-arguments with names read from standard input.
-i Deprecated, use -I instead.
-x Exit if the size (see the -s option) is exceeded.
Examples:
-0
If a filename contains special characters (such as spaces or newlines), plain xargs may fail; in that case we should use the -0
option together with find -print0:
$ find -name '*.bak' -print0 | xargs -0 rm
Generates a compact listing of all the users on the system
$ cut -d: -f1 < /etc/passwd | sort | xargs echo
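A small sketch of the -I and -a options (the file names here are hypothetical):
# rename each matched log file to *.bak, one item at a time
$ find . -name '*.log' | xargs -I {} mv {} {}.bak
# read items from a file instead of standard input
$ xargs -a file_list.txt ls -ld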
When we use the -I
option, each line read from the input is buffered internally.
This means that there is an upper limit on the length of input line that xargs will accept when used with the -I
option.
To work around this limitation, we can use the -s
option to increase the amount of buffer space that xargs uses,
and we can also use an extra invocation of xargs to ensure that very long lines do not occur.
For example:
ls | xargs -s 50000 echo | xargs -I '{}' -s 100000 rm '{}'
Here, the first invocation of xargs has no input line length limit because it doesn’t use the -i option.
The second invocation of xargs does have such a limit, but we have ensured that it never encounters a line which is longer than it can handle.
This is not an ideal solution. Instead, the -i
option should not impose a line length limit, which is why this discussion appears in the BUGS section.
The problem doesn’t occur with the output of find(1) because it emits just one filename per line.
Invoking the shell from xargs
$ find . -type f | xargs bash -c 'exec vim "$@" < /dev/tty' vim
$ find /foo -maxdepth 1 -atime +366 -print0 | xargs -r0 sh -c 'mv "$@" /archive' move
Explanations:
Here, a shell is being invoked. There are two shell instances to think about.
The first is the shell which launches the xargs
command.
The second is the shell launched by xargs (in fact it will probably launch several, one after the other, depending on how many files need to be archived).
We'll refer to this second shell as a subshell.
We use the -c
option of sh; its argument is a shell command to be executed by the subshell.
Along with the rest of that command, the $@
is enclosed by single quotes to make sure it is passed to the subshell without being expanded
by the parent shell. It is also enclosed with double quotes so that the subshell will expand $@
correctly even if one of the file names contains a space or newline.
Another reason to use the sh -c construct could be to perform redirection:
$ find /usr/include -name '*.h' | xargs grep -wl mode_t | xargs -r sh -c 'exec vim "$@" < /dev/tty' vim
Notice that we use the shell builtin exec here. That’s simply because the subshell needs to do nothing once Vim
has been invoked.
Therefore instead of keeping a sh process around for no reason, we just arrange for the subshell to exec Vim
, saving an extra process creation.
Sometimes, though, it can be helpful to keep the shell process around:
$ find /foo -maxdepth 1 -atime +366 -print0 | xargs -r0 sh -c 'mv "$@" /archive || exit 255' move
Here, the shell will exit with status 255 if any mv failed. This causes xargs to stop immediately.
parallel
GNU parallel helps utilize multiple CPU cores to speed up processing when combined with the xargs
command.
Assume we want to compress files; we know that compressing a bunch of files can be a time-consuming operation.
You may want to compress several files at the same time, to make better use of the multiple CPU cores.
For this, you can use the GNU parallel
tool.
$ find -name '*.bak' -print0 | xargs -0 parallel gzip --
This example takes a bit of unravelling:
- find writes the pathnames of matching files, delimited by NUL bytes, to its stdout
- xargs reads files from its stdin, and assumes NUL delimiters
- the command to run is parallel gzip --
- the -- tells parallel that it should run gzip on any arguments following the --; in other words, the -- separates the command to be run from the filenames to give the command as arguments
- parallel starts an instance of the command for each CPU core, and gives each instance the next filename argument; when an instance terminates, it starts a new instance with the next filename argument, until the command has been run for each argument
This should be much more efficient than running one gzip at a time.
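If GNU parallel is installed, the NUL-delimited list can also be fed to it directly; a minimal sketch:
$ find . -name '*.bak' -print0 | parallel -0 gzip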