Article snapshot taken from Wikipedia with Creative Commons Attribution-ShareAlike license.

Latest revision as of 17:21, 1 January 2025

Standard UNIX utility
xargs
Developer(s): Various open-source and commercial developers
Operating system: Unix, Unix-like, Plan 9, IBM i
Platform: Cross-platform
Type: Command

xargs (short for "extended arguments") is a command on Unix and most Unix-like operating systems used to build and execute commands from standard input. It converts input from standard input into arguments to a command.

Some commands such as grep and awk can take input either as command-line arguments or from the standard input. However, others such as cp and echo can only take input as arguments, which is why xargs is necessary.
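The difference between reading standard input and taking arguments can be seen in a small sketch. The input words here are made up for illustration, and the variable names are only placeholders.

```shell
# echo ignores standard input, so piping text into it prints only an empty line.
ignored=$(printf 'one\ntwo\n' | echo)

# xargs reads the same input and appends the items as arguments to echo.
joined=$(printf 'one\ntwo\n' | xargs echo)

printf 'echo alone: "%s"\n' "$ignored"
printf 'via xargs:  "%s"\n' "$joined"
```

The second line of output shows both input items on one line, because xargs passed them to a single echo invocation.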

A port of an older version of GNU xargs is available for Microsoft Windows as part of the UnxUtils collection of native Win32 ports of common GNU Unix-like utilities. A ground-up rewrite named wargs is part of the open-source TextTools project. The xargs command has also been ported to the IBM i operating system.

Examples

One use case of the xargs command is to remove a list of files using the rm command. POSIX systems have an ARG_MAX for the maximum total length of the command line, so the command may fail with an error message of "Argument list too long" (meaning that the exec system call's limit on the length of a command line was exceeded): rm /path/* or rm $(find /path -type f). (The latter invocation is also incorrect in itself, as the shell splits the output of find on whitespace and expands any globs it contains.)

This can be rewritten using the xargs command to break the list of arguments into sublists small enough to be acceptable:

$ find /path -type f -print | xargs rm

In the above example, the find utility feeds the input of xargs with a long list of file names. xargs then splits this list into sublists and calls rm once for every sublist.
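The batching described above can be observed without deleting anything. The following is a minimal sketch, assuming a system that provides getconf and awk; the -s 2048 limit is deliberately tiny and chosen only to make the splitting visible, since real runs use a limit derived from the system.

```shell
# Upper bound, in bytes, on the arguments plus environment passed to exec.
arg_max=$(getconf ARG_MAX)
printf 'ARG_MAX on this system: %s\n' "$arg_max"

# Generate 10000 short items and hand them to xargs with an artificially
# small command-line size limit (-s 2048). Each invocation of the inner
# shell prints how many arguments it received, so counting the output
# lines counts the invocations.
batches=$(awk 'BEGIN { for (i = 1; i <= 10000; i++) print i }' |
  xargs -s 2048 sh -c 'echo "$#"' sh | wc -l | tr -d ' ')
printf 'xargs ran the command %s times\n' "$batches"
```

With the default limit the same input would normally fit into a single invocation; the command is run more than once only when the list would otherwise be too long.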

Some implementations of xargs can also parallelize operations: the -P maxprocs option specifies how many processes may run at the same time over the input argument lists. However, the output streams of the parallel processes are not synchronized and may be interleaved. Where the invoked command has an option for writing to a file (such as -o or --output), this can be overcome by giving each process its own output file and combining the results after processing. The following example runs up to 24 processes at a time and starts another as soon as one finishes.

$ find /path -name '*.foo' | xargs -P 24 -I '{}' /cpu/bound/process '{}' -o '{}'.out
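A self-contained variant of the same pattern can be tried in a scratch directory. This is only a sketch: it assumes an xargs that supports -P and -0 (GNU and BSD versions do), the file names are invented, and wc -c stands in for the CPU-bound program.

```shell
# Create a scratch directory with six small input files.
workdir=$(mktemp -d)
for i in 1 2 3 4 5 6; do
  printf 'data %s\n' "$i" > "$workdir/input$i.foo"
done

# Process up to three files at a time. Each job writes to its own
# output file, so the results cannot be interleaved.
find "$workdir" -name '*.foo' -print0 |
  xargs -0 -P 3 -I '{}' sh -c 'wc -c < "$1" > "$1.out"' sh '{}'

# xargs returns only after all jobs have finished, so every output exists.
out_count=$(find "$workdir" -name '*.out' | wc -l | tr -d ' ')
printf 'output files written: %s\n' "$out_count"
```

Passing the file name to the inner shell as "$1", rather than substituting it into the script text, keeps unusual file names from being interpreted as shell code.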

xargs often covers the same functionality as the command substitution feature of many shells, denoted by the backquote notation (`...` or $(...)). xargs is also a good companion for commands that output long lists of files such as find, locate and grep, but only if one uses -0 (or equivalently --null), since xargs without -0 deals badly with file names containing ', " and space. GNU Parallel is a similar tool that offers better compatibility with find, locate and grep when file names may contain ', ", and space (newline still requires -0).

Placement of arguments

-I option: single argument

The xargs command offers options to insert the listed arguments at some position other than the end of the command line. The -I option to xargs takes a string that will be replaced with the supplied input before the command is executed. A common choice is %.

$ mkdir ~/backups
$ find /path -type f -name '*~' -print0 | xargs -0 -I % cp -a % ~/backups

The string to replace may appear multiple times in the command part. Using -I also means that each invocation of the command consumes exactly one line of input.
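Both properties of -I can be checked with echo. The sketch below uses made-up input; the variable names are placeholders.

```shell
# The replacement string % occurs twice, and each input line triggers
# its own invocation of echo.
pairs=$(printf 'a\nb\n' | xargs -I % echo %-%)
printf '%s\n' "$pairs"

# With -I, an unquoted blank inside a line no longer splits it:
# the whole line is substituted as one argument.
whole_line=$(printf 'two words\n' | xargs -I % echo '[%]')
printf '%s\n' "$whole_line"
```

Quotes and backslashes in the input are still interpreted by xargs in this mode, so -0 remains the safer choice for arbitrary file names.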

Shell trick: any number

Another way to achieve a similar effect is to use a shell as the launched command, and deal with the complexity in that shell, for example:

$ mkdir ~/backups
$ find /path -type f -name '*~' -print0 | xargs -0 sh -c 'for filename; do cp -a "$filename" ~/backups; done' sh

The word sh at the end of the line fills in $0 for the POSIX shell invoked as sh -c, that is, the "executable name" part of the positional parameters (argv). If it were not present, the name of the first matched file would instead be assigned to $0 and that file would not be copied to ~/backups. Any other word can be used to fill in that blank, my-xargs-script for example.
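The effect of the placeholder word can be seen by printing the positional parameters instead of copying files. This is a small sketch with invented input items.

```shell
# With the placeholder, sh becomes $0 and all three items land in "$@".
with_placeholder=$(printf 'a\nb\nc\n' | xargs sh -c 'printf "%s " "$@"' sh)

# Without it, the first item is consumed as $0 and is missing from "$@".
without_placeholder=$(printf 'a\nb\nc\n' | xargs sh -c 'printf "%s " "$@"')

printf 'with placeholder:    %s\n' "$with_placeholder"
printf 'without placeholder: %s\n' "$without_placeholder"
```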

Since cp accepts multiple files at once, one can also simply do the following:

$ find /path -type f -name '*~' -print0 | xargs -0 sh -c 'if [ "$#" -gt 0 ]; then cp -a "$@" ~/backups; fi' sh

This script runs cp with all the files given to it when there are any arguments passed. Doing so is more efficient since only one invocation of cp is done for each invocation of sh.
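The same approach can be exercised end to end in temporary directories. The sketch below is an illustration under stated assumptions: the file names are invented, the destination is passed as an extra leading argument (a variation on the command above, not part of it), and find -print0, xargs -0 and cp -a are assumed to be available.

```shell
# Scratch source and destination directories.
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/notes.txt~" "$src/two words.txt~" "$src/keep.txt"

# "$dest" is a fixed argument placed before the file names that xargs
# appends. The inner shell takes it off the front with shift and copies
# the remaining arguments only if any are left.
find "$src" -type f -name '*~' -print0 |
  xargs -0 sh -c 'dest=$1; shift; if [ "$#" -gt 0 ]; then cp -a "$@" "$dest"; fi' sh "$dest"

ls "$dest"
```

The check on "$#" matters because some implementations run the command once even when the input is empty, in which case cp would be called without any source files.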

Separator problem

Many Unix utilities are line-oriented. These may work with xargs as long as the lines do not contain ', ", or a space. Some of the Unix utilities can use NUL as record separator (e.g. Perl (requires -0 and \0 instead of \n), locate (requires using -0), find (requires using -print0), grep (requires -z or -Z), sort (requires using -z)). Using -0 for xargs deals with the problem, but many Unix utilities cannot use NUL as separator (e.g. head, tail, ls, echo, sed, tar -v, wc, which).

People often forget this and assume that xargs is also line-oriented, which is not the case: by default xargs separates its input on newlines and on blanks within lines, so substrings containing blanks must be single- or double-quoted.

The separator problem is illustrated here:

# Make some targets to practice on
touch important_file
touch 'not important_file'
mkdir -p '12" records'
find . -name not\* | tail -1 | xargs rm
find \! -name . -type d | tail -1 | xargs rmdir

Running the above will cause important_file to be removed but will remove neither the directory called 12" records, nor the file called not important_file.
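Why the wrong file is removed becomes visible when the arguments are printed instead of deleted. The following sketch uses printf as a harmless stand-in for rm and rmdir, and assumes an xargs with the -0 option.

```shell
# Default mode: the blank splits the name, so the command would receive
# "not" and "important_file" as two separate arguments.
default_split=$(printf 'not important_file\n' | xargs printf '[%s]')
printf 'default: %s\n' "$default_split"

# NUL-separated input keeps the name in one piece.
nul_split=$(printf 'not important_file\0' | xargs -0 printf '[%s]')
printf 'with -0: %s\n' "$nul_split"

# A lone double quote is treated as the start of a quoted string, so
# xargs reports an error instead of running the command.
if printf '12" records\n' | xargs echo >/dev/null 2>&1; then
  quote_ok=yes
else
  quote_ok=no
fi
printf 'unbalanced quote accepted: %s\n' "$quote_ok"
```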

The proper fix is to use the GNU-specific -print0 option, but tail (and other tools) do not support NUL-terminated strings:

# use the same preparation commands as above
find . -name not\* -print0 | xargs -0 rm
find \! -name . -type d -print0 | xargs -0 rmdir

When using the -print0 option, entries are separated by a null character instead of an end-of-line. This is equivalent to the more verbose command:

find . -name not\* | tr \\n \\0 | xargs -0 rm

or, shorter, to switching xargs to (non-POSIX) line-oriented mode with the -d (delimiter) option:

find . -name not\* | xargs -d '\n' rm

In general, however, using -0 with -print0 should be preferred, since newlines in filenames are still a problem.
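The newline-to-NUL conversion can be tried on plain text. This sketch feeds two made-up lines through tr and prints the resulting arguments; it assumes an xargs with -0, and it leaves out the -d variant because that option is GNU-specific.

```shell
# Each input line becomes exactly one argument, blanks included.
per_line=$(printf 'not important_file\nplain\n' |
  tr '\n' '\0' | xargs -0 printf '[%s]')
printf '%s\n' "$per_line"
```

A file name that itself contains a newline would still be split in two by this conversion, which is the reason the text recommends producing NUL-separated output at the source.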

GNU parallel is an alternative to xargs that is designed to have the same options, but is line-oriented. Thus, using GNU Parallel instead, the above would work as expected.

For Unix environments where xargs supports neither the -0 nor the -d option (e.g. Solaris, AIX), the POSIX standard states that one can simply backslash-escape every character: find . -name not\* | sed 's/\(.\)/\\\1/g' | xargs rm. Alternatively, one can avoid using xargs at all, either by using GNU parallel or by using the -exec ... + functionality of find.
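The escaping approach can be checked on the two problematic names from the earlier example. This is a sketch that prints the arguments rather than removing anything; printf stands in for rm.

```shell
# sed puts a backslash in front of every character on each line.
# xargs then takes each escaped character literally, so blanks and
# quotes lose their special meaning. The newline at the end of a line
# is not escaped and still separates the items.
escaped=$(printf 'not important_file\n12" records\n' |
  sed 's/\(.\)/\\\1/g' | xargs printf '[%s]')
printf '%s\n' "$escaped"
```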

Operating on a subset of arguments at a time

One might be dealing with commands that can only accept one or maybe two arguments at a time. For example, the diff command operates on two files at a time. The -n option to xargs specifies how many arguments at a time to supply to the given command. The command will be invoked repeatedly until all input is exhausted. Note that on the last invocation one might get fewer than the desired number of arguments if there is insufficient input. Use xargs to break up the input into two arguments per line:

$ echo {0..9} | xargs -n 2
0 1
2 3
4 5
6 7
8 9
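The remark about the last invocation can be illustrated with an odd number of items. In this sketch printf replaces the bash-specific brace expansion so that it also runs in a plain POSIX shell.

```shell
# Five items in groups of two: the final invocation gets only one.
grouped=$(printf '%s\n' a b c d e | xargs -n 2 echo)
printf '%s\n' "$grouped"
```

The output has three lines, the last of which contains the single leftover item.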

In addition to running based on a specified number of arguments at a time, one can also invoke a command for each line of input with the -L 1 option. One can use an arbitrary number of lines at a time, but one is most common. Here is how one might diff every git commit against its parent.

$ git log --format="%H %P" | xargs -L 1 git diff
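What -L 1 would run can be previewed without a repository by prefixing the command with echo. In the sketch below the commit names c1, c2 and c3 are hypothetical stand-ins for the hash and parent columns printed by git log.

```shell
# Each input line becomes one command line. The root commit has no
# parent, so its line contributes a single argument.
planned=$(printf 'c3 c2\nc2 c1\nc1\n' | xargs -L 1 echo git diff)
printf '%s\n' "$planned"
```

Unlike -n, which counts arguments, -L counts input lines, so lines with different numbers of fields are kept together.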

Encoding problem

The argument separator processing of xargs is not the only problem with using the xargs program in its default mode. Most Unix tools which are often used to manipulate filenames (for example sed, basename, sort, etc.) are text processing tools. However, Unix path names are not really text. Consider a path name /aaa/bbb/ccc. The /aaa directory and its bbb subdirectory can in general be created by different users with different environments. That means these users could have a different locale setup, and that means that aaa and bbb do not even necessarily have to have the same character encoding. For example, aaa could be in UTF-8 and bbb in Shift JIS. As a result, an absolute path name in a Unix system may not be correctly processable as text under a single character encoding. Tools which rely on their input being text may fail on such strings.

One workaround for this problem is to run such tools in the C locale, which essentially processes the bytes of the input as-is. However, this will change the behavior of the tools in ways the user may not expect (for example, some of the user's expectations about case-folding behavior may not be met).
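The change in behavior under the C locale can be observed with sort. The sketch below uses three made-up lines and only demonstrates the byte-wise case; what the same input produces under another locale depends on the system's locale data.

```shell
# In the C locale sort compares raw bytes, so every uppercase ASCII
# letter orders before every lowercase one.
bytewise=$(printf 'b\nB\na\n' | LC_ALL=C sort)
printf '%s\n' "$bytewise"
```

Setting LC_ALL for a single command, as here, limits the effect to that command and leaves the rest of the session in the user's own locale.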

References

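  1. "The Unix Acronym List: The Complete List". www.roesler-ac.de. http://www.roesler-ac.de/wolfram/acro/all.htm. Retrieved 2020-04-12.
  2. "Native Win32 ports of some GNU utilities". unxutils.sourceforge.net. http://unxutils.sourceforge.net/
  3. "Text processing tools for Windows". https://github.com/idigdoug/TextTools
  4. IBM. "IBM System i Version 7.2 Programming Qshell" (PDF). https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_74/rzahz/rzahzpdf.pdf?view=kc. Retrieved 2020-09-05.
  5. "GNU Core Utilities Frequently Asked Questions". https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Argument-list-too-long. Retrieved 2015-12-07.
  6. "The maximum length of arguments for a new process". www.in-ulm.de. https://www.in-ulm.de/~mascheck/various/argmax/
  7. "Differences Between xargs and GNU Parallel". GNU.org. Accessed February 2012.
  8. xargs – Shell and Utilities Reference, The Single UNIX Specification, Version 4 from The Open Group.
  9. Cosmin Stejerean. "Things you (probably) didn't know about xargs". http://offbytwo.com/2011/06/26/things-you-didnt-know-about-xargs.html. Retrieved 2015-12-07.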
