A filter is a computer program or subroutine that processes a stream and produces another stream. A single filter can be used on its own, but filters are frequently strung together to form a pipeline.
Some operating systems, such as Unix, are rich with filter programs. Windows 7 and later are also rich with filters, as they include Windows PowerShell. By comparison, few filters are built into cmd.exe (the original command-line interface of Windows), although most of them are significantly enhanced relative to the similar filter commands that were available in MS-DOS. OS X includes filters from its underlying Unix base but also has Automator, which allows filters (known as "Actions") to be strung together to form a pipeline.
Unix
In Unix and Unix-like operating systems, a filter is a program that gets most of its data from its standard input (the main input stream) and writes its main results to its standard output (the main output stream). Auxiliary input may come from command-line flags or configuration files, while auxiliary output may go to standard error. The command syntax for getting data from a device or file other than standard input is the input operator (<). Similarly, the output operator (>) sends data to a device or file other than standard output, and the append operator (>>) adds lines to the end of an existing output file. Filters may be strung together into a pipeline with the pipe operator (|). This operator signifies that the main output of the command to the left is passed as main input to the command on the right.
The Unix philosophy encourages combining small, discrete tools to accomplish larger tasks. The classic filter in Unix is Ken Thompson's grep, which Doug McIlroy cites as what "ingrained the tools outlook irrevocably" in the operating system, with later tools imitating it. grep at its simplest prints any lines containing a character string to its output. The following is an example:
cut -d : -f 1 /etc/passwd | grep foo
This finds all registered users that have "foo" as part of their username by using the cut command to take the first field (username) of each line of the Unix system password file and passing them all as input to grep, which searches its input for lines containing the character string "foo" and prints them on its output.
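Because the contents of /etc/passwd differ from system to system, the same pipeline can be tried on a small stand-in. The three account lines below are invented sample data in the usual colon-separated layout, and the helper name `sample_passwd` is hypothetical; here `cut` reads from standard input instead of from a file.

```shell
# Print invented passwd-style records; a real system would supply /etc/passwd.
sample_passwd() {
    printf '%s\n' \
        'root:x:0:0:root:/root:/bin/sh' \
        'foobar:x:1000:1000:Foo Bar:/home/foobar:/bin/sh' \
        'alice:x:1001:1001:Alice:/home/alice:/bin/sh'
}

# Keep field 1 (the username), then keep only names containing "foo".
sample_passwd | cut -d : -f 1 | grep foo
```

With this sample data the pipeline prints "foobar". Running `cut` first means that grep only sees usernames, so a match in another field, such as a home directory path, cannot produce a false hit.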
Common Unix filter programs are: cat, cut, grep, head, sort, tail, and uniq. Programs like awk and sed can be used to build quite complex filters because they are fully programmable. Unix filters can also be used by data scientists to get a quick overview of a file-based dataset.
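As a rough illustration of both points, the sketch below wraps two small pipelines in shell functions. The function names `count_values` and `sum_column` and the sample input are hypothetical, chosen only for this example.

```shell
# Count how often each distinct input line occurs, most frequent first.
# uniq only merges adjacent duplicates, so the input is sorted first.
count_values() {
    sort | uniq -c | sort -rn
}

# Sum the second whitespace-separated column using a tiny awk program.
sum_column() {
    awk '{ total += $2 } END { print total }'
}

printf 'b\na\nb\nc\nb\na\n' | count_values
printf 'x 1\ny 2\nz 3\n' | sum_column
```

The first pipeline lists "b" first with a count of 3, which is the kind of quick frequency overview often wanted for a column of a dataset. The second prints 6, showing how a one-line awk program turns into a custom filter.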
List of Unix filter programs
- awk
- cat
- comm
- compress
- cut
- expand
- fold
- grep
- head
- nl
- paste
- perl
- pr
- sed
- sh
- sort
- split
- strings
- tac
- tail
- tee
- tr
- uniq
- wc
- zcat
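Several of the listed programs are typically combined. The following sketch, which assumes `mktemp` is available and uses an invented sentence as input, chains tr, tee, wc, sort and uniq; the function name `words` is hypothetical.

```shell
# Split text into one lowercase word per line.
words() {
    tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]'
}

log=$(mktemp)

# tee copies the stream to a file while passing it on; wc -l counts lines.
count=$(printf 'The cat saw the other cat\n' | words | tee "$log" | wc -l)
echo "words: $count"

# The saved copy can be filtered again, here to list each distinct word once.
sort "$log" | uniq
```

The pipeline counts six words in total, and the final command lists the four distinct ones. `tee` is what lets one stream feed both the count and the saved copy.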
DOS
Two standard filters from the early days of DOS-based computers are find and sort.
Examples:
find "keyword" < inputfilename > outputfilename
sort < inputfilename > outputfilename
find /v "keyword" < inputfilename | sort > outputfilename
Such filters may be used in batch files (*.bat, *.cmd etc.).
For use in the same command shell environment, there are many more filters available than those built into Windows. Some of these are freeware, some shareware and some are commercial programs. A number of these mimic the function and features of the filters in Unix. Some filtering programs have a graphical user interface (GUI) to enable users to design a customized filter to suit their special data processing and/or data mining requirements.
Windows
Windows Command Prompt inherited MS-DOS commands, improved some and added a few. For example, Windows Server 2003 features six command-line filters for modifying Active Directory that can be chained by piping: DSAdd, DSGet, DSMod, DSMove, DSRm and DSQuery.
Windows PowerShell adds a large set of filters known as "cmdlets", which can be chained together with a pipe, except for a few simple ones such as Clear-Host. The following example gets a list of files in the C:\Windows folder, gets the size of each and sorts the sizes in ascending order. It shows how three filters (Get-ChildItem, ForEach-Object and Sort-Object) are chained with pipes.
Get-ChildItem C:\Windows | ForEach-Object { $_.Length } | Sort-Object
References
- McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (PDF) (Technical report). CSTR. Bell Labs. 139.
- Zuther, Bernd (2013). Data Analysis with the Unix Shell. comSysto GmbH. Archived from the original on 2016-01-22.
- Holme, Dan; Thomas, Orin (2004). Managing and maintaining a Microsoft Windows Server 2003 environment: exam 70-290. Redmond, WA: Microsoft Press. pp. 3|17–3|26. ISBN 9780735614376.