Week 1 --- The Shell: Finding Things

Sep 7, 2012 • Michael Hansen

Prerequisite Knowledge

This post assumes you have worked with a shell before, and understand several basic concepts:

  • You provide text commands to the shell, and it responds by performing the requested commands and (usually) printing a response. This printed response is often your goal.
  • Commands are given as a series of “words” separated by spaces. The first word is the name of the command, and the following words are arguments or options to the command (conventionally prefixed with a ‘-’). Some options can themselves take additional arguments.
  • Command options that do not take arguments can be combined as a single option (e.g., -a and -b becomes -ab)
  • The shell evaluates your command in multiple steps, first expanding special forms like `...` (backquotes) and * (wildcards) before running the command itself
  • Unix commands consume and produce plain text. Whitespace and newlines are often significant markers
  • Commands can be chained together with pipes ‘|’ so that the command on the right of a pipe consumes the output of its left neighbor
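
As a quick refresher, here is a minimal sketch of three of the concepts above: combined options, backquote expansion, and pipes. The variable names (separate, combined, greeting, count) are only for illustration.

```shell
# Two separate options and their combined form behave the same:
# -r reverses the sort order, -u drops duplicate lines.
separate=$(printf 'b\na\nb\n' | sort -r -u)
combined=$(printf 'b\na\nb\n' | sort -ru)
echo "$separate"
echo "$combined"

# Backquotes are expanded first: the inner command runs, and its
# output is substituted into the outer command line.
greeting=`echo hello`
echo "$greeting"

# A pipe feeds the output of the left command to the command on its right.
# Here wc -l counts the three lines that printf produced.
count=$(printf 'one\ntwo\nthree\n' | wc -l)
echo $count
```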

The “grep” Command

grep is used to search for patterns inside text files. It uses a special pattern language called regular expressions, and has many useful options. For the examples below, assume a file named myfile.txt exists in the current directory with the following contents:

"Hello, ma'am," he said.
She smiled and replied quickly in kind.
Her hellos were always quick.

We can ask grep to search myfile.txt for lines matching the pattern “hello” by running the following command:

$ grep hello myfile.txt
Her hellos were always quick.

Only the last line of the file is returned to us. Why not the first? By default, grep is case sensitive. The pattern “hello” therefore matched hellos in line 3 and not Hello in line 1. We can turn off case sensitivity with the -i option:

$ grep -i hello myfile.txt
"Hello, ma'am," he said.
Her hellos were always quick.

Now our pattern has matched both hellos and Hello. What if we only wanted the full word “hello” and not the plural “hellos?” The -w option matches whole words for us, and when combined with the -i option yields the following:

$ grep -iw hello myfile.txt
"Hello, ma'am," he said.

Other useful options include -n, which prints the line number next to each matched line, and -v, which makes grep return lines that don’t match the pattern. Combining what we’ve learned, here’s a grep command that will print lines in myfile.txt (with corresponding line numbers) that do not contain the case-insensitive pattern “hello”:

$ grep -ivn hello myfile.txt
2:She smiled and replied quickly in kind.
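
If you would like to try these commands yourself, the following sketch recreates myfile.txt in a scratch directory and runs each grep command from this section. It assumes your system has mktemp -d (common on Linux and macOS); the variable names are just placeholders of mine.

```shell
# Work in a throwaway directory so nothing in your own files is touched.
demo=$(mktemp -d)
cd "$demo"

# Recreate myfile.txt. Quoting 'EOF' keeps the shell from expanding
# anything inside the text.
cat > myfile.txt <<'EOF'
"Hello, ma'am," he said.
She smiled and replied quickly in kind.
Her hellos were always quick.
EOF

plain=$(grep hello myfile.txt)          # case sensitive
nocase=$(grep -i hello myfile.txt)      # ignore case
word=$(grep -iw hello myfile.txt)       # ignore case, whole words only
inverted=$(grep -ivn hello myfile.txt)  # non-matching lines, with line numbers

echo "$plain"
echo "$nocase"
echo "$word"
echo "$inverted"

# Clean up the scratch directory.
cd - >/dev/null
rm -rf "$demo"
```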

The “find” Command

While grep searches inside files, the find command searches for the files and directories themselves. Unlike grep, which by default reads only the files you name, find searches recursively from the starting directory you give it. In other words, it will search sub-directories (and their sub-directories, etc.). It’s also important to keep in mind that find doesn’t use the same pattern language as grep: find’s -name option takes shell-style wildcard patterns, while grep uses regular expressions.
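
To see the difference between the two pattern languages, here is a small sketch using two made-up files named ab and b. The same text, a*, means something different to each command.

```shell
demo=$(mktemp -d)
cd "$demo"
touch ab b

# find -name uses shell-style wildcards: "a*" means "a name starting with a".
by_find=$(find . -name "a*")
echo "$by_find"

# grep uses regular expressions: 'a*' means "zero or more a characters",
# which matches every line, including the line "b".
by_grep=$(ls | grep 'a*')
echo "$by_grep"

cd - >/dev/null
rm -rf "$demo"
```

So find reports only ./ab, while grep lets both names through.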

For the examples below, imagine you have the following directory tree:

.
├── Documents
│   ├── School
│   │   ├── Philosophy Paper.doc
│   │   └── Programming Assignment 1.py
│   └── Work
│       └── Budget.xlsx
├── Pictures
│   ├── Christmas
│   │   ├── Me and Rudolph.jpg
│   │   └── Santa.png
│   └── Me.jpg
└── Videos

Running find from this directory produces a list of all the directories and files:

$ find .
.
./Pictures
./Pictures/Christmas
./Pictures/Christmas/Me and Rudolph.jpg
./Pictures/Christmas/Santa.png
./Pictures/Me.jpg
./Videos
./Documents
./Documents/Work
./Documents/Work/Budget.xlsx
./Documents/School
./Documents/School/Programming Assignment 1.py
./Documents/School/Philosophy Paper.doc
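
To follow along, you can build the same tree in a scratch directory. This is only a sketch: the file contents are placeholders, and it assumes mktemp -d is available. Note that find prints entries in whatever order the filesystem hands them back, so your listing may be ordered differently from the one above; piping through sort gives a stable order.

```shell
demo=$(mktemp -d)
cd "$demo"

# -p creates the parent directories as needed.
mkdir -p Documents/School Documents/Work Pictures/Christmas Videos

# Give every file some content so that Videos is the only empty entry.
for f in "Documents/School/Philosophy Paper.doc" \
         "Documents/School/Programming Assignment 1.py" \
         "Documents/Work/Budget.xlsx" \
         "Pictures/Christmas/Me and Rudolph.jpg" \
         "Pictures/Christmas/Santa.png" \
         "Pictures/Me.jpg"
do
    echo "placeholder" > "$f"
done

# List everything, sorted so the order is predictable.
listing=$(find . | sort)
echo "$listing"

# Count the entries: the starting directory, 6 sub-directories, and 6 files.
total=$(find . | wc -l)
echo $total

cd - >/dev/null
rm -rf "$demo"
```

Drop the last two lines if you want to keep the tree around and try the find commands in the rest of this section.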

Let’s say we just want to get a list of our JPEG files (.jpg). We can use find’s -name option like so:

$ find . -name "*.jpg"
./Pictures/Christmas/Me and Rudolph.jpg
./Pictures/Me.jpg

Notice that we put the *.jpg in quotes. This is very important: outside of quotes, the shell expands *.jpg into a list of all .jpg files in the current directory (if there are any) before find runs, so find never sees the pattern itself. To avoid any problems, always put quotes around what you give -name.
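
Here is a small sketch of what goes wrong without the quotes. The file names top.jpg and sub/nested.jpg are made up for the demonstration. Putting echo in front of a command is a handy way to see what the shell turns it into.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir sub
touch top.jpg sub/nested.jpg

# Unquoted, the shell replaces *.jpg with the matching names in the
# current directory before find ever runs.
expanded=$(echo find . -name *.jpg)
echo "$expanded"

# So find is really asked for files named exactly top.jpg ...
unquoted=$(find . -name *.jpg)
echo "$unquoted"

# ... while the quoted pattern reaches find intact and matches both files.
quoted=$(find . -name "*.jpg" | sort)
echo "$quoted"

cd - >/dev/null
rm -rf "$demo"
```

The unquoted version silently misses sub/nested.jpg, which is the kind of bug that is easy to overlook.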

We can tell find to only return files or directories using the -type option. Using -type f causes find to return only files:

$ find . -type f
./Pictures/Christmas/Me and Rudolph.jpg
./Pictures/Christmas/Santa.png
./Pictures/Me.jpg
./Documents/Work/Budget.xlsx
./Documents/School/Programming Assignment 1.py
./Documents/School/Philosophy Paper.doc

while -type d returns only directories:

$ find . -type d
.
./Pictures
./Pictures/Christmas
./Videos
./Documents
./Documents/Work
./Documents/School

You can restrict how far find searches down in the directory tree with the -maxdepth N option (where N = 1 means to stay in the current directory):

$ find . -maxdepth 2 -name "*.jpg"
./Pictures/Me.jpg

Combined with the -mindepth N option, you can restrict searches to particular levels of the directory tree:

$ find . -mindepth 3 -type f -name "*.jpg"
./Pictures/Christmas/Me and Rudolph.jpg
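
It can help to think of depth as a count of steps down from the starting directory: the starting directory itself is depth 0, its direct children are depth 1, and so on. The sketch below uses a cut-down version of the tree above to pick out exactly one level. Keep in mind that -mindepth and -maxdepth are extensions found in GNU and BSD find rather than part of the POSIX standard, so I am assuming one of those versions here.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p Pictures/Christmas
touch Pictures/Me.jpg Pictures/Christmas/Santa.png

# Depth 0 and 1 only: the starting directory and its direct children.
depth1=$(find . -maxdepth 1 | sort)
echo "$depth1"

# Exactly depth 2: everything directly inside Pictures, nothing deeper.
level2=$(find . -mindepth 2 -maxdepth 2 | sort)
echo "$level2"

cd - >/dev/null
rm -rf "$demo"
```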

Lastly, find can also list empty files and directories using the -empty option:

$ find . -empty
./Videos
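
The -empty option applies to zero-byte files as well as directories with nothing in them, and it combines with -type like any other test. A minimal sketch, using made-up names and assuming GNU or BSD find (-empty is not part of POSIX):

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir Videos Pictures
touch Pictures/blank.jpg            # touch creates a zero-byte file
echo "data" > Pictures/Me.jpg       # this file is not empty

# Both the empty directory and the zero-byte file are reported.
empty=$(find . -empty | sort)
echo "$empty"

# Adding -type d keeps only the empty directories.
empty_dirs=$(find . -type d -empty)
echo "$empty_dirs"

cd - >/dev/null
rm -rf "$demo"
```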

Backquote Expansion

The shell has a special feature called “backquote expansion” that provides a more flexible way to give arguments to your commands. Remember that Unix commands consume and produce plain text, so the result of a command like

$ find . -type f -name "a*"

will be a textual listing of all files in the current directory tree that start with an ‘a’. What if we want to perform a grep search on each of these files? We can’t pipe the result of find into grep, because grep will end up doing a search over the file listing itself and not the contents of the files. Instead, we can use backquote expansion to feed arguments to grep:

$ grep hello `find . -type f -name "a*"`

When the shell sees the backquotes `...`, it will run the find command inside and then give the results to grep as arguments. If we had files a1, a2, and a3 in our current directory tree, this command would be equivalent to the following:

$ grep hello a1 a2 a3
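
Here is a runnable sketch of the same idea, using made-up files a1, a2, and b1. It also shows the $(...) form, which the shell treats the same way as backquotes and which is easier to read when commands are nested.

```shell
demo=$(mktemp -d)
cd "$demo"
echo "hello from a1" > a1
echo "nothing here"  > a2
echo "hello from b1" > b1

# The shell runs find first, then hands its output to grep as arguments.
# b1 also contains "hello", but find never lists it, so grep never reads it.
matches=$(grep hello `find . -type f -name "a*"`)
echo "$matches"

# The $(...) form does the same thing as the backquotes.
same=$(grep hello $(find . -type f -name "a*"))
echo "$same"

cd - >/dev/null
rm -rf "$demo"
```

One caveat: the shell splits find's output on whitespace, so a file name containing spaces (such as Me and Rudolph.jpg from the earlier examples) would reach grep as several separate arguments. This approach is safest when you know the file names contain no spaces.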

Getting Help

You can get help for a specific command by using the “man” command:

$ man grep

This will display the help page (or manual) for the “grep” command (hit ‘q’ to quit). This works for any Unix command, even “man” itself!

$ man man

After Action Review

Overall, I spent around 3 1/2 to 4 hours (including reading). I’m very familiar with Inkscape, so it was easy for me to build my concept map.