Unix Basics 2¶
Hopefully you are becoming familiar with the terminal environment in Unix. Today we will examine more basics for working in Unix.
Wildcards¶
When entering file and directory names, Unix gives us a number of special ways to refer to filenames, known as “globbing,” using “wildcard” characters. Characters that have a special meaning in the terminal include *, \, ?, ^, and [ ], and can be used to designate various patterns. One example is the * character, which stands for any number of characters of any type. This is often used to find files with a certain extension, as *.txt will match any file with the extension .txt. You can also use multiple wildcards in the same expression; for instance, if you want to list all files whose names contain the word “data” and end in “.txt”, you could use ls *data*.txt and the shell would show you all of the files matching that pattern.
Other useful wildcards include:
- ? can be used to represent a single unknown character. For instance, ?at matches Bat, bat, Cat, cat, and many others, but not at alone.
- [<characters>] can be used to represent any one of the characters included in the square brackets. As an example, [CB]at matches Bat and Cat, but not bat or cat. You can also specify a range of characters such as [A-Z] for any capital letter, [a-z] for any lowercase letter, [0-9] for any numeric character, or [A-Za-z0-9] for any alphanumeric character.
- ^ negates whatever is entered. For example, [^<characters>] matches any single character except those included in the square brackets. [^C]at will match Bat, bat, cat, and many other strings, but not Cat.
What if your expression contains one of the special wildcard characters? As we saw with spaces in file names, you can “escape” a character by preceding it with a backslash \. You will also need to escape wildcards in some other situations, such as when using a wildcard to denote a file name in a find command (see below). Thus, if you want to match a name that includes a question mark, use, for example, hello\?.txt.
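A quick way to experiment with these patterns is to build a scratch directory and let echo show what each pattern expands to. The sketch below assumes a Bourne-style shell such as bash or sh (the patterns themselves are the ones described above); the directory and file names are made up for illustration.

```shell
# Scratch directory with a few made-up file names to match against.
demo=$(mktemp -d "${TMPDIR:-/tmp}/globdemo.XXXXXX")
touch "$demo/Bat" "$demo/Cat" "$demo/hat" "$demo/at"
touch "$demo/data.txt" "$demo/mydata1.txt" "$demo/olddata.csv"
touch "$demo/hello1.txt" "$demo/hello?.txt"

# Each pattern is expanded by the shell inside the scratch directory.
star=$(cd "$demo" && echo *data*.txt)      # contains "data", ends in .txt
single=$(cd "$demo" && echo ?at)           # exactly one character before "at"
set_match=$(cd "$demo" && echo [CB]at)     # first character is C or B
negated=$(cd "$demo" && echo [^C]at)       # first character is not C (POSIX sh spells this [!C]at)
wild=$(cd "$demo" && echo hello?.txt)      # unescaped ? matches any one character
literal=$(cd "$demo" && echo hello\?.txt)  # escaped ? matches only a real question mark

echo "$star"       # data.txt mydata1.txt
echo "$single"     # Bat Cat hat
echo "$set_match"  # Bat Cat
echo "$negated"
echo "$wild"
echo "$literal"    # hello?.txt

rm -rf "$demo"     # clean up the scratch directory
```

Using echo rather than ls keeps the focus on the expansion itself: the shell replaces the pattern with the matching names before the command ever runs.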
There are more sophisticated pattern matching techniques called “regular expressions” which we will use more extensively when we talk about AWK. Getting used to using these wildcards in the terminal will make regular expressions a bit easier to understand, so be sure to spend some time experimenting with them.
Finding Files¶
To make queries of files in the filesystem, use the find command, which has a number of sophisticated options. The syntax for find is find <locations> <criteria>, where <locations> are the directories that you wish to search (including all subdirectories) and <criteria> are the specific options that you want to search for (these are specified by command options).
Command options for find that I use frequently include:
- -name <pattern> finds files whose name matches the given pattern. You are allowed to use wildcards in name patterns, but you must put a backslash in front of any special wildcard characters (otherwise, the shell expands the wildcards before searching the files, while we want the wildcards to remain until find does its search).
- -iname <pattern> is the same as -name, but is case insensitive.
- -type <type> requires the target to be of a specific type; common examples include f (regular file) and d (directory).
- -atime <time> requires the target to have a most recent access date that differs from the present date by exactly <time> days, rounded to the next 24 hour period. You can also specify units other than days for <time> using any of s, m, h, d, and w (seconds, minutes, hours, days, and weeks, respectively). For files that are newer than the time specified, use -<time>, and for files that are older than the time specified, use +<time>.
- -mtime <time> works like -atime but uses the file modification date rather than the access date.
For example, to find files within my home directory whose name is exactly “test”, I would enter find ~ -name test -type f into the terminal. To find all directories in my Documents directory that were modified in the past 24 hours, I would use find ~/Documents -type d -mtime -1. Wildcards can be very useful for finding files of a certain type; to find all SAC files in my home directory I can use find ~ -type f -iname \*.sac. Note that using -iname ensures that files ending in both .SAC and .sac are found, and that I put a backslash in front of the wildcard character to prevent the shell from expanding it before find is invoked.
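The same searches can be tried safely on a scratch directory instead of the whole home directory. This is a minimal sketch assuming a Bourne-style shell (bash or sh); the directory layout and file names are invented for the example.

```shell
# Scratch tree: one subdirectory and a handful of made-up files.
demo=$(mktemp -d "${TMPDIR:-/tmp}/finddemo.XXXXXX")
mkdir "$demo/sub"
touch "$demo/test" "$demo/readme.txt" "$demo/sub/event.sac" "$demo/sub/NOTES.SAC"

# Regular files whose name is exactly "test".
exact=$(find "$demo" -type f -name test)

# Case-insensitive match on the extension. Quoting the pattern keeps the
# shell from expanding *, just like the backslash form \*.sac.
sac_count=$(find "$demo" -type f -iname '*.sac' | wc -l | tr -d ' ')

# Directories modified within the last 24 hours (both were just created).
recent_dirs=$(find "$demo" -type d -mtime -1 | wc -l | tr -d ' ')

echo "$exact"
echo "$sac_count"    # 2
echo "$recent_dirs"  # 2

rm -rf "$demo"       # clean up the scratch tree
```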
Exercise: Practice searching for different types of files. I use this command frequently, so it is a good idea to be comfortable with its use.
Finding Text Within Files¶
We can also search for text patterns within files. This is done using grep (short for Globally search a Regular Expression and Print). The basic syntax is grep <pattern> <files>, where you specify a single pattern and (potentially) multiple text files in which to look for that pattern. For example, if you have a list of fruits as a text file fruit.txt, then grep apple fruit.txt will print any lines in the file that contain the string “apple.”
As the name suggests, the pattern is a regular expression and can be used to match rather complex patterns. Practice using grep with plain text strings as well as with pattern characters to match text that appears in a text file; keep in mind that in a regular expression, characters such as * and ? do not mean quite the same thing as the shell wildcards above. You can also use shell wildcards in the list of files if you want to search a large number of files for the pattern. As previously mentioned, we will come back to regular expressions and will talk a bit more about grep then.
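As a small illustration of the fruit example, the sketch below builds a throwaway fruit.txt and runs a few searches on it. It assumes a Bourne-style shell (bash or sh); the file contents are made up, and the -i and -c options are standard grep options that the text above does not cover.

```shell
# Throwaway text file with one fruit (or phrase) per line.
demo=$(mktemp -d "${TMPDIR:-/tmp}/grepdemo.XXXXXX")
printf '%s\n' apple banana pineapple 'Apple pie' cherry > "$demo/fruit.txt"

# Lines containing the string "apple" (case sensitive).
plain=$(grep apple "$demo/fruit.txt")

# -i ignores case and -c prints a count of matching lines instead of the lines.
nocase=$(grep -ci apple "$demo/fruit.txt")

# In a regular expression, ^ anchors the pattern to the start of the line.
anchored=$(grep '^apple' "$demo/fruit.txt")

echo "$plain"     # apple and pineapple, one per line
echo "$nocase"    # 3
echo "$anchored"  # apple

rm -rf "$demo"    # clean up
```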
Exercise: Using a text editor, create a text file that contains several lines of text. Use grep
to find patterns, in particular to practice using various wildcards to match patterns.
Permissions¶
Back in the last lab, we noted that ls -l
gave us a rather cryptic list of characters at the beginning of each entry:
total 472
drwxr-xr-x 53 egdaub staff 1802 Nov 4 2015 MATLAB
drwxr-xr-x 2 egdaub staff 68 Aug 10 2015 awk
-rw-r--r--@ 1 egdaub staff 147712 Sep 25 2015 ceri7104_dataanalysis.docx
drwxr-xr-x 3 egdaub staff 102 Jan 7 2016 compexam
drwxr-xr-x 9 egdaub staff 306 Dec 17 2015 csh
-rw-r--r-- 1 egdaub staff 84 Aug 25 2015 data_syllabus.aux
-rw-r--r-- 1 egdaub staff 5231 Aug 25 2015 data_syllabus.log
-rw-r--r--@ 1 egdaub staff 52581 Aug 25 2015 data_syllabus.pdf
-rw-r--r-- 1 egdaub staff 15882 Aug 25 2015 data_syllabus.synctex.gz
-rw-r--r-- 1 egdaub staff 4942 Oct 19 2015 data_syllabus.tex
drwxr-xr-x 56 egdaub staff 1904 Dec 17 2015 gmt
drwxr-xr-x 77 egdaub staff 2618 Dec 7 2015 homework
drwxr-xr-x 148 egdaub staff 5032 Nov 24 2015 lectures
drwxr-xr-x 12 egdaub staff 408 Oct 19 2015 python
drwxr-xr-x 11 egdaub staff 374 Nov 19 2015 sac
drwxr-xr-x 12 egdaub staff 408 May 2 12:34 studentwork
These characters signify what are known as “permissions.” They tell us who is allowed to read (r), write (w), and execute (x) the file. The first character tells us whether or not the entry is a directory, and it is followed by three sets of three characters: the first set tells us what the owner is allowed to do (for the above list, the owner is egdaub), the second set tells us what the group is allowed to do (for the above list, the group is staff), and the third set tells us what any other user is allowed to do. Thus, the sequence -rw-r--r-- signifies that the owner can read and write the file, the group can read the file, and others can read the file. These are the usual default permissions for a newly created file. Note that for the directories, the listing is drwxr-xr-x. The leading d says that this is a directory, and the remaining characters say that everyone can read and execute, but only the owner can write. Execute is a bit of a misnomer here because you cannot really “execute” a directory; it just means that said class of users can cd into that directory and search in that directory.
These permissions can be changed using the chmod command (Change Mode). The syntax is chmod <mode> <files>, and there are several different ways to specify the mode. You can add, remove, or set specific permissions for specific classes of users using the letters u, g, o, and a (for user, group, others, and all, respectively), the operators +, -, and = (add permission, remove permission, and set permission, respectively), and r, w, and x (read, write, and execute, respectively). Thus, to add execute privileges for the user, enter chmod u+x file.txt into the terminal. To remove read and write privileges for others, enter chmod o-rw file.txt. To give everyone read, write, and execute access, enter chmod a=rwx file.txt. You can also specify more than one file in the command, and the specified changes will be made to all the files given in the command.
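To watch the permission string change after each of these commands, something like the following can be run on a scratch file. It is a sketch assuming a Bourne-style shell (bash or sh); show_mode is a hypothetical helper defined here only to pull the first ten characters out of the ls -l listing.

```shell
# Scratch file whose permissions we can change freely.
demo=$(mktemp -d "${TMPDIR:-/tmp}/chmoddemo.XXXXXX")
f="$demo/file.txt"
touch "$f"

# Hypothetical helper: print only the permission string from ls -l.
show_mode() { ls -l "$1" | cut -c1-10; }

chmod u=rw,go=r "$f"        # start from a known state
start=$(show_mode "$f")     # -rw-r--r--

chmod u+x "$f"              # add execute for the user
after_ux=$(show_mode "$f")  # -rwxr--r--

chmod o-rw "$f"             # remove read and write for others
after_orw=$(show_mode "$f") # -rwxr-----

chmod a=rwx "$f"            # set read, write, execute for everyone
after_all=$(show_mode "$f") # -rwxrwxrwx

echo "$start" "$after_ux" "$after_orw" "$after_all"

rm -rf "$demo"              # clean up
```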
A shorthand way to specify permissions is to use an octal number to specify the binary representation of read, write, and execute as follows:
Permissions | --- | --x | -w- | -wx | r-- | r-x | rw- | rwx |
Binary | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
Octal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
To set owner permissions to rwx, group to r-x, and others to r--, you can enter chmod 754 file.txt (the first octal digit is for the owner, the second is for the group, and the third is for others). This is shorter than specifying each one individually through multiple commands, so if you need to change many different permissions at once, this is a handy trick.
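One way to arrive at an octal mode is to add up the values of the permissions in each set (read is 4, write is 2, execute is 1). The sketch below, which assumes a Bourne-style shell (bash or sh) and a made-up scratch file, works out 754 that way and then checks the result with ls -l.

```shell
# Each digit is the sum of r=4, w=2, x=1 for that class of users.
owner=$((4 + 2 + 1))   # rwx
group=$((4 + 0 + 1))   # r-x
other=$((4 + 0 + 0))   # r--
mode="$owner$group$other"
echo "$mode"           # 754

# Apply the mode to a scratch file and read back the permission string.
demo=$(mktemp -d "${TMPDIR:-/tmp}/octaldemo.XXXXXX")
touch "$demo/file.txt"
chmod "$mode" "$demo/file.txt"
perms=$(ls -l "$demo/file.txt" | cut -c1-10)
echo "$perms"          # -rwxr-xr--

rm -rf "$demo"         # clean up
```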
Aliases¶
You may have noticed that some of your Unix commands can get a bit long and difficult to type if you choose a number of options. If you have a long command that you use frequently, you may consider making an alias. An alias is a way to essentially define your own Unix command using some other command. For instance, if you want to always remove files using rm -i, then you could make an alias to do this. Enter alias rm 'rm -i' and then a carriage return. This sets the command rm to execute rm -i automatically. Try this, and see that it will ask you to confirm when deleting a file. To bypass an alias for a single command, precede the command with a backslash (\rm). To remove an alias, use the unalias
command (unalias rm
to remove the example here). In tcsh, unalias * removes all aliases (in bash, the equivalent is unalias -a).
Other Shells¶
One thing to note here: the syntax for an alias in the shell that you are currently using (the Tenex C shell tcsh) is different from some other Unix shells. Each shell is a different command interpreter with its own set of built-in commands and shortcuts. While they are all relatively similar, they are not exactly the same. Besides tcsh, shells available in the Mac Lab include the Bourne-again shell bash (default on most GNU/Linux systems), the Korn shell ksh, and the Z Shell zsh. While the majority of the commands that we will use work in all shells, a few, like alias, have syntax that is specific to a certain type of shell.
To run a different shell, enter the appropriate command into your present terminal. It will start a new session within that shell (the current directory will remain the same). To exit, type exit
and you will return to your first shell. You can set your default shell in Terminal to be whatever you like – go to Terminal > Preferences... and set the “Shells open with” command to whatever shell you want. You need to give the full path, and all of the shells are under /bin
.
rc files¶
If you want to set up an alias for all of your terminal sessions, is there a way to avoid typing in the alias every time you start up a terminal? You can automatically set things like aliases using a startup file. For the default tcsh
in the Mac lab, the startup file is a file in your home directory ~/.tcshrc
and is executed automatically every time you start up a tcsh
session. Note that the file name begins with a period – this means that the file is not normally visible when you type ls in your home directory. To see files that begin with a period, use ls -a. If you still do not see the file when you enter ls -a in your home directory, then it does not exist. You can create this file in a text editor like nano
– enter nano .tcshrc
when in your home directory, and then type in any commands you would like to be executed by default on startup. Common things to put in a startup file include setting aliases, changing the terminal prompt appearance, and setting variables known as environment variables (more on that later in the semester). Try putting an alias in your startup file, restarting the terminal, and seeing that the alias is in effect. The other shells have similar startup files, such as ~/.bashrc
, though some have more than one possible startup file and preferential execution of one versus the other, depending on whether the shell is the login shell or was launched from another shell. If you want to keep using tcsh
, then do not worry about login versus non-login shells – regardless of how your shell is launched, it will always execute the ~/.tcshrc file when it starts.
Making Programs Interact¶
Standard Input/Output and Pipes¶
One thing you may be wondering now is this: if the Unix philosophy is to make simple, robust programs that do one thing and do it well, how do we use multiple programs to do something complex? This requires that we introduce the concepts of “pipes,” “standard input,” and “standard output.”
As you may have noticed, all of the commands you have typed typically print out some result to the screen. This could be a list of files using ls
or find
, the contents of a text file using cat
, or a list of lines matching a pattern using grep
. This output is referred to as “standard output” and every Unix command that produces output prints it to the screen in roughly the same way.
Programs can also receive input via what is called “standard input,” which can come from the keyboard, from a file, or from the standard output of another program. This interaction where the standard output of one program becomes the standard input of another is referred to as “piping” and is the principal way that we can do more complicated Unix tasks using the basic commands.
As a very simple example, let’s say that we have a directory containing a large number of files and we want to look at the long format list (ls -l) of the contents. However, we want to read this in the terminal with the less program, rather than having to manually scroll up and down. This can be accomplished using a “pipe”: enter ls -l | less into the terminal, and you should have the list contents open using less. The vertical line | is what is known as a “pipe,” and its meaning here is to tell the terminal to use the standard output of the command ls -l as the standard input to the less command. We can then read the output from ls -l in less as if it were any old text file.
While this may seem really simple (and it is), we can do more complicated things. We can search within the output of some program using grep via a pipe – to find out which files in a directory some user can read, write, and execute, enter ls -l | grep rwx and the terminal will print the lines that contain the string “rwx.” Using grep in a pipe to find patterns in the output of some other command is a very common use of pipes, and you are likely to come across many examples where this is useful.
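The ls -l | grep rwx pipe can be tried on a scratch directory where the answer is known in advance. This sketch assumes a Bourne-style shell (bash or sh); the file names are made up, and wc -l (which counts lines) is a standard command not introduced in the text above.

```shell
# Scratch directory with one executable file and one ordinary file.
demo=$(mktemp -d "${TMPDIR:-/tmp}/pipedemo.XXXXXX")
touch "$demo/script.sh" "$demo/notes.txt"
chmod 755 "$demo/script.sh"   # rwxr-xr-x
chmod 644 "$demo/notes.txt"   # rw-r--r--

# Standard output of ls -l becomes standard input of grep.
ls -l "$demo" | grep rwx      # prints only the script.sh line

# Pipes can feed any command that reads standard input.
matches=$(ls -l "$demo" | grep -c rwx)       # count of lines containing rwx
entries=$(ls "$demo" | wc -l | tr -d ' ')    # number of names listed

echo "$matches"   # 1
echo "$entries"   # 2

rm -rf "$demo"    # clean up
```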
Input and Output Redirection¶
There are other ways to deal with command output, as you can save results of terminal operations as files. Saving output to files (as well as using files as standard input) is done with the operators <, >, >>, and the command tee. < signifies that the file following the < operator is to be used as standard input to the command. Many of the commands that we have already seen can read a file named as an argument without needing to use < (for example, try entering cat testfile.txt and cat < testfile.txt; the output should be identical), but you will likely encounter cases where you need to specify the use of a file as input.
The operators > and >> and the command tee can all be used to save command output to a file. > saves the command output in a new file, overwriting the old file in the event that the file already exists. Try entering ls > temp.txt into the terminal; this should create a file “temp.txt” containing the result of the ls command. >> will append the results of a command to the file. Check the contents of “temp.txt,” and then enter ls -l >> temp.txt. Use less to view “temp.txt” to verify that it contains the results of both the short and the long list command.
Sometimes if you are piping the result of one command to another, you have some intermediate output that you might want to save for another purpose. To save the output from a command to a file while simultaneously passing the same output along to standard output, the tee command can be used. Basically, tee <filename> saves its standard input to the file, but also sends its standard input to standard output. Try using tee by entering ls -l | tee temp.txt. The ls -l results should both print to the terminal and be saved to the file “temp.txt.”
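The four redirection tools can be compared side by side on a scratch file. The following is a sketch assuming a Bourne-style shell (bash or sh); the file names and contents are invented, and printf is used in place of ls so that the result is the same on any system.

```shell
demo=$(mktemp -d "${TMPDIR:-/tmp}/redirdemo.XXXXXX")
out="$demo/temp.txt"

printf 'first\nsecond\n' > "$out"   # > creates the file (or overwrites it)
printf 'third\n' >> "$out"          # >> appends to the existing file

# < makes the file the standard input of wc, which counts its lines.
line_count=$(wc -l < "$out" | tr -d ' ')

# tee writes its standard input to copy.txt and also passes it along
# to standard output, where the command substitution captures it.
shown=$(printf 'fourth\n' | tee "$demo/copy.txt")
saved=$(cat "$demo/copy.txt")

echo "$line_count"  # 3
echo "$shown"       # fourth
echo "$saved"       # fourth

rm -rf "$demo"      # clean up
```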
xargs¶
One final trick for making programs interact is the following: what if you have some program that takes a number of individual file names as arguments, rather than reading standard input (i.e. a text stream)? For example, in the piping example above, we piped the output of ls into grep to search the output for a certain pattern. However, what if we didn’t want to search the output for that pattern, but rather the contents of those particular files? We cannot do that using a pipe alone. Similarly, if we want to find a bunch of files, and then copy them to some other directory, we cannot pipe the list into the cp command, since cp takes its file names as arguments rather than reading them from standard input.
You can perform the above operations by making commands interact with the xargs command. xargs takes standard input and reformats it as a list of arguments, then executes the command that follows with those arguments added to it (passing as many at a time as will fit, and running the command again if the list is too long for a single invocation). By default, xargs puts the arguments at the end of the command, so if you have a command where the inputs need to be placed at a different point in the command, you should specify where xargs should put the arguments using the -J option and some other character as a placeholder (see an example of how to do this below).
As an example, let’s say I want to search every shell script (files with a suffix “.sh”) in my home directory for the pattern “gmt” to figure out which of my shell scripts invoke GMT commands. This is one of those cases where I need to use xargs to turn standard output into a list of arguments. I can do this with find ~ -name \*.sh | xargs grep gmt as a terminal entry. The first command is find, which locates all shell scripts (files ending in “.sh”) in my home directory and sends the list of files to standard output. Since grep needs a list of arguments, rather than standard input, xargs is used to transform the piped list into arguments. What is actually happening is that xargs executes the command grep gmt <files>, where <files> is replaced with the file names read from the output of the find command. Note that in this case, because xargs puts the arguments at the end of the command by default, and grep takes its target files as its last arguments, we could use xargs without needing to specify any options.
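Here is the same find and xargs combination on a scratch directory, where only one of the scripts mentions gmt. It is a sketch assuming a Bourne-style shell (bash or sh); the file names and contents are made up. The -l option makes grep print only the names of matching files, and the second form uses -print0 with xargs -0 (available in both GNU and BSD versions), which keeps file names containing spaces intact.

```shell
# Scratch directory: two shell scripts and one text file.
demo=$(mktemp -d "${TMPDIR:-/tmp}/xargsdemo.XXXXXX")
printf 'gmt psxy data.txt\n' > "$demo/plot.sh"
printf 'echo hello\n' > "$demo/hello.sh"
printf 'gmt notes\n' > "$demo/notes.txt"

# find lists the .sh files; xargs appends them to "grep -l gmt" as arguments.
hits=$(find "$demo" -name '*.sh' | xargs grep -l gmt)

# Same search, but names are separated by NUL characters instead of
# whitespace, so names with spaces are passed through unchanged.
hits0=$(find "$demo" -name '*.sh' -print0 | xargs -0 grep -l gmt)

echo "$hits"    # path ending in plot.sh
echo "$hits0"   # same path

rm -rf "$demo"  # clean up
```

Note that notes.txt is never searched, even though it contains “gmt,” because find only passes along the names ending in “.sh.”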
As another example, here is a situation where we cannot use xargs without any options. Let’s say that we want to copy every file that we modified in the past 24 hours to a single folder as a backup copy. Since cp takes file names as arguments, rather than standard input, we need to use xargs to achieve this. Also note that the syntax for cp is cp <file> <target>, so the default usage of xargs will not work because we need to insert our file names in the middle of the command. To successfully perform this operation, we use the -J option for xargs, where we designate a special character to serve as a placeholder, and then xargs replaces the placeholder with the arguments derived from the input when it executes the command. Assuming that the directory ~/backup already exists, we can copy all recently modified files to this directory by entering find ~ -type f -mtime -1 | xargs -J % cp % ~/backup into the terminal. The first part of the command is the find operation, which sends the files satisfying the criteria to standard output. This is then piped into xargs, which takes the list of files and inserts it into the appropriate place in the cp command (using % as the placeholder, which is specified by the -J option). Note that other versions of Unix may have a different way of doing this argument substitution, so check the manual page if you are on a system other than Mac OS.
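As one example of such a difference, the GNU version of xargs found on most Linux systems does not have -J. The sketch below swaps in the -I option, which is part of the POSIX standard and is available on both Mac OS and Linux; unlike -J, it runs the command once for each line of input. It assumes a Bourne-style shell (bash or sh), and the src and backup directories are made up for the example.

```shell
# Scratch tree: files to back up live in src, copies go to backup.
demo=$(mktemp -d "${TMPDIR:-/tmp}/backupdemo.XXXXXX")
mkdir "$demo/src" "$demo/backup"
touch "$demo/src/a.txt" "$demo/src/b.txt"

# For each recently modified file, run: cp <file> <backup directory>.
# The % after -I is the placeholder that xargs replaces with the file name.
find "$demo/src" -type f -mtime -1 | xargs -I % cp % "$demo/backup"

copied=$(ls "$demo/backup" | wc -l | tr -d ' ')
echo "$copied"   # 2

rm -rf "$demo"   # clean up
```

Keeping the backup directory outside the tree being searched avoids having find pick up the copies themselves.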
Do not worry if you do not totally understand how to use pipes, output redirection, xargs, and everything else here. It takes time to learn how to make Unix do what you want. Over the semester, as we work with Unix more and start using Shell Scripts, AWK, and GMT, you will gradually become more comfortable with this. However, nothing can take the place of practice: if you want to truly become comfortable with Unix, then you should use these tools on a daily basis in your research.
Summary¶
Here is a list of the commands and operators that we introduced in this lab. They are among the most common, and you will likely find yourself using them very often:
- Wildcards *, \, ?, ^, and [ ]
- find
- grep
- chmod
- alias
- unalias
- tcsh and many other shell types
- > (redirect to file)
- < (redirect from file)
- | (pipe to next command)
- tee (send to file and standard output)
- xargs