Shell Scripting 1

This is the first of three labs on Shell Scripting. Shell scripting lets us write scripts that carry out tasks within the Unix terminal. Much like MATLAB .m files and SAC macros, we can do things like define variables and control flow using if statements and loops, which gives us more powerful tools to have the computer manipulate files and data. Without shell scripts, if you needed to process 1000 data files with a single program, you would need to type in the same command 1000 times, which would take you a long time and is likely to result in a mistake. A shell script lets you do the same thing with a loop, eliminating typing errors and greatly speeding up the process.

In this document, I will introduce a number of commands, and give simple examples of many of them. You should type each of the simple examples into a shell script file on your computer, run them to ensure that everything is working, and then tweak some things to ensure that you know how the different commands work, particularly for shell variables and quoting strings. This is your chance to experiment with how the commands are used. In the next two labs we will write more complicated scripts that use the commands from today.

Shell Scripts

To create a shell script, use your favorite text editor to open a new file called example.csh (feel free to change the name if you like). You can use any shell that you like, and it is common to use the name of the shell the script is written for as the file extension. Many people write their shell scripts in either the C shell or the Bourne shell, because many of the interactive bells and whistles that have been added in the newer shells are not commonly used when shell scripting (the tcsh and bash shells are supersets of the csh and sh shells, respectively). Thus, they use the least common denominator to make their scripts as compatible as possible with older systems, in case they need to run them on another computer. However, I will note that on the Mac, running csh just runs tcsh, and running sh just runs bash, so in many cases the two are identical. I will try to point out some of the differences between the shells as we go along, but the examples I give will generally be for the C shell. It is a good idea to be familiar with the syntax for both shells in case you are on a computer with only one of them, or you get a shell script from someone else.

Once you have your file open in a text editor, type the following in as the first line:

#!/bin/csh

This is usually the first line of a shell script; if it is present, it must be the very first line of the file, with no leading spaces. It tells the computer to interpret all of the commands in your script using the C shell when the script is executed. It also tells anyone reading the code that the script is meant for the C shell, so they know which syntax to use when modifying it. Remember that there are slight differences between the shells, so it is important to use the syntax that corresponds to this shell. If you want your script to run without reading your startup file (such as ~/.cshrc), which also keeps your personal aliases and settings from changing how the script behaves, make the first line #!/bin/csh -f instead.

A similar first line appears in scripts written in other scripting languages, such as Perl and Python. When you execute a script from the terminal, the system reads this first line and runs the program it names, which then carries out the commands in the rest of the script. The line is not strictly necessary, but without it we must name the interpreter ourselves when running the script: for our shell script, we would type csh example.csh into the terminal.

After this line, enter the commands that you want the shell to carry out when you run the script. Any of the commands we have learned so far can go in a shell script, so start by putting some valid Unix commands in this file (commands such as ls and cat, which print things to the screen, are good choices while you learn how shell scripts work). You can add comments by starting them with #; the shell ignores the rest of the line after it.

Executing a Shell Script

Once you have entered some commands, save your file and return to the terminal. There are several ways to execute your script. First, as mentioned above, you can type csh example.csh, which runs csh with your file as its input. This works whether or not the file starts with the #! line. If you do not want to type csh before every script you run, you can execute the script directly. First, make sure you have execute permission on the file (if you do not, add it with chmod, for example chmod u+x example.csh). Then enter ./example.csh into the terminal, which should execute your script. (Note: the .csh extension is only a convention for people reading the file name; Unix does not use file extensions to decide how to run a file. If the script has no #! line, the shell you are typing in decides which shell runs it, and that may not be the C shell. It is therefore good programming practice to specify the interpreter in the first line of any script.)

Why ./example.csh?

Why do we need to enter the period prior to giving the name of the script? If you try to execute your shell script without it, it will not work (you will see a “Command not found” error message). This has to do with how Unix systems look for commands.

Whenever you type a command, the shell searches for a matching program in certain places and in a certain order. Both the places and the order come from an environment variable called PATH. An environment variable is a pre-defined variable that provides information about the environment of the shell you are working in. Examples include your login name (USER) and the location of your home directory (HOME). These are very useful for making shell scripts that work for different users on different systems, in case the configuration of another computer differs from the one where you wrote the script.

One very important environment variable is PATH, a list of directories in which your shell looks for commands matching the one that you typed into the terminal. When you type in a command, the shell starts in the first directory in PATH, and if it does not find a matching command there, it moves on to the next one, continuing through the list. If it reaches the end of PATH without finding a match, it gives the error message “Command not found.” Note that this applies to most commands you enter: standard commands like ls and cp are programs too, and they must be located in one of the directories on your path before they can be executed. (A few commands, such as cd, are built into the shell itself and are not looked up this way.)

You can look at your path using the env command, which will list all of the environment variables (there are lots). We are interested in PATH, which may require that you scroll up a bit. You can also access PATH by entering echo $PATH into the terminal. echo is a useful command in shell scripting, as it can be used to print things to the screen (including variables, which can help you debug your script). In either case, you will see a long list of directories, each separated by a colon (for example, the start of PATH might be /usr/local/bin:/usr/bin:/bin:...). These are the directories where the shell searches for commands, and the order in which they are searched.

If you ever want to find out if a command is located in one of the directories on your path, use the which command. which prints out the full path to a command that is entered, and is a convenient way to figure out (1) if a command is in your path, and (2) if a command is located in more than one directory on your path, the one that will be executed.

PATH does not normally contain your current directory, which is why the shell could not find example.csh when you typed its name alone. To tell the shell where the script is, we use the “dot” notation ./example.csh, which says that the executable we want is in the current directory. The same is true for other types of executables and scripts, such as the AWK scripts we looked at last week.

There is one way to avoid typing ./example.csh every time: we can add . to our path (in Unix, . refers to the current directory), so that the shell will know to look in the current directory. This is generally frowned upon by system administrators, because someone could put a malicious program in one of our directories that then gets executed by accident, making it a security risk. I do not put “dot” in my path for this reason, and I don’t really mind typing two extra characters. You can add other directories to your path by typing setenv PATH ${PATH}:<directory> (you will learn what this is doing shortly; basically, you are appending a colon and a directory name to the existing string stored in PATH). This only takes effect in the current terminal; to make it take effect in future sessions, you will need to add this line to your startup file.

Variables

As we have seen countless times, variables are a powerful tool for programming as they can save typing and make programs more flexible and robust. Variables in the shell must start with a letter, and they are defined using set <name> = <value>, for instance set x = 1 sets x to be the string 1. To access the value of the variable at another time, use $x. Thus, we might write a shell script to copy a file to a new directory as follows:

#!/bin/csh

# shell script to copy a file to a new location

set file = myfile.txt

echo "Copying $file to home directory"
cp $file $HOME

This script is rather simple, but illustrates how variables are used in the shell. First, note that by using a variable, we only have to change the one place where file is set in our script (I hope the usefulness of this is clear from our previous programming experience). Then, both the echo statement and the copy command use the correct filename. Note that I also used the environment variable HOME to designate the target directory for the copy command. Because I used HOME instead of the path to my home directory, this script will work on any Unix machine, regardless of how the home directories are set up.

set defines a variable in the current instance of the shell. However, if your shell script calls another shell script, that variable will not be available within the other script. To make a variable visible to other instances of the shell, you need to add it to the list of environment variables, because environment variables are passed on when a shell starts another shell. To set an environment variable, type setenv <name> <value> (note that setenv does not use an = sign). Try this, and then type env to verify that the variable has been added to the list of environment variables.

If you are using bash, the syntax for defining variables is different. Regular variables are set by writing name=value (x=1), while environment variables are set using export: export x=1. Note that bash does not allow spaces around the = sign, whereas the C shell does.

Quoting Rules in Shell Scripts

Note that in the previous example, I was able to print out the value of file in my echo statement. This is because I used double quotes around the string to be printed. What if I wanted to literally print out “$file” in that echo statement? Or what if you want to set a variable equal to some special character? We have already seen that one way to include spaces and other special characters in commands is to put a backslash \ immediately before the character. You can also put quotes, either single or double, around an entire string so that the shell treats it as a single argument, which matters when a string or the value of a variable contains spaces.

The shell has complex rules for how quotes affect commands and variable names. In particular, the shell treats single and double quotes differently when it comes to variables. If you include a variable $x inside double quotes, the shell replaces the variable with its value within that string, as illustrated above. A further example:

#!/bin/csh
set x = 1
echo "number$x"

will print number1 to the terminal. If you would like to include further text immediately after the variable, use braces to enclose the variable name, like ${x}:

#!/bin/csh
set x = 1
echo "number${x}2"

to print number12 to the terminal. However, if you want the literal characters ${x} to appear for the echo command, then we use single quotes:

#!/bin/csh
set x = 1
echo 'number${x}2'

which will print number${x}2 to the terminal. This quoting works the same for setting variables (to set a variable to the string $x, put it in single quotes).

Including quotes within a string is a bit trickier. One way to do this is to put a backslash before a quote (outside of any other quotes) to make it part of the string. You can also put single quotes within a double-quoted string, and vice versa. However, neither of these works when you need single quotes in the output and also need single quotes to stop the shell from substituting a variable, for example to print '$x' literally (with x set to 1 as above, echo "'$x'" prints '1' instead). In this case, the way to put quotes inside a quoted string is to concatenate several strings that each use quotes appropriately: for example, echo "'"'$x'"'" concatenates three strings: the first is ' (the result of "'", a single quote within a double-quoted string), the second is $x, which uses single quotes to avoid expanding the shell variable, and the final string is again ' (the result of "'"). The result is that the shell prints '$x' to the terminal.

One other quoting trick: if we want to include the output of a Unix command in a string, surround the command with backwards quotes (`). For instance, echo "The current directory is `pwd`" will print “The current directory is ” followed by whatever the output of the pwd command is.

Practice using quotes to get the following to print out in the terminal using echo. Precede the command with the commands set x = 1 and set y = 2 so that you will have some variables to work with. I will refer to the values stored in the variables x and y in italics, while I will refer to the actual letters x and y in regular script. Other text in italics should print what that text refers to on the screen.

  • The final score is y to x
  • Shell scripts print variables using $x and $y
  • x can’t be less than y
  • Don’t forget quotes when printing “$x” and “$y”
  • “I won’t make mistakes when using x and $y”
  • The current path is the current path
  • My home directory is your home directory
  • The file ‘example.csh’ contains the echo command number of times echo appears in example.csh times

Using quotes properly in a shell script can be tricky, but it is very useful as it lets you control string output. Since much of automating tasks involves reading and saving things from files, being able to specify file names from variables is essential (otherwise you would need to rewrite your shell script every time you wanted to change the target files). Thus, using these quoting rules to correctly construct strings with appropriate file names is a necessary skill for shell scripting.

Input Arguments

Variables also let you turn shell scripts into something like functions, so that one shell script can perform the same task on different input values. When you call a shell script, you can give it any number of input arguments as follows:

$ ./example.csh <var1> <var2> ...

Within the shell script, you can access these arguments using $1, $2, etc. As a simple example, the following shell script takes three arguments: the first is a search string, and the other two are files in which grep searches for that string:

#!/bin/csh
# usage: ./example.csh <pattern> <file1> <file2>
grep "$1" $2 $3

These variables behave like regular variables set with the set command ($1 is shorthand for $argv[1], and so on). However, note that they have a different meaning than in AWK, so you need to be particularly careful if you write an AWK program within a shell script (put the AWK program in single quotes so that the shell does not substitute its own values for $1, $2, and so on).

Math with Shell Variables

In the C shell (not in bash – the syntax for doing arithmetic in bash is totally different, and won’t be covered here) you can do math with variables if you begin the line with @. For example, the following script should print out 1, then 2:

#!/bin/csh
set x = 1
echo $x
@ x = $x + 1
echo $x

Note that you must put spaces around the + sign: the C shell expects each number and operator in an @ expression to be a separate word, so writing $x+1 will not work. You can also increment a variable by one using @ x++ and decrement it by one using @ x--. In these cases, no space is needed between the variable name and the operator.

There are also comparison operators for variables, which do string comparisons: == tests whether two strings are equal, != tests whether two strings are different, =~ tests whether the string on the left matches the pattern on the right (which can contain wildcards), and !~ tests whether a string does not match a pattern. It is also important to note that all arithmetic is integer arithmetic, meaning that you cannot do math with a decimal value (you will get an error), and division always returns an integer (i.e. 3/2=1 in a shell script). Valid arithmetic operators include +, -, *, /, ++, --, and % (modulo or remainder). We will see how to do non-integer arithmetic a bit later (it requires more complicated commands than integer math).

Array Variables

In addition to scalar variables, you can define arrays in a shell script. As with scalar variables, all array elements are stored as strings, but you can do integer arithmetic with them. You define an array and access its elements as follows:

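#!/bin/csh
set x = (1 2 3)          # parentheses make x an array
echo $x                  # prints 1 2 3
echo $x[1]               # prints 1 (indices start at 1)
@ x[1] = $x[1] + 1       # integer math on a single element
echo $x                  # prints 2 2 3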

Indices in the shell begin at one, just like the indices for arrays in MATLAB. However, unlike MATLAB you cannot do array math (you need to iterate over all of the indices of an array to change everything) or floating point arithmetic, only integer arithmetic.

Other special ways to access array variables: $x[*] returns all elements of the array x, while a range of indices such as $x[2-4] returns a slice of the array. $#x returns the number of elements in the array. You can also use an array to define another array; for instance, set y = ($x[1-5]) makes y an array containing the first five elements of x. The parentheses are important here, as they tell the shell that you want y to be an array. Another useful trick in the C shell is $?x, which tells you whether a variable is defined (it returns 1 if it is, and 0 if it is not).

One other useful application of arrays in the C shell is the variable path (lowercase), which holds the same directories as PATH, but as an array of strings rather than one long string with colons separating the entries. This is often more convenient to work with in a shell script than the colon-separated version.

Floating Point Math

I rarely need to do floating point math in the shell, but if I ever do, it can be done with backquotes and the Unix basic calculator command bc. It is a bit awkward, but possible, so if you ever need to do some basic math in a shell script, you can. bc can be run interactively, or it can run commands supplied in a file or on standard input; the standard input version is the most useful in shell scripts. For instance, to add two decimal numbers and store the result in a variable:

#!/bin/csh
set a = 1.4
set b = 2.2
set c = `echo "$a + $b" | bc`
echo $c

As you can see, it is a bit awkward as you need to use backwards quotes to get the standard output of bc into the command to set the value of the variable c. bc is useful for a number of other things, such as making less than and greater than type comparisons, which we will see in the next class when we cover conditionals and loops in shell scripts.

Summary

We have introduced a number of commands and concepts useful for writing shell scripts in the C shell (and the Tenex C shell by extension). While these commands are generally used in writing shell scripts, they are equally valid in interactive terminal sessions. This means you can define variables in any terminal session, and you can also do so in startup files to have variables available when using the shell interactively.

  • echo
  • env and environment variables
  • set
  • setenv
  • PATH and path
  • Variables, input arguments, and arrays
  • @ (for variable arithmetic)
  • Quoting (double, single, and backwards versions)
  • bc