Basic Programming with Python 2

Today we continue developing basic programming skills. We will focus on two very powerful concepts that you will see frequently in all programming languages: functions/procedures, and data types that are collections of primitive data types.

Functions and Procedures

Our programming so far has involved using loops and conditionals to manipulate data. While this works well, if we have a long and complicated program, the list of operations to perform can become rather long. This makes it hard for a human to read and comprehend all that is going on, particularly if there are many different variables that are floating around and being used. Instead, it would be better to break the program up into a series of sub-pieces, each of which is self-contained and can be dealt with separately. This increases comprehension of code, and is generally considered good programming practice.

Further, we may often have a set of operations that we commonly perform. It would be rather tedious to repeatedly write the same code (or to copy and paste it), and in doing so you are likely to make a mistake. Additionally, if you realize that you made a mistake, or find a better way to carry out a calculation, you would need to make changes in every occurrence of this set of code. A better practice would be to have a single set of code to carry out such operations. This saves you work, makes the code easier to read, and changing the code is simpler because you are certain that there is only one place where changes need to be made.

Programming languages make this possible by letting you define functions (sometimes referred to as procedures or subroutines if they are not truly functions). I will use function as a broad term covering all of these things here (as Python has a single syntax covering all of them), but some programming languages use a different syntax for functions versus procedures.

A Simple Example

We can define a function in Python using the def keyword. Here is a simple function that adds two numbers together:

def add(a, b):
    'adds two numbers a and b together, returning the sum'
    return a + b

There are several things going on here:

  • The first line tells Python that we are defining a function because of the def keyword. Following that, we have the name of the function (add in this case), and then the input arguments to the function, in this case a and b. The function name is subject to the same rules as variable names: it must start with a letter or an underscore, may contain only letters, digits, and underscores, and cannot be one of Python's reserved keywords.
  • The next line contains a string. This is called a docstring, and it is a short piece of documentation that describes the function. It is a good idea to create docstrings for all of your functions. The docstring should describe what the function does, what its inputs are, and what its outputs are. Here, a short string is sufficient to mention that a and b are numbers, and that the function gives you back their sum. However, if you want to provide a detailed, multi-line string, you can do so if you enclose the entire string in triple quotes (''' ... ''').
  • Finally, we have the body of the function. Note that it is indented to indicate what set of commands belong in the body of the function. This function is just one line, which adds the two numbers and then uses the return statement to indicate that this is what you get back when you call the function.

To call the function, we type add(3, 4) into the interpreter or our script. In the interpreter, this prints 7; in a script, we can embed the call into code just like any other integer. So if we wanted to use add to sum many different integers, we can enter add( add(3, 4), add(6, 8) ) to get 21 back. Python calls add three different times here: once with 3 and 4, once with 6 and 8, and then a third time with 7 and 14 (the results from each previous call).
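To make the order of the three calls concrete, here is a small sketch that spells out the nested call step by step. The names inner_left, inner_right, and total are only for illustration, and print is written with parentheses so that the snippet should also run under Python 3.

```python
def add(a, b):
    'adds two numbers a and b together, returning the sum'
    return a + b

inner_left = add(3, 4)                  # first call, gives 7
inner_right = add(6, 8)                 # second call, gives 14
total = add(inner_left, inner_right)    # third call, same as add(add(3, 4), add(6, 8))
print(total)                            # prints 21
```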

While this is a very simple example, if you needed to do a more complicated set of operations, this will allow you to define your function just in one place, and then concisely call it many different times. If you need to change the function, you only need to do so once. This is a very powerful concept: it allows for modularity in your code. You can check that small functions work independently from one another, making it easier to debug a large and complicated program. It also makes reading a program easier, as you can keep each piece of code shorter with several function calls replacing large sections of code. Functions are probably one of the most important concepts in software design, and you should always strive to think about breaking your problem into smaller pieces that can be written as functions.

Scope

What happens when you call a function? Python creates what is known as an environment that is separate from the rest of your program. All variables given to the function as arguments are available in this environment, as are any additional variables that you may define within the environment. You might have variables defined in your main program that have the same name as variables in your function, and the fact that Python creates a new environment means that any changes to variables that occur within a function cannot be seen by the main program, as they are local to the function. This is what programmers call scope: it tells you about where variables can be seen by other things. For example, the following program will not modify the value of a in the main program:

a = 1

def myfunc():
    a = 2

myfunc() # sets a = 2 in the other environment, but changes nothing in the main program
print a  # will still print 1

When a is printed, it will still have the original value of 1, rather than the local value of a during the execution of myfunc.

However, Python does let you use variables defined in other parts of the program in a function, with a few specific rules. If you access the value of a variable in a function, Python first checks the environment created for the function for that variable name. If it finds one, it uses that, but if it does not find one, it looks to the environment in which the function was defined (called the enclosing or parent environment) and checks for that variable. Note that this is the environment containing the def statement, not the environment from which the function happened to be called. If it finds one in a higher level environment, it uses that value. If it does not find one, it keeps looking to the parent environment of the parent environment, until it finds a value or runs out of environments to check, in which case Python raises a NameError. However, it is generally better programming practice to just use local variables when writing a function, as the intent is not always clear when you use variables that are not local to a function. The exception is global constants that should only be defined once (for example, you do not want to have to redefine π every time you write a function).
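The lookup rules can be illustrated with a short sketch. The names PI, circleArea, and roughCircleArea are made up for this example, and the value of PI is deliberately truncated.

```python
PI = 3.14159   # global constant, defined once in the main program

def circleArea(r):
    'returns the area of a circle of radius r, using the global constant PI'
    return PI * r * r   # PI is not local, so Python finds it in the main program

def roughCircleArea(r):
    'same calculation, but with a cruder local value of PI'
    PI = 3.             # local variable that hides the global PI inside this function only
    return PI * r * r

area = circleArea(2.)          # uses the global PI, so roughly 12.57
rough = roughCircleArea(2.)    # uses the local PI = 3., giving 12.0
print(PI)                      # the global PI is unchanged: prints 3.14159
```

Reading a global constant like this is reasonable; assigning to the same name inside a function creates a new local variable rather than changing the global one.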

Default Arguments

Python lets you set default values for the arguments in a function. This is done as follows:

def exponent(x, y = 2.):
    'raises x to the power y, returning the result. default behavior is to square x'
    return x**y

We can call exponent either as exponent(4., 3.) to cube 4, or exponent(4.) to automatically use 2 for the exponent. Any number of arguments can have default values; the only restriction is that the required arguments (i.e. parameters with no defaults) must all be listed before any optional arguments (i.e. parameters with defaults) when defining the function. You can specify parameters in any order that you like when calling a function if you give the parameter name when calling the function: exponent(4.), exponent(4., 2.), exponent(x = 4.), exponent(x = 4., y = 2.), and exponent(y = 2., x = 4.) are all valid and equivalent in Python.
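As a quick check of these calling conventions, the following sketch calls exponent in several of the ways listed above. The variable names on the left are only for illustration.

```python
def exponent(x, y = 2.):
    'raises x to the power y, returning the result. default behavior is to square x'
    return x**y

squared = exponent(4.)               # y takes its default value: 16.0
cubed = exponent(4., 3.)             # both arguments given by position: 64.0
by_name = exponent(y = 3., x = 4.)   # keyword arguments may appear in any order: 64.0
mixed = exponent(4., y = 0.5)        # positional first, then keyword: 2.0
```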

Functions are Objects

One convenient feature of Python is the fact that functions can be treated like any other variable. For example, you can set a variable equal to a function, which lets you use that other variable to call the function. For example:

def square(x):
    'returns the square of input number x'
    return x*x
f = square   # note that I do not use ( ) to call the function, this simply binds the name f to the function square
print f(4)   # prints 16

Function names are just like any other variable in Python, and so you can assign a function to any variable name that you like. You can also pass a function to a function just like any other variable: the function that is passed as a parameter will be bound to the corresponding parameter name within the new environment created for the function. Here is an example:

def square(x):
    'returns the square of input number x'
    return x*x

def doTwice(x, f):
    'applies a function f taking a single argument to input x twice'
    return f(f(x))

print doTwice(2, square)   # prints 16 = (2**2)**2

This is a very powerful tool, as you can create functions where the functions that are called do not need to be known when defining the function. Many other programming languages allow for this in addition to Python, so it is a handy trick to be aware of.
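To see that doTwice really does not need to know which function it will receive, here is a sketch that passes it two different functions. The helper addOne is hypothetical and exists only for this illustration.

```python
def square(x):
    'returns the square of input number x'
    return x*x

def addOne(x):
    'returns x plus one (hypothetical helper for this sketch)'
    return x + 1

def doTwice(x, f):
    'applies a function f taking a single argument to input x twice'
    return f(f(x))

squared_twice = doTwice(3, square)   # 81 = (3**2)**2
added_twice = doTwice(3, addOne)     # 5 = (3 + 1) + 1
```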

As you can see, functions are extremely powerful. Whenever you are writing code, you should get in the habit of thinking about how to break down what you want to do into smaller chunks that can be easily written as a function. Then, building up from the most granular level, write the base functions, then the functions that tie those functions together, and so on until you have your entire code written. I will try to suggest good ways to break up homework problems this way so you can get used to this mode of thinking.

Lists

In science, we frequently have to deal with data that is a collection of numbers, rather than just a single number. For instance, a seismogram contains values at many different times, as well as north/south, east/west, and vertical components. GPS data also has multiple components. Datasets like InSAR are collected over an extensive spatial area. It would not be very convenient to use computers to analyze these types of data if we needed to give a distinct variable name to every single number, as that obscures the underlying structure of how the data was collected. Rather, we would like to aggregate appropriate collections of values into a single entity that we can deal with more concisely in our programs.

One tool for handling collections of data in Python is the built-in list datatype. A list is just a collection of items, and the items in a list can be anything: integers, floats, strings, booleans, or even other lists. For example, here is how you create a list of integers in Python:

mylist = [ 1, 2, 3, 4, 5 ]

One nice thing about lists in Python is that not all list items have to be of the same type. You can combine different types in the same list, though most of our uses of lists will involve situations where the items have the same type.

Indexing

We can access the entire list using the variable name mylist, or we can examine the individual items in the list using the indexing operator [ ]. To get the first item in this list, we would enter mylist[0] (try doing this in the interpreter; it should print out 1). Note that the convention in Python is to use 0 to indicate the first position – programming languages that work in this way are called “zero-indexed,” and the index in such a language indicates an offset from the beginning of the list (hence 0 means zero offset from the first position, 1 is one place away from the first position, etc.). Some other languages (most notably MATLAB, as we will see in a few classes) use 1 to indicate the first position, and such languages are called “one-indexed.”

Python also lets you enter indices in other ways. You can get a subset of a list by specifying a range of indices. For example, to get a sublist containing positions 1 through 3 from our list above, we can enter mylist[1:4]. Note that the length of the sublist is the difference between the two indices: even though we entered 4 as the second number, the final entry in our sublist is at position 3. If you want to skip over every other entry, enter mylist[1:4:2], which means “position 1 through 4, taking every 2 positions.” You can omit the starting and/or ending index if you want to start at the beginning of the list and/or end at the end of the list: mylist[::2] gives every other position in the entire list, mylist[1:] gives a sublist that starts at the second position through the end of the list, mylist[:4] gives the first 4 items in the list, and mylist[:] gives the whole list.

You can also use negative numbers for the indices. Since the index refers to offsets in Python, a negative number just means that the offset is taken in the backwards direction: mylist[-1] is the last item in the list, mylist[-2] is the second from the end, etc. You can also use negative indices as the indices for a sublist: mylist[-3:-1] gives you the two items before the last one (the end index is excluded, just as with positive indices), mylist[-2:] gives you the final 2 places, and mylist[::-1] reverses the order of the entire list. Negative indices are useful if you want the last element in a list, but do not necessarily know how many total items are in the list (though you can easily get the length of a list using len( )).
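The indexing forms described so far can be collected into one sketch, using the five-item mylist defined earlier. The variable names on the left are only labels for this example; the comments show the value each expression gives.

```python
mylist = [1, 2, 3, 4, 5]

first = mylist[0]            # 1
middle = mylist[1:4]         # [2, 3, 4]  (positions 1 through 3)
stepped = mylist[1:4:2]      # [2, 4]
every_other = mylist[::2]    # [1, 3, 5]
tail = mylist[1:]            # [2, 3, 4, 5]
head = mylist[:4]            # [1, 2, 3, 4]
whole = mylist[:]            # [1, 2, 3, 4, 5]

last = mylist[-1]            # 5
near_end = mylist[-3:-1]     # [3, 4]  (the end index -1 is excluded)
final_two = mylist[-2:]      # [4, 5]
backwards = mylist[::-1]     # [5, 4, 3, 2, 1]
```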

What if you need to represent 2-dimensional data? You can do so with a list of other lists. For example, one way to represent a 3x3 matrix is as a list of length 3, where each list entry is a list of length three. We can define such a matrix like this:

matrix = [ [1, 2, 3], [4, 5, 6], [7, 8, 9] ]

To access the matrix elements, we now need to use two indices. Since matrix[0] refers to the first list element (which is itself a list) then matrix[0][0] refers to the first element of the first element. You can also use the same range tricks with multiple indices, keeping in mind that each index is applied to the result of the one before it. Try some of the following to be sure you know how they work: matrix[1][-1], matrix[::2][::2], matrix[:][::-1], and feel free to try some other options on your own.
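Here is a sketch of a few two-index lookups on the same matrix (different from the ones suggested above, so those are still left for you to try). The variable names are only labels for this example.

```python
matrix = [ [1, 2, 3], [4, 5, 6], [7, 8, 9] ]

first_row = matrix[0]            # [1, 2, 3]
corner = matrix[0][0]            # 1
center = matrix[1][1]            # 5
lower_left = matrix[2][0]        # 7

reversed_row = matrix[0][::-1]   # [3, 2, 1]: the slice applies to the first row
outer_rows = matrix[::2]         # [[1, 2, 3], [7, 8, 9]]: a slice of rows is still a list of rows
```

Note that a slice in the first position returns a list of rows, so a second index then selects among those rows rather than among columns.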

Getting used to indexing is tricky, so take some time now to practice. MATLAB has similar indexing tricks, thus understanding them will be handy in many different contexts in multiple programming languages. You might also hear people call this “slicing,” as it lets you cut a particular piece out of a list.

Indexing can also be applied to strings in order to get a substring. If a is a string, then a[0] returns a string only containing the first character, a[0:2] returns the first two characters, and a[-1] returns the final character. All slicing operations described above work in the same fashion on strings as they do on lists.
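A short sketch of the same operations on a string; the example string is arbitrary.

```python
a = 'seismogram'

first_char = a[0]       # 's'
first_two = a[0:2]      # 'se'
last_char = a[-1]       # 'm'
backwards = a[::-1]     # 'margomsies'
```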

List Operations

There are many things you can do with lists:

  • Addition: You can use the addition operator to concatenate two lists. If a and b are both lists, then a + b is a list containing all of the elements of a, followed by all of the elements of b.
  • Multiplication: Multiplying a list by an integer n returns a list that concatenates n identical copies of the list. This is a nice trick for creating a long list where every entry is the same: if I want a list containing 100 zeros, then I can use [0.]*100.
  • Comparisons: You can directly compare lists to use in conditional statements using any comparison operator (i.e. ==, >=, etc.). The result depends on the exact comparison being done, but in general there is an element-by-element comparison. If using the equality operator, you will get back True only if the lists are the same length and each element is the same. If using a comparison like >= where it is possible that you could get different answers at each place in the list, Python compares the lists item by item from the start and gives you back the result from comparing the first pair of items that differ; if one list runs out before a difference is found, the shorter list is considered smaller.
  • Checking if something is in a list: You can check if a certain item is in a list using the form <item> in <list>. So for example, to check if 5 is in mylist, I would enter 5 in mylist, which should give me back True. If you need to know the index where an item is, use the index command: mylist.index(5) will return the first index where 5 occurs.
  • Counting the number of occurrences in a list: To count the number of times that something occurs in a list, use count. So if we want to count the number of times a list entry is 1, I would use mylist.count(1).
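The operations in this list can be tried out with a sketch like the following. The lists a and b and the other variable names are arbitrary examples.

```python
a = [1, 2, 3]
b = [3, 4]

joined = a + b                       # [1, 2, 3, 3, 4]
zeros = [0.] * 5                     # [0.0, 0.0, 0.0, 0.0, 0.0]

same = ([1, 2, 3] == a)              # True: same length and same items
bigger = ([1, 2, 3] >= [1, 1, 9])    # True: decided by the first differing pair, 2 versus 1

has_three = 3 in joined              # True
has_seven = 7 in joined              # False
where = joined.index(3)              # 2, the first position holding a 3
how_many = joined.count(3)           # 2
```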

Lists are Mutable

Lists can be changed on the fly in Python. You can add items, take them away, or shuffle them around. Some common ways to manipulate lists include:

  • Append: You can add a new item onto the end of the list using append. To add the integer 6 onto the end of the list above, we would type mylist.append(6). Try this, and then print out the entire list to verify that it has changed.
  • Insert: To add a new item to a list in a specific location, use insert. You need to specify the index when you call insert, so to put item a at the beginning of list mylist, use mylist.insert(0, a).
  • Pop: To remove an item from a specific location in a list, use pop. If you do not specify an index, pop will remove the final item in the list. So to remove the final item from a list, use mylist.pop(), while to remove the first item, use mylist.pop(0). Note that pop returns the item that is removed, so this is a convenient way to perform an operation on all items on a list one at a time.
  • Remove: To remove a specified item from a list, use remove, which will remove the first instance of the given item in the list. For example, mylist.remove(5) will remove the first occurrence of 5 in the list.
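Applied in sequence to the list from before, these methods behave as in the following sketch; the comments show the contents of mylist after each step.

```python
mylist = [1, 2, 3, 4, 5]

mylist.append(6)          # [1, 2, 3, 4, 5, 6]
mylist.insert(0, 0)       # [0, 1, 2, 3, 4, 5, 6]
last = mylist.pop()       # returns 6, leaving [0, 1, 2, 3, 4, 5]
first = mylist.pop(0)     # returns 0, leaving [1, 2, 3, 4, 5]
mylist.remove(5)          # removes the first 5, leaving [1, 2, 3, 4]
```

Note that append, insert, and remove change the list in place and do not return a new list; only pop gives something back.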

Iterating Over Lists

We used for along with the range function in the previous lab to repeat computations. You can also use for along with a list to iterate over all items in a list. The syntax is

for <item> in <list>:
    # repeats the body of the loop with <item> changing

So for example, if I have a list and want to print all of the entries individually,

for x in mylist:
    print x

This is a handy way to loop over a list without modifying it (like you would need to do with pop). In fact, this is exactly what Python is doing when you use a loop with range: range actually creates a list containing the appropriate integers specified by the arguments, and then for iterates over the list items.
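As a sketch of both styles of loop, the following sums a list by iterating over its items directly, and then builds a second list by iterating over indices. The names total, indices, and doubled are illustrative, and list( ) is wrapped around range only so that the snippet behaves the same if run under Python 3.

```python
mylist = [1, 2, 3, 4, 5]

total = 0
for x in mylist:
    total = total + x        # x takes the values 1, 2, 3, 4, 5 in turn

indices = list(range(len(mylist)))   # [0, 1, 2, 3, 4]
doubled = []
for i in indices:
    doubled.append(2 * mylist[i])    # builds [2, 4, 6, 8, 10]
```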

One Annoying Detail

One detail you need to be aware of when using Python lists concerns how Python stores a list in memory. If I have an integer variable a = 5, and then I set b = a, then there is no ambiguity about what I mean when I set b = a: there is only one integer with a value of 5, and Python sets the value of b to be 5 as well. This is also true for floating point numbers and strings, as there is a single, unique representation of the item in the computer memory.

Lists are different, because lists can be altered and changed on a whim, which means that there is a more complicated way to store them in the computer’s memory (since it is not a unique, unchangeable number or string). When I define a = [1, 2, 3] and then set b = a, there is some ambiguity about what I am doing: did I mean to copy list a to a separate location in memory, or do I just want b to refer to whatever a happens to be, even if I change it?

The designers of Python have decided to use the second option, also known as aliasing. When you assign a mutable object like a list to a variable, the variable points to a location in the computer’s memory, saying that the list is stored there. When you then set a second variable equal to the first, Python simply tells the second variable to point to the same location as the first variable. This is more memory efficient, as you do not need to copy a potentially long and complex list over to a new location in memory. However, this can have unintended consequences. Try entering the following into Python:

a = [1, 2, 3]
b = a
b.append(4)
print a # will print a list with 4 items

This can be annoying if you did not mean to alter a, and wanted to make a copy in b. To avoid this, we have two options:

  • For a simple list, we can make a copy of a in b using the notation b = a[:] or b = list(a). Either of these will simply return the entire list as a new list, which can then be assigned to a new variable. However, this may not work in all circumstances, as this is what is known as a “shallow copy”: if your list contains other lists, Python will still use aliases for those items.
  • If we want to create a full copy of another list, we need the deepcopy function, which will make a “deep copy” with full copies of all items in the list, including any lists contained in the list. This is not a built-in function; it is provided by the copy module in Python's standard library, so we need to import that module. Type import copy to make the module available. Then you can make a full copy of your list using b = copy.deepcopy(a).
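The difference between an alias, a shallow copy, and a deep copy shows up most clearly with a nested list. The following is a small sketch; the variable names are only for illustration.

```python
import copy

a = [[1, 2], [3, 4]]

alias = a                    # same list in memory as a
shallow = a[:]               # new outer list, but the inner lists are still shared
deep = copy.deepcopy(a)      # new outer list and new inner lists

a.append([5, 6])             # changes the outer list: only alias sees this
a[0].append(99)              # changes an inner list: alias and shallow both see this

# a and alias are now [[1, 2, 99], [3, 4], [5, 6]]
# shallow is now      [[1, 2, 99], [3, 4]]
# deep is still       [[1, 2], [3, 4]]
```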

Do not worry about the details of import yet; we will talk about this in more detail in the next class.

This is just to make you aware of things that can go wrong when changing lists. This is something to worry about when you send a list to a function and then modify the list in the function. Python simply makes an alias of the original list when it is sent to the function, so any modifications to the list will affect what is in the list in the environment from which the function was called. Be aware of this, and make copies of a list if you do not intend for a function to modify a list!
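Here is a sketch of the problem and one way around it. Both functions are hypothetical examples: the first modifies the list it is given, while the second works on a copy and returns the result.

```python
def addOneInPlace(values):
    'adds 1 to every entry, modifying the list that was passed in'
    for i in range(len(values)):
        values[i] = values[i] + 1

def addOneCopy(values):
    'returns a new list with 1 added to every entry, leaving the input alone'
    result = values[:]       # shallow copy is enough for a list of numbers
    for i in range(len(result)):
        result[i] = result[i] + 1
    return result

data = [1, 2, 3]
shifted = addOneCopy(data)   # shifted is [2, 3, 4]; data is still [1, 2, 3]
unchanged = data[:]          # snapshot of data before the in-place call
addOneInPlace(data)          # data is now [2, 3, 4], changed in the main program
```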

Practice

Here are some problems to practice concepts related to functions and lists. In each case, it is a good idea if you come up with several tests to check that you are doing things correctly.

  • Write a function that calculates the average of three floating point numbers.
  • Write a function that calculates the average of an arbitrary number of floating point numbers. Pass the numbers to the function as a list.
  • Write a function that takes a list and reverses the order of the items in a list. The function should not modify the original list. Do this both using a for loop and indexing.
  • Write a function that takes a list of numbers and a function that takes a single number as an argument, returning another number. Have the function apply the function to every item in the list, and return a list containing the results. Your function should not modify the original list.
  • Write a function that tests whether each integer in a list is prime, returning a list of boolean values. Hint: You should write a helper function isprime that tests if a number is prime, returning a boolean value (you can adapt your code for determining if a number is prime from the last lab if you like). Use your function above that applies a function to a list of values to help you do this.