Basic Programming with Python 3

This lab focuses mainly on one skill that you will use over and over as you write code for scientific purposes: debugging. No one writes perfect code the first time through, so you need to develop the ability to test your code and find its bugs. More than anything else, debugging requires that you be persistent and systematic, checking what your code does at each step. I highlight some useful debugging techniques below, but mostly you need to think carefully about what your code is doing. We will also talk a bit more about importing modules, which lets you reuse your own code as well as external packages written by others. Finally, I briefly discuss simple ways to save data to a text file and to read text data into Python from an external file. You will need to perform some basic text input for your homework/final project.

Types of Bugs

What types of bugs might you encounter when writing a program? There are three main types:

  • Crash: The most basic type of bug is when your program crashes and does not run to completion. You have probably encountered this problem already when working with the Python interpreter: you get some sort of error message that describes the problem, and the line where the problem occurred.

    This is the easiest type of bug to fix, because (1) you know that there is a bug, since your program crashed, and (2) Python gives you some information about where in the program it occurred. Fixing crashes may still pose some difficulties, but in general crashes are much easier to fix than the other two types of bugs.

  • Program Never Finishes: Next up is a bug that leads to a program that never runs to completion. This is harder to diagnose, because your (correct) program may simply take a long time to run, and you were not willing to wait long enough. This highlights an important point in programming: you should always have some idea of how long your program should take. Good ways to check this include (1) running your code on a very small subset of your data, and then progressively increasing the size to estimate how long it will take on the full data, and (2) including a print statement every so often within your program to indicate progress through the code. I use (2) in almost every piece of code that I write. If I am solving a complicated differential equation over many time steps, I print out time step information. If I am doing calculations on a large number of data files, I print out which file I am on every so often.

    If your program really does never stop running, you may have an infinite loop, or some other exit condition that is never met. These bugs are easier to fix than the final type of bug, as you can usually tell fairly easily when your code will never finish, and then take steps to fix the problem.

  • Wrong Answer: The trickiest type of bug, and the one you will no doubt spend most of your time trying to correct, is when your code runs to completion but gives the wrong answer. These bugs can be notoriously difficult to find, because in many cases you do not know the right answer to the problem and thus cannot tell that there is a bug. Or you may know that there is a problem, but finding the culprit within a long and complicated code can be exceedingly difficult. Either way, this is the type of bug that will most often give you trouble.
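Both checks from the second bullet can be combined in a few lines. The sketch below uses a hypothetical function, sum_of_squares, as a stand-in for a slow calculation. It prints a progress message about four times per run, and it times runs on progressively larger inputs with time.perf_counter from the standard library:

```python
import time


def sum_of_squares(n):
    """Hypothetical stand-in for a long calculation: return the sum of i*i for i in range(n)."""
    total = 0
    report_every = max(n // 4, 1)  # print a progress message about four times per run
    for i in range(n):
        total += i * i
        if i % report_every == 0:
            print("  step", i, "of", n)
    return total


# Run on progressively larger inputs and time each run to see how the run time scales.
for n in [1000, 10000, 100000]:
    start = time.perf_counter()
    sum_of_squares(n)
    elapsed = time.perf_counter() - start
    print("n =", n, "took", round(elapsed, 4), "seconds")
```

Comparing the times for successive sizes lets you extrapolate roughly how long the full data set will take before you commit to a long run.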

The techniques I mention below are useful for finding all of these types of bugs, though you will find that for the most part they are useful for the second and third types (crashes are relatively easy to diagnose, particularly with an interpreted language like Python).

Test Functions

A useful debugging method that is related to print statements, but slightly more sophisticated, is the use of test functions. The idea is that in a well-designed program, each separate task should be written as a function. We can exploit this modularity to make our debugging and testing more robust by examining each function individually with a test function. A test function is a separate piece of code that calls a function multiple times with particular inputs (often chosen to test cases that may give a code trouble). It then compares the results to the expected values and prints them out. Here is an example test function for appendList above:

def test_appendList():
    'tests appendList for several different cases, printing out results'

    list1_vals = [[], [1], [1, 2]] # list1 values
    list2_vals = [[1], [], [3, 4]] # list2 values
    combined_vals = [[1], [1], [1, 2, 3, 4]] # expected results

    # loop over test cases, printing out test results

    for list1, list2, combined_expected in zip(list1_vals, list2_vals, combined_vals):

        # copy lists before calling to check for modification

        list1_copy = list(list1)
        list2_copy = list(list2)

        combined_actual = appendList(list1, list2)

        if (combined_actual == combined_expected and list1 == list1_copy and list2 == list2_copy):
            result = 'PASSED'
        else:
            result = 'FAILED'

        print('Test:')
        print('list1: expected =', list1_copy, ', actual =', list1)
        print('list2: expected =', list2_copy, ', actual =', list2)
        print('combined: expected =', combined_expected, ', actual =', combined_actual)
        print('Result:', result)
        print()

This code runs three different test cases through the function appendList and prints out the results. I introduced a new built-in function in writing this, zip, which is a convenient way to iterate over multiple lists at once. Basically, zip pairs up the corresponding items of several lists. This lets you use the convenient for <item> in <list> syntax over multiple lists at the same time, without having to keep track of indices. You can also do this using indices, but I find that this approach produces code that is easier to understand.
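A small sketch of what zip does with the first two lists from test_appendList, next to the equivalent index-based loop:

```python
list1_vals = [[], [1], [1, 2]]
list2_vals = [[1], [], [3, 4]]

# zip pairs up the i-th item of each list, so each pass sees one test case
for list1, list2 in zip(list1_vals, list2_vals):
    print(list1, list2)

# the same loop written with an index
for i in range(len(list1_vals)):
    print(list1_vals[i], list2_vals[i])

# in Python 3, zip returns an iterator; wrap it in list() to see all the pairs at once
pairs = list(zip(list1_vals, list2_vals))
print(pairs)  # [([], [1]), ([1], []), ([1, 2], [3, 4])]
```

One caution: zip stops at the end of the shortest list. If list1_vals, list2_vals, and combined_vals accidentally have different lengths, the extra test cases are silently skipped. For example, list(zip([1, 2, 3], [4])) gives [(1, 4)]. An assertion that the lists have equal length guards against this.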

There are a few things to notice about this test function:

  • First, note that I set up the problem using three lists and a for loop to iterate over the test cases and expected results, rather than writing out the same code several times. I did this for two reasons: first, I want to make sure that each test is run in exactly the same manner, and second, I want to keep it easy to change the number of tests. To add another test, all I have to do is add another set of items to list1_vals, list2_vals, and combined_vals, and the code automatically runs the additional test. If it is difficult to run additional tests, you are not likely to do them!
  • Second, note that I was careful about checking whether or not the original lists were modified by comparing the input lists to separate copies. As we saw above, it is possible to get the correct output, but incorrectly modify the input, and thus we should test the inputs as well as the output. If a function is not supposed to modify the input, then it is a good idea to build this into the test function.
  • Finally, this test function could have been written independently of a working version of appendList. This means that I could even have written the test function before writing appendList itself: once I have defined what appendList does, I can figure out how to test it without writing it. You don’t actually need to do this, but it is a good idea to think about how to test a piece of code during the entire process. The sooner you start rigorously testing a piece of code, the better.

Writing test functions where you compare results with expected outputs for each piece of code is in general a good idea. Start small with the lowest level functions, and then work up to the highest level. This way, if you ever need to change a piece of the code, you can run all of the appropriate tests again to verify that you have not broken anything. In some of your homework questions, I will run a suite of test problems analogous to the test shown above to verify that your code does everything correctly, and doing so yourself is also a good idea.
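One way to make it easy to rerun everything after a change is to have each test function return True or False. You can then collect the tests in a single function that runs the lowest-level tests first. The sketch below is a minimal illustration with hypothetical functions, square and sum_squares. Unlike test_appendList, these test functions return a result rather than printing details for each case:

```python
def square(x):
    """Hypothetical low-level function."""
    return x * x


def sum_squares(values):
    """Hypothetical higher-level function built on square."""
    total = 0
    for v in values:
        total += square(v)
    return total


def test_square():
    """Return True if square gives the expected result for every case."""
    cases = [(0, 0), (2, 4), (-3, 9)]  # (input, expected output)
    return all(square(x) == expected for x, expected in cases)


def test_sum_squares():
    """Return True if sum_squares gives the expected result for every case."""
    cases = [([], 0), ([1], 1), ([1, 2, 3], 14)]
    return all(sum_squares(values) == expected for values, expected in cases)


def run_all_tests():
    """Run the low-level test first, then the higher-level one, reporting each result."""
    results = {}
    for test in [test_square, test_sum_squares]:
        passed = test()
        results[test.__name__] = passed
        print(test.__name__, "PASSED" if passed else "FAILED")
    return results


run_all_tests()
```

If test_square fails, there is little point in studying a failure of test_sum_squares until the lower-level function is fixed, which is why the order matters.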

Assertions

One final piece of advice for writing bug-free code in the first place is to use assertions. Assertions are statements that stop your program if the condition they test is not met, and they can be a good way to help catch bugs. Typically, when I write a function, the first few lines are assertions that check conditions on the inputs that might trip up my code. That way, I know right away that something is wrong, and where it went wrong. I do not have to deduce it from the final result and then work my way through the code systematically to find the problem. Assertions like these cannot help me debug appendList itself if it is not working correctly. But if I were writing appendList as part of a larger project, they could help me find bugs in any code that calls appendList.

For instance, let’s say that I am writing a piece of code where I need to combine two lists into a third list without modifying either list. This can be easily done with our appendList function. However, suppose one of the things I want to combine is just a single number, and I mistakenly think that appendList works like the append method of a list. Then I might try something like appendList(someList, a), where a is a float.

As it stands now, if I try to run this code, I get a TypeError saying that a 'float' object is not iterable. While I might be able to figure out the problem from this error message, I can use assertions to test for it and make the failure more explicit. Basically, before doing anything inside the function, I would like to be certain that both of my input arguments are lists. If they are not, then instead of trying to do a set of operations that I know will fail, I can raise a very specific error message that explains the problem in plain English. Here is the code with assertions:

def appendList(list1, list2):
    'appends list2 to list1, returning the combined list. does not modify either input list'

    assert type(list1) is list, "input argument list1 must be a list"
    assert type(list2) is list, "input argument list2 must be a list"

    outlist = list(list1)   # outlist = list1[:] is also acceptable

    for item in list2:
        outlist.append(item)

    return outlist

With the added assertions, now if I try appendList(someList, a), I will get an AssertionError, along with the message input argument list2 must be a list (the optional string in an assert statement will be included in the error message). This is much more helpful in figuring out how to correctly call appendList than the previous error message. If I call the code correctly, both assert statements silently pass, and do not affect the execution of the code.

To check the type of a variable, I used the type function along with the keyword is (a reserved word in Python that tests whether two objects are the same object). type(<object>) is <type> gives back a boolean value, and can be used to check the type of a variable. This is often something that you will want to check:
- If something is a counting index, assert that it is a nonnegative integer.
- If something needs to have a numeric value, assert that it is a float or an integer.
- If two lists must have the same length, add an assertion that checks their lengths.

I try to check anything that may make my program go haywire or execute incorrectly with an assertion, as assertions typically give more useful information than other error messages.
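The checks described above might look like this in practice. The two functions below, scaled_item and dot, are hypothetical examples written only to show the assertions. The try/except block at the end catches the AssertionError just so that its message can be printed. Also note that Python skips assert statements entirely when it is run with the -O option, so they are best treated as a debugging aid.

```python
def scaled_item(values, index, scale):
    """Hypothetical example: return values[index] multiplied by scale."""
    assert type(values) is list, "values must be a list"
    assert type(index) is int and index >= 0, "index must be a nonnegative integer"
    assert index < len(values), "index must be less than len(values)"
    assert type(scale) in (int, float), "scale must be an int or a float"
    return values[index] * scale


def dot(list1, list2):
    """Hypothetical example: return the sum of the products of matching items."""
    assert type(list1) is list and type(list2) is list, "both inputs must be lists"
    assert len(list1) == len(list2), "list1 and list2 must have the same length"
    total = 0
    for a, b in zip(list1, list2):
        total += a * b
    return total


print(scaled_item([1.0, 2.0, 3.0], 2, 2))  # 6.0
print(dot([1, 2, 3], [4, 5, 6]))           # 32

# a bad call stops with the message from the failing assertion
try:
    dot([1, 2], [1, 2, 3])
except AssertionError as err:
    print("AssertionError:", err)  # AssertionError: list1 and list2 must have the same length
```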

Assertions can also be used as a handy trick to crash a program intentionally. If I am trying to debug something, and in doing so change something so that I know that the remainder of the program will not run correctly (or take a long time to run), I often short-circuit the program by adding assert False, which always stops the program dead in its tracks. Otherwise, to prevent the remainder of the code from running, I would have to comment it all out (which may take a long time to do and then undo when the problem is fixed). I can save myself time and typing in such a case by adding a simple assertion to the code.
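A sketch of the trick, using a hypothetical function called analyze. Everything after the assert False line is skipped, and the optional message reminds you why the program stopped:

```python
def analyze(data):
    """Hypothetical multi-step analysis used to illustrate stopping early."""
    # step 1: the part currently being debugged
    cleaned = [x for x in data if x is not None]
    print("cleaned data:", cleaned)

    # stop here until step 1 is fixed; delete this line to run the rest again
    assert False, "debug stop: check the cleaned data above"

    # step 2: a slow or not-yet-correct part that should not run for now
    total = sum(cleaned)
    return total
```

Calling analyze([1, None, 2]) prints the cleaned list and then stops with an AssertionError carrying the debug message.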

Practice

Here are some problems to work on in lab today. Use print statements to help debug them as you work. You should use assertions where appropriate, and also write a small test function for each problem.

  • Write a function isPalindrome to test if a list is a palindrome (i.e. the list is the same backwards as it is forwards: [1, 2, 1] is a palindrome, [1, 2, 2, 1] is a palindrome, but [1, 2, 3, 1] is not).
  • One method for calculating the square root of a number \(s\) is as follows: (1) take a guess \(x_n\); (2) check whether \(x_n^2 = s\) to within some allowed tolerance (if so, we have a good enough answer and we can exit); (3) if the guess isn’t good enough, make a new guess \(x_{n+1} = (x_n + s/x_n)/2\) and return to step (2) with this new guess. Write a function calcsqrt to implement this method, taking \(s\) as an input and returning the square root.
  • Add appropriate assertions and write a test function for some of the functions from Lab 5.
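Here is a hand check of the update rule in the second problem. It is handy when choosing expected values for a calcsqrt test function, and it is not a full solution. The first few iterations for \(s = 2\), starting from the guess \(x_0 = 1\), are shown below; the exact answer is \(\sqrt{2} \approx 1.414214\):

```latex
\begin{aligned}
x_1 &= \tfrac{1}{2}\left(1 + \tfrac{2}{1}\right) = 1.5, & x_1^2 &= 2.25,\\
x_2 &= \tfrac{1}{2}\left(1.5 + \tfrac{2}{1.5}\right) = \tfrac{17}{12} \approx 1.416667, & x_2^2 &= 2 + \tfrac{1}{144} \approx 2.006944,\\
x_3 &= \tfrac{1}{2}\left(\tfrac{17}{12} + \tfrac{24}{17}\right) = \tfrac{577}{408} \approx 1.414216, & x_3^2 &= 2 + \tfrac{1}{166464} \approx 2.000006.
\end{aligned}
```

The error shrinks very quickly from one iteration to the next, so only a few iterations are needed to reach a small tolerance.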

Importing External Code

By now, you have a basic understanding of how to program in Python (or any other language, for that matter: if you can program in Python, it is easy to learn to program in a different language). Many of the things I have highlighted are aimed at writing better, more usable computer code, in particular by using functions. But what if you are writing some code and would like to use a function that you wrote at some previous time? From what we have seen so far, you can only do this by copying and pasting the function into your Python script. This gets out of hand very quickly. If you needed to make a change to the function, you would have to find every copy in every file and change it.

The solution to this problem is to use Python’s import capabilities. Basically, any Python file can be loaded as what is known as a module, and import lets us use the functions and other code defined in that file. Let’s say we have a function add that we have defined in the file “addition.py.” If we would like to use add in a separate piece of code, we can do the following:

import addition
print(addition.add(1., 2.))

This code will import all functions in the “addition.py” file, and allow you to use them with the name addition.<function>. The prefix addition is what is known as a namespace, which is just a fancy way of saying that Python groups functions together based on the place from which they were imported. Namespaces help you keep track of where different functions come from. You can choose the namespace where external functions are imported with the following syntax:

import addition as add
print(add.add(1., 2.))

If you were going to use the add function many times, choosing a shorter namespace can help cut down on typing.

If you have many functions in addition, but are only going to use add, you can also selectively import functions from a module with the following syntax:

from addition import add
print(add(1., 2.))

This puts add directly in the current environment, so if you want to minimize the amount of typing, this is the way to go. Finally, you can import all functions from a given module into the present environment using:

from addition import *
print(add(1., 2.))

This is okay if there are only a couple of functions in the addition module, but if there were many of them, it can be confusing to not group them all together in a separate namespace and this is not considered good programming practice. This is particularly true if you import more than one module using *, and it is not made explicit which functions belong to which modules.
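The examples above assume a file addition.py in the same directory as the script that imports it. Its contents are not shown here, but a minimal hypothetical version might look like the following. With it, each of the four import styles prints 3.0:

```python
# addition.py  (hypothetical contents of the module used in the examples above)

def add(a, b):
    """Return the sum of a and b."""
    return a + b


def subtract(a, b):
    """Return a minus b (a second function, to show that a module can hold several)."""
    return a - b
```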

Where does Python look for modules? First, it looks in the directory containing the script you are running (or in the current working directory, if you are typing into the interpreter). If it finds a module there, it uses that module. If it does not, it searches all the directories listed on what is called the “Python path,” and the last place it looks is the standard installation directory for Python packages. Do not worry about the last two places for right now. They are only needed if you want to create packages that can be imported from a different directory than the one you are working in. (It is fairly easy to modify the Python path to find packages in other directories; I can show you how to do this if you ask.)
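For the curious, the search path is available inside Python as the list sys.path in the standard sys module. A minimal sketch of inspecting it and adding a directory at run time follows; the folder name my_python_modules is a hypothetical example. Setting the PYTHONPATH environment variable is the other common way to add directories. It applies to every Python program started while that variable is set.

```python
import os
import sys

# sys.path is the list of directories Python searches for modules, in order
for directory in sys.path:
    print(directory)

# add a (hypothetical) folder of your own modules to the end of the search path;
# this lasts only for the current run of the program
my_modules = os.path.expanduser("~/my_python_modules")
if my_modules not in sys.path:
    sys.path.append(my_modules)
```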

In addition to your own modules, there are many externally available packages that are useful in many contexts. We will not cover these in this course, but I frequently use many of them, in particular:
- numpy, which defines numerical arrays that are basically a more powerful version of a list and are useful for scientific computation;
- scipy, which gives access to many standard methods for scientific computing;
- matplotlib, which is a powerful plotting library.

These free and open source packages replace most of MATLAB’s functionality using Python, and they are becoming common tools in science. You are of course welcome to experiment with them on your own to see if they will be useful in your research. Many of these tools are installed in the Python installation in the Mac Lab, but they are also easy to download and install on your own computer.

Reading From and Writing To Files

Over our previous three classes, we have seen how we can use Python to carry out complicated tasks on data that we enter ourselves. However, as discussed in the first class, we usually need to perform calculations on data that we obtain from external sources. This means we need to get the data into Python, and need a way to save results of calculations. Python has a number of capabilities for reading and writing data. We will not cover all of them, but for your homework and term project, you will need to use Python to perform calculations based on data input from a text file, so here I cover the basics of text file input and output.

Reading From a File

Python has built-in capabilities for dealing with text files through what is known as a file object. A file object is a fancy data type that represents a file on your system, and it includes methods that let you read from and write to the file. To open a file object, use the following syntax:

f = open('myfile.txt', 'r')

The basic command is the open function. open takes two arguments: the name of the file ('myfile.txt' in this case), and the mode in which the file will be opened ('r' in this case, which indicates read-only mode). You can specify either an absolute or a relative path for the name of the file, but you must give the file name as a string. You have several options for the mode:
- 'r' indicates read-only mode.
- 'w' indicates write-only mode, creating the file or erasing its contents if it already exists.
- 'a' indicates append mode, writing to the end of the file.
- 'r+' indicates read-write mode for an existing file.

You do not need to provide the second argument, as 'r' is the default. Still, I try to always put it into my code because I want to be explicit about how a file will be opened, so that I do not accidentally open a file that I do not want to modify in write mode.

Note that if you want to read from a file, the file must exist before you try to open it; if it does not, Python raises a FileNotFoundError. You also need to be careful when opening a file in write mode, as doing so erases the existing contents of any file with that name.

Once you are finished with the file, you should close it using the close function of the file:

f.close()

This will close the file. Files are closed automatically when the program finishes, but it is a good idea to close each file as soon as you are done with it. The operating system limits how many files a program can have open at once. Also, data you write to a file may not actually be saved to disk until the file is closed.
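A common alternative to calling close yourself is Python’s with statement. It closes the file automatically when the indented block ends, even if an error occurs inside the block. Here is a minimal sketch; the file name with_example.txt is just a placeholder, and the first block creates the file so the example runs on its own:

```python
# write a small file first so the example is self-contained
with open('with_example.txt', 'w') as f:
    f.write('first line\nsecond line\n')

# the file is closed automatically when the indented block ends
with open('with_example.txt', 'r') as f:
    a = f.read()

print(f.closed)  # True
print(a)
```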

To read from a file that has been opened in read mode, use the read function of that particular file:

f = open('myfile.txt', 'r')
a = f.read()
print(a)
f.close()

The entire contents of the text file will then be stored in the variable a as a string, and will be printed to the screen. Note that everything read from a file is a string, so if you need to get numbers out of a file you will need to convert to the appropriate type.

Reading the entire contents of a file into a single string is fine for short files, but if you have a large data file it might be a bit tedious to handle the data in this way. More likely, you will have a file where a certain number of items are written on each line. To read in a single line from a file, use the readline function. So if you had a file with three lines, you might do the following:

f = open('myfile.txt', 'r')

for i in range(3):
    a = f.readline()
    print(a)

f.close()

As before, this will print the file to the screen, but now one line at a time, with a blank line after each one. However, what if we don’t know how many lines our file has? Fortunately, Python makes it easy by letting us loop over the file itself to implicitly read it one line at a time:

f = open('myfile.txt', 'r')

for line in f:
    print(line)

f.close()

Looping over the file in this way reads one line on each pass through the loop, just as calling readline would. This is the way I recommend reading text from a multi-line file.
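Each line read from a file keeps its newline character, '\n', at the end, and print adds another. That is why printing lines directly shows a blank line after each one. The short sketch below makes this visible and shows two ways to avoid the doubled newline. The file name lines_example.txt is a placeholder created by the first block, and readlines (a standard file method that reads all remaining lines into a list) is used only to show the contents:

```python
# create a small three-line file so the example runs on its own
with open('lines_example.txt', 'w') as f:
    f.write('alpha\nbeta\ngamma\n')

with open('lines_example.txt', 'r') as f:
    first = f.readline()   # 'alpha\n': readline keeps the newline at the end of the line
    rest = f.readlines()   # the remaining lines: ['beta\n', 'gamma\n']
    last = f.readline()    # '': readline returns an empty string at the end of the file

print(repr(first), rest, repr(last))

with open('lines_example.txt', 'r') as f:
    for line in f:
        print(line, end='')  # end='' stops print from adding a second newline

with open('lines_example.txt', 'r') as f:
    stripped = [line.rstrip('\n') for line in f]  # remove each line's trailing newline
print(stripped)  # ['alpha', 'beta', 'gamma']
```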

How do you deal with text once you read it in? A frequently used method for handling strings read from a file is split. It divides a string at whitespace (spaces, tabs, and newlines) and gives you a list of the resulting strings. For example, if you have a text file with three numbers on each line separated by spaces, you can get those numbers as follows:

f = open('myfile.txt', 'r')

for line in f:
    stringlist = line.split()  # stringlist is a list of three strings
    numberlist = []
    for string in stringlist:
        numberlist.append(float(string))
    # numberlist is now a list of numbers
    # perform some analysis, etc.

f.close()
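Putting these pieces together, a minimal sketch of a reusable function that reads such a file into a list of rows might look like the following. The name read_numbers and the file sample_numbers.txt are hypothetical. The sketch also makes two substitutions. It uses the with statement, which closes the file automatically at the end of the indented block, and a list comprehension in place of the inner loop.

```python
def read_numbers(filename):
    """Read a whitespace-separated text file of numbers into a list of lists of floats."""
    rows = []
    with open(filename, 'r') as f:
        for line in f:
            stringlist = line.split()   # split on whitespace; this also drops the '\n'
            if not stringlist:          # skip blank lines
                continue
            rows.append([float(s) for s in stringlist])
    return rows


# write a small hypothetical data file so the example is self-contained
with open('sample_numbers.txt', 'w') as f:
    f.write('1.0 2.0 3.0\n')
    f.write('4.5 5.5 6.5\n')

data = read_numbers('sample_numbers.txt')
print(data)  # [[1.0, 2.0, 3.0], [4.5, 5.5, 6.5]]
```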

Writing To a File

Writing to a file is similar to reading from a file; you just need to open the file in write mode. Once you have the file open in write mode, you can write data to it using the write method. As an example, imagine you have carried out a calculation on 100 different numbers, getting two different results for each. You have the results stored in a list called results, with each entry being a list of length two holding the two results. To write this to a text file, you might do the following:

f = open('calculation_results.txt', 'w')

for result in results:
    f.write(str(result[0])+' '+str(result[1])+'\n')

f.close()

Note that you can only write strings to a text file, so I had to explicitly convert the numbers with str before writing them. You might also notice that I had to put in my own dividing characters between the entries: a space between the two numbers, and a newline character, '\n', at the end of each line. You can also separate the entries with a tab character using '\t'.
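Here is a tab-separated variant of the example above, sketched with made-up numbers in results. str.join inserts the separator between the converted entries, so the same line works for any number of results per entry. Reading the file back confirms what was written:

```python
results = [[1.0, 2.5], [2.0, 5.0], [3.0, 7.5]]  # hypothetical results, two per calculation

with open('calculation_results.txt', 'w') as f:
    for result in results:
        # str() converts each number; '\t'.join puts a tab between the entries
        f.write('\t'.join(str(x) for x in result) + '\n')

# read the file back to check what was written
with open('calculation_results.txt', 'r') as f:
    contents = f.read()
print(contents)
```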

You can do more sophisticated things with Python, including writing binary data and saving complex objects in a Python-specific format (for example, with the standard pickle module) so that you can reload them later. We will not worry about these things here, as reading and writing simple text files is all that you will need for this class. However, if you use Python in your research, you will no doubt want to take advantage of these capabilities.