.. _python3:

*********************************
Basic Programming with Python 3
*********************************

This lab focuses mainly on one crucial skill for programming that you will use over and over as you write code for scientific purposes: **debugging**. No one writes perfect code the first time through, and you will want to develop the ability to test and find bugs in your code. Debugging requires more than anything else that you be *persistent* and *systematic* in looking at your code each step of the way. I will highlight some useful techniques for debugging in the following, but you should know that mostly you need to think carefully about what your code is doing.

We will also talk a bit more about importing other modules, which gives you flexibility to reuse your own code, as well as external code packages written by others. I also briefly discuss simple ways to save data in a text file and how to read text data into Python from an external file. You will need to perform some basic text input for your homework/final project.

======================
Types of Bugs
======================

What types of bugs might you encounter when writing a program? There are three main types:

* **Crash:** The most basic type of bug is when your program crashes and does not run to completion. You have probably encountered this problem already when working with the Python interpreter: you get some sort of error message that describes the problem, and the line where the problem occurred. This is the easiest type of bug to fix, because (1) you know that there is a bug based on the fact that your program crashed, and (2) Python gave you some information about where it occurred in the program. Fixing crashes may still pose some difficulties, but compared to the other two types of bugs, crashes are in general much easier to fix.

* **Program Never Finishes:** Next up is a bug that leads to a program that never runs to completion. This is a bit harder to figure out, because it may just be that your (correct) program takes a *long* time to run, and you just were not willing to wait long enough. This highlights an important issue in programming: it is *always* a good idea to have some idea of how long your program should take. Good ways to check this include (1) run your code on a very small subset of your data, and then progressively increase the size to see how long it might take to run on the full data, and (2) include a print statement every so often within your program to indicate progress through the code. I use (2) in almost every piece of code that I write: if I am solving a complicated differential equation with multiple time steps, I print out time step information, or if I am running something that does calculations for a large number of data files, I print out the number that I am on every so often. If you do in fact have code that never stops running, you may have an infinite loop in your code, or some other exit condition that is never met. These are easier to fix than the final type of bug, as you usually can tell rather easily if you are in a situation when your code will never finish, and thus take steps to fix the problem.

* **Wrong Answer:** The trickiest type of bug to fix, and the one you will no doubt spend most of your time worrying about how to correct, is when your code runs to completion but gives the wrong answer. These bugs can be notoriously difficult to find, as in many cases you might not know the right answer to the problem and thus cannot determine that there is a bug. Or you may know that there is a problem, but finding the culprit within a long and complicated code can be exceedingly difficult. In either case, this is more often than not the type of bug that will give you trouble.
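
As a minimal sketch of the progress-printing idea from the second bullet, the loop below reports its position every so often. The function name ``process_all``, the ``report_every`` parameter, and the squaring step are hypothetical stand-ins for whatever calculation you are actually running.

```python
def process_all(data, report_every=1000):
    """Square each item in data, printing progress every report_every items.

    The squaring is only a placeholder for a real calculation
    (an assumption made for illustration).
    """
    results = []
    for count, item in enumerate(data, start=1):
        results.append(item ** 2)  # placeholder for the real work
        if count % report_every == 0:
            print('processed', count, 'of', len(data), 'items')
    return results


# try a small subset first, then scale up once the timing looks reasonable
small_results = process_all(list(range(10)), report_every=5)
```

Running on a small subset first, as in the last line, also gives a rough sense of how long the full data set might take.
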

The techniques I mention below are useful for finding all of these types of bugs, though you will find that for the most part they are useful for the second and third types (crashes are relatively easy to diagnose, particularly with an interpreted language like Python).

===================
Print Statements
===================

A classic technique for debugging is using print statements. While more sophisticated tools are available, we will not use them in this lab, and instead focus on ways to use print statements in debugging. The idea is to print out values in your code where you know what the answer should be, and if they do not agree, take steps to correct the values. Most of the time when I am debugging a piece of code, this is the strategy I use, and it is one I will try to employ when you ask me for help on something.

Let's look at a rather simple example. Let's say that we didn't know about the built-in method in Python for adding two lists, and instead decided to write our own function to do this. We will call this function ``appendList``, and it should take two lists as inputs, and return another list as output. Additionally, we would like for ``appendList`` to not change the inputs to the function (i.e. it should have no side effects). Here is a first attempt at writing this as a function: ::

    def appendList(list1, list2):
        """appends list2 to list1, returning the combined list.
        does not modify either input list"""
        outlist = list1
        i = 0
        while i < len(list2):
            outlist.append(list2[i])
        return outlist

If you try to call ``appendList``, you will notice that the program never finishes. You probably can see my mistake, but if you cannot, how might we systematically figure out the problem? One good way to check is by adding a print statement in the function to see what is going on when the function is called. For instance, I might try something like this: ::

    def appendList(list1, list2):
        """appends list2 to list1, returning the combined list.
        does not modify either input list"""
        outlist = list1
        i = 0
        print('starting iteration over list2')
        while i < len(list2):
            print('list index value:', i)
            outlist.append(list2[i])
            print('current value of outlist:', outlist)
        return outlist

Doing this, we immediately see that we forgot to increment ``i`` each time through the loop, leading to the infinite loop. The function endlessly adds items onto the end of ``outlist``. The print statements helped us pinpoint this problem. While I can fix this easily with an increment, I might realize that I could have avoided this in the first place with a ``for`` loop, so I could rewrite like this, keeping some debugging print statements in place: ::

    def appendList(list1, list2):
        """appends list2 to list1, returning the combined list.
        does not modify either input list"""
        outlist = list1
        print('starting iteration over list2')
        for item in list2:
            print('item to be appended:', item)
            outlist.append(item)
            print('current value of outlist:', outlist)
        return outlist

Note that when I inserted the print statements for debugging, I tried to include additional information about the meaning of the variables that are printed. This isn't totally necessary, but it can serve as a helpful reminder, as the additional words make explicit what each variable should be.

Doing this, we find that our code runs to completion (try it yourself, using two short lists), and we can see that the items are correctly appended onto ``outlist``. Print out the resulting list in the interpreter to confirm that the function worked correctly.

While it may seem that everything is okay, there is still one more thing to check: this function is *not* supposed to modify the input lists, but try printing out the first list that was passed to ``appendList``. You should see that the first input list has been modified, which was not supposed to happen. How could we go about debugging this?
I might add more print statements that tell me the values of ``list1`` and ``list2`` as I go through the loop: ::

    def appendList(list1, list2):
        """appends list2 to list1, returning the combined list.
        does not modify either input list"""
        outlist = list1
        print('starting value of list1:', list1)
        print('starting value of list2:', list2)
        print('starting iteration over list2')
        for item in list2:
            print('item to be appended:', item)
            outlist.append(item)
            print('current value of outlist:', outlist)
            print('current value of list1:', list1)
            print('current value of list2:', list2)
        return outlist

This should make it clear what is going on: the assignment ``outlist = list1`` does not copy the list, it creates an alias. Because the two variable names point to the same location in the computer's memory, we append the items onto ``list1`` whenever we append to ``outlist``. The fix for this is to make a copy of ``list1`` in ``outlist``, rather than an alias. Here is the corrected code, with the print statements still in place: ::

    def appendList(list1, list2):
        """appends list2 to list1, returning the combined list.
        does not modify either input list"""
        outlist = list(list1)  # outlist = list1[:] is also acceptable
        print('starting value of list1:', list1)
        print('starting value of list2:', list2)
        print('starting iteration over list2')
        for item in list2:
            print('item to be appended:', item)
            outlist.append(item)
            print('current value of outlist:', outlist)
            print('current value of list1:', list1)
            print('current value of list2:', list2)
        return outlist

You can verify that this version of the function does not modify the input, which is confirmed by the print statements.

In general, using print statements to debug your code is a good strategy. As your programs get more complicated, though, it may not be obvious how exactly one should use print statements.
Here is where being systematic and persistent pays off: a good strategy is to pick a point about halfway through the program, and insert a print statement there to compare program values to expected values. If all of the relevant variables have their expected values, then the bug is most likely after this point of the code, while if there is a disagreement, then the bug is somewhere before this point. Repeat this bisection method until you are able to home in on where the bug occurs. Once you find and fix the bug, if you are still having trouble there is probably another bug in your program, so start over again in the middle of the program. This is where persistence comes in, as you need to repeatedly stick to a plan to look through the program in a systematic way.

=====================
Test Functions
=====================

A useful method for working out bugs that is related to print statements, but is slightly more sophisticated, is the use of test functions. The idea is that if designed correctly, each separate task in a program should be written as a function. We can exploit this modularity and make our debugging and testing more robust by examining each function individually with a **test function**. A test function is a separate piece of code that calls a function multiple times with particular values (often chosen to test certain cases that may give a code trouble), comparing the results to the expected values and printing out the results.
Here is an example test function for ``appendList`` above: ::

    def test_appendList():
        'tests appendList for several different cases, printing out results'
        list1_vals = [[], [1], [1, 2]]            # list1 values
        list2_vals = [[1], [], [3, 4]]            # list2 values
        combined_vals = [[1], [1], [1, 2, 3, 4]]  # expected results
        # loop over test cases, printing out test results
        for list1, list2, combined_expected in zip(list1_vals, list2_vals, combined_vals):
            # copy lists before calling to check for modification
            list1_copy = list(list1)
            list2_copy = list(list2)
            combined_actual = appendList(list1, list2)
            if (combined_actual == combined_expected and list1 == list1_copy
                    and list2 == list2_copy):
                result = 'PASSED'
            else:
                result = 'FAILED'
            print('Test:')
            print('list1: expected = ', list1_copy, ', actual = ', list1)
            print('list2: expected = ', list2_copy, ', actual = ', list2)
            print('combined: expected = ', combined_expected, ', actual = ', combined_actual)
            print('Result: ', result)
            print('')

This code runs three different test cases through the function ``appendList``, and prints out the results. I introduced something new in writing this, ``zip``, which is a convenient function for iterating over multiple lists at once. Basically, ``zip`` combines together a series of lists to let you use the convenient syntax of ``for <item> in <list>``, but doing so over multiple lists at the same time without having to worry about indices. You can also do this using indices, but I find this approach to produce code that is easier to understand.

There are a few things to notice about this test function:

* First, note that I set up the problem using three lists and a for loop to iterate over the test cases and the expected results, rather than writing out the same code several times. I did this for two reasons: first, I want to make sure that each test is run in exactly the same manner, and I also want to maintain flexibility about the number of tests that can be run. To add an additional test, all I have to do is add another set of items to ``list1_vals``, ``list2_vals``, and ``combined_vals``, and the code automatically does the additional tests. If it is difficult to run additional tests, you are not likely to do them!

* Second, note that I was careful about checking whether or not the original lists were modified by comparing the input lists to separate copies. As we saw above, it is possible to get the correct output, but incorrectly modify the input, and thus we should test the inputs as well as the output. If a function is not supposed to modify the input, then it is a good idea to build this into the test function.

* Finally, this test function could have been written independently of a working version of ``appendList``. This means that I could have even written the test function *before* I wrote the actual function -- once I have defined what the function ``appendList`` does, I can figure out how to test it without even writing it. You don't actually need to do this, but it is a good idea to think about how to test a piece of code during the entire process. The sooner you start doing rigorous testing of a piece of code, the better.

Writing test functions where you compare results with expected outputs for each piece of code is in general a good idea. Start small with the lowest level functions, and then work up to the highest level. This way, if you ever need to change a piece of the code, you can run all of the appropriate tests again to verify that you have not broken anything. In some of your homework questions, I will run a suite of test problems analogous to the test shown above to verify that your code does everything correctly, and doing so yourself is also a good idea.

=================
Assertions
=================

One final piece of advice for writing bug-free code in the first place is to use **assertions** in your code.
Assertions are statements that cause your program to stop running if the conditions that they test are not met, and can be a good way to help catch bugs. Typically, when I write a function, the first few lines are assertions that check a few conditions that might trip up my code if the inputs are not correct. Therefore, I know right away that something is wrong, and *where* it went wrong, rather than having to deduce it from the final result and then work my way through the code systematically to find the problem. While this cannot help me debug ``appendList`` if it is not working correctly, if I were writing ``appendList`` as part of a larger project, assertions could help me figure out if there is a bug in any code that calls ``appendList``.

For instance, let's say that I am writing a piece of code where I need to combine two lists into a third list without modifying either list. This can be easily done with our ``appendList`` function. However, if one of the lists contains just a single number, and I somehow think that ``appendList`` works like the append method for a list, I might try something like ``appendList(someList, a)``, where ``a`` is a float. As it stands now, if I try to run the code, I get an error message stating that ``'float' object is not iterable``. While I might be able to figure out that this is the problem from the error message, I can use assertions to test this and make the failure more explicit. Basically, before doing anything inside the function, I would like to be certain that both of my input arguments are lists. If they are not, then instead of trying to do a set of operations that I know will fail, I can raise a very specific error message that explains the problem in plain English. Here is the code with assertions: ::

    def appendList(list1, list2):
        """appends list2 to list1, returning the combined list.
        does not modify either input list"""
        assert type(list1) is list, "input argument list1 must be a list"
        assert type(list2) is list, "input argument list2 must be a list"
        outlist = list(list1)  # outlist = list1[:] is also acceptable
        for item in list2:
            outlist.append(item)
        return outlist

With the added assertions, now if I try ``appendList(someList, a)``, I will get an ``AssertionError``, along with the message ``input argument list2 must be a list`` (the optional string in an ``assert`` statement will be included in the error message). This is much more helpful in figuring out how to correctly call ``appendList`` than the previous error message. If I call the code correctly, both assert statements silently pass, and do not affect the execution of the code.

To check the type of a variable, I used the reserved keyword ``is`` along with the ``type`` function. ``type(<variable>) is <type>`` gives back a boolean value, and can be used to check the type of a variable. This is often something that you would want to check. If something is a counting index, assert that it is a nonnegative integer. If something needs to have a numeric value, then assert it is a float or integer. If two lists must have the same length, add an assertion to check that the lists have the same length. I try to check anything that may make my program go haywire or execute incorrectly with an assertion, as they typically give more useful information than other error messages.

Assertions can also be used as a handy trick to crash a program intentionally. If I am trying to debug something, and in doing so change something so that I *know* that the remainder of the program will not run correctly (or take a long time to run), I often short-circuit the program by adding ``assert False``, which always stops the program dead in its tracks. Otherwise, to prevent the remainder of the code from running, I would have to comment it all out (which may take a long time to do and then undo when the problem is fixed).
I can save myself time and typing in such a case by adding a simple assertion to the code.

====================
Practice
====================

Here are some problems to work on in lab today. Use print statements to help debug them as you work. You should use assertions where appropriate, and also write a small test function for each problem.

* Write a function ``isPalindrome`` to test if a list is a palindrome (i.e. the list is the same backwards as it is forwards: ``[1, 2, 1]`` is a palindrome, ``[1, 2, 2, 1]`` is a palindrome, but ``[1, 2, 3, 1]`` is not).

* One method for calculating the square root of a number :math:`{s}` is as follows: (1) take a guess :math:`{x_n}`, (2) check if :math:`{x^2_n=s}` within some allowed amount of tolerance (if so, we have a good enough answer and we can exit), (3) if the guess isn't good enough, make a new guess :math:`{x_{n+1}=(x_n+s/x_n)/2}` and return to step (2). Write a function ``calcsqrt`` to implement this method, taking :math:`{s}` as an input and returning the square root.

* Add appropriate assertions and write a test function for some of the functions from Lab 5.

==========================
Importing External Code
==========================

By now, you have a basic understanding of how to program in Python (or any other language, for that matter -- if you can program in Python, it is easy to learn to program in a different language). Many of the things I have highlighted are aimed at making better written, more usable computer code, in particular how to use functions to accomplish this. But what if you are writing some code, and would like to use a function that you wrote at some previous time? From what we have seen so far, you can only do this by copying and pasting the function into your Python script. If every script had to hold all of the code it uses in a single file, this would get out of hand very quickly, and if you needed to make a change to the function, you would need to find it in every file and change it.
The solution to this problem is to use Python's ``import`` capabilities. Basically, the idea is that we can use ``import`` to load functions and other code from a separate file into what is known as a **module**. Let's say we have a function ``add`` that we have defined in the file "addition.py." If we would like to use ``add`` in a separate piece of code, we can do the following: ::

    import addition

    print(addition.add(1., 2.))

This code will import all functions in the "addition.py" file, and allow you to use them with the name ``addition.<function name>``. The prefix ``addition`` is what is known as a **namespace**, which is just a fancy way of saying that Python groups functions together based on the place from which they were imported. Namespaces help you keep track of where different functions come from. You can choose the namespace where external functions are imported with the following syntax: ::

    import addition as add

    print(add.add(1., 2.))

If you were going to use the ``add`` function many times, choosing a shorter namespace can help cut down on typing. If you have many functions in ``addition``, but are only going to use ``add``, you can also selectively import functions from a module with the following syntax: ::

    from addition import add

    print(add(1., 2.))

This puts ``add`` in the current environment, so if you want to minimize the amount of typing, this is the way to go. Finally, you can import all functions from a given module into the present environment using: ::

    from addition import *

    print(add(1., 2.))

This is okay if there are only a couple of functions in the ``addition`` module, but if there were many of them, it can be confusing to not group them all together in a separate namespace and this is not considered good programming practice. This is particularly true if you import more than one module using ``*``, and it is not made explicit which functions belong to which modules.

Where does Python look for modules?
First, it looks in the current working directory from which Python was called. If it finds a module in the present directory, it uses that module. If it does not find one in the current directory, it searches all the paths specified on what is called the "Python path," and the last place it looks is the standard installation directory for Python packages. Do not worry about the last two places for right now; these are only needed if you want to create packages that can be imported from a different directory than the one you are working in (it is fairly easy to modify the Python path to find packages in other directories; I can show you how to do this if you ask).

In addition to your own modules, there are many externally available packages that are useful in many contexts. We will not cover these in this course, but I frequently use many of them, in particular ``numpy`` which defines numerical arrays that are basically a more powerful version of a list and useful for scientific computation, ``scipy`` which gives access to many standard methods for scientific computing, and ``matplotlib`` which is a powerful plotting library. These free and open source modules replace most of the MATLAB functionality using Python, and these are becoming common tools in science. You are of course welcome to experiment with them on your own to see if they will be useful in your research. Many of these tools are installed in the Python installation in the Mac Lab, but they are also easy to download and install on your own computer.

======================================
Reading From and Writing To Files
======================================

Over our previous three classes, we have seen how we can use Python to carry out complicated tasks on data that we enter ourselves. However, as discussed in the first class, we usually need to perform calculations on data that we obtain from *external* sources.
This means we need to get the data into Python, and need a way to save results of calculations. Python has a number of capabilities for reading and writing data. We will not cover all of them, but for your homework and term project, you will need to use Python to perform calculations based on data input from a text file, so here I cover the basics of text file input and output.

----------------------
Reading From a File
----------------------

Python has built-in capabilities for dealing with text files through what is known as a **file object**. A file object is a fancy data type that represents a file on your system, and includes functions that let you read from and write to the file. To open a file object, use the following syntax: ::

    f = open('myfile.txt', 'r')

The basic command is the ``open`` function. ``open`` takes two arguments: the name of the file (``'myfile.txt'`` in this case), and the mode under which that file will be opened (``'r'`` in this case, which indicates that the file should be opened in read-only mode). You can specify either an absolute or a relative path for the name of the file, but you must give a string for the file name. You have several options for the mode: ``'r'`` indicates read-only mode, ``'w'`` indicates write-only mode, and ``'r+'`` indicates read-write mode. You do not need to provide the second argument as ``'r'`` is the default, though I try to always put this into my code as I want to be explicit about how a file will be opened so that I do not accidentally open a file that I do not want to modify in write mode.

Note that if you want to read from a file, the file must exist before you try to open it. If it does not exist, you will get an error message. You also need to be careful when opening a file in write mode, as it will overwrite a file if it already exists.

Once you are finished with the file, you should close it using the ``close`` function of the file: ::

    f.close()

This will close the file.
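
The behavior of the read and write modes described above can be checked with a short sketch. It works in a temporary directory so that nothing on your system is overwritten. The file name ``example.txt``, the use of the standard ``tempfile``, ``os``, and ``shutil`` modules, and the ``try``/``except`` block for catching the error are all assumptions made for illustration, not something this lab requires.

```python
import os
import shutil
import tempfile

# work in a throwaway directory so no real files are touched
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'example.txt')  # hypothetical file name

# reading a file that does not exist raises an error
try:
    f = open(path, 'r')
    f.close()
    missing_file_error = False
except FileNotFoundError:
    missing_file_error = True
print('error when reading a missing file:', missing_file_error)

# write mode creates the file if it does not exist
f = open(path, 'w')
f.write('first version\n')
f.close()

# opening in write mode again erases the previous contents
f = open(path, 'w')
f.write('second version\n')
f.close()

f = open(path, 'r')
contents = f.read()
f.close()
print('contents after second write:', contents)

shutil.rmtree(tmpdir)  # remove the throwaway directory
```

Only the second string survives, which is why it pays to be explicit about the mode whenever you open a file.
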
The file will automatically close when the program finishes if you did not explicitly close it, but it is a good idea to close files when you are done with them because they can slow down your program if too many are open.

To read from a file that has been opened in read mode, use the ``read`` function of that particular file: ::

    f = open('myfile.txt', 'r')
    a = f.read()
    print(a)
    f.close()

The entire contents of the text file will then be stored in the variable ``a`` as a string, and will be printed to the screen. Note that everything read from a file is a string, so if you need to get numbers out of a file you will need to convert to the appropriate type.

Reading the entire contents of a file into a single string is fine for short files, but if you have a large data file it might be a bit tedious to handle the data in this way. More likely, you will have a file where a certain number of items are written on each line. To read in a single line from a file, use the ``readline`` function. So if you had a file with three lines, you might do the following: ::

    f = open('myfile.txt', 'r')
    for i in range(3):
        a = f.readline()
        print(a)
    f.close()

As before, this will print the file to the screen, but each line will be separated from the next one. However, what if we don't know how many lines our file has? Fortunately, Python makes it easy by letting us loop over the file itself to implicitly read it one line at a time: ::

    f = open('myfile.txt', 'r')
    for line in f:
        print(line)
    f.close()

Note that you can implicitly perform the ``readline`` function each time through the loop in this way. This is the way I recommend handling reading text from a multi-line file.

How do you deal with text once you read it in? A frequently used function for handling strings read from a file is ``split``, which takes a string and divides it into words, giving you a list of strings.
For example, if you have a text file with three numbers on each line separated by a space, you get those numbers as follows: ::

    f = open('myfile.txt', 'r')
    for line in f:
        stringlist = line.split()  # stringlist is a list of three strings
        numberlist = []
        for string in stringlist:
            numberlist.append(float(string))
        # numberlist is now a list of numbers
        # perform some analysis, etc.
    f.close()

-----------------------------
Writing To a File
-----------------------------

Writing to a file is similar to reading from a file; you just need to open the file in write mode. Once you have the file open in write mode, you can write data to the file using the ``write`` function. As an example, imagine you have carried out a calculation on 100 different numbers, getting two different results. You have the results stored in a list ``results``, with each list entry being a list of length two holding the two results. To write this to a text file, you might do the following: ::

    f = open('calculation_results.txt', 'w')
    for result in results:
        f.write(str(result[0]) + ' ' + str(result[1]) + '\n')
    f.close()

Note that you can only write strings to a text file and that I had to explicitly convert the numbers here before I could write them to the file. You might also notice that I had to put in my own dividing characters between the entries: there is a space between the two numbers, and a newline character (``'\n'``) to separate each line. You can also separate the entries with a tab character using ``'\t'``.

You can do more sophisticated things with Python, including writing binary data and writing things in a Python-specific format that lets you save and reload complex objects. We will not worry about these things here, as for this class reading and writing simple text files is all that you will need to worry about. However, if you use Python in your research, you will no doubt want to take advantage of these capabilities.
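
To tie the two halves of this section together, here is a minimal sketch that writes a small ``results`` list to a text file and then reads it back into a list of lists of floats. The numbers are made up, and the use of the standard ``tempfile`` and ``os`` modules (so that the example does not touch any real files) is an assumption for illustration.

```python
import os
import tempfile

# made-up results: each entry holds the two results of one calculation
results = [[1.0, 2.5], [3.0, 4.5], [5.0, 6.5]]

# put the file in a throwaway directory
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'calculation_results.txt')

# write one pair per line, separated by a space
f = open(path, 'w')
for result in results:
    f.write(str(result[0]) + ' ' + str(result[1]) + '\n')
f.close()

# read the file back, converting each string to a float
loaded = []
f = open(path, 'r')
for line in f:
    numberlist = []
    for string in line.split():
        numberlist.append(float(string))
    loaded.append(numberlist)
f.close()

print(loaded)

# clean up the throwaway file and directory
os.remove(path)
os.rmdir(tmpdir)
```

Comparing ``loaded`` with ``results`` is a quick way to confirm that your writing and reading code agree on the file layout.
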