.. _python1:

*********************************
Basic Programming with Python 1
*********************************

Now that we have a sense of how to use the Unix shell, we will examine some programming basics. Today we will be mostly getting acquainted with Python, which is a common programming language used in many scientific fields. Python has several nice features that make it useful for beginning programmers, and the standard implementation of Python is open source and free. You are of course free to install Python on your own computer if you want to practice programming away from the computer lab.

Python is my default programming language for general scientific computing. I use it almost as often as a Unix shell, as there are extensive tools for scientific number crunching, analysis, and plotting. As we will see, MATLAB is another option for many of these types of tasks, but I find the Python programming language to be more powerful and flexible and thus I have found that for me, Python has almost completely replaced MATLAB. I do more serious computing in a compiled language like C or Fortran, but if the task is simple and does not need to be optimized for performance, Python is typically my choice.

This lab session will be interactive, as I will demonstrate everything myself on the computer. You should type along with me when I am demonstrating something, and then I will give you a chance to practice on some exercises.

==========
Python
==========

Python is a high level programming language. "High Level" refers to the fact that the language comes with many built-in features that might not be present in other languages (and you would therefore need to write these yourself). The higher the level of a language, the shorter programs tend to be. This is generally a good thing -- it is better to have a group of coordinated experts implement a feature to get it working correctly than to have every single user write their own version.
It also saves you time, since programs are faster to write and debug.

One reason Python is often used for introductory programming is that Python programs tend to be rather literal: they often read fairly clearly in English, making them easier to understand. This is not always the case, as we will see as the semester progresses (just wait until you are trying to wade through an incomprehensible GMT shell script, and you will appreciate many aspects of Python).

A second feature of Python is that it uses indentation to denote the structure of a program. This means that Python takes the layout and spacing of a program into account when interpreting its meaning. The alternative, used by many other languages, is to mark the structure with specific characters or keywords (such as braces or ``end``). Neither approach is inherently better, but it is generally agreed that the kind of indentation Python requires is good programming practice, as it makes programs more readable and easier for humans to interpret. For this reason, I consider the requirements imposed by Python a good thing for beginners, as they force you to develop good programming habits. I will demonstrate how this works as we go today.

We will be using Python 2 in this class. It is important to note that code written for Python 2 and Python 3 is not fully interchangeable: Python 3 cleaned up several issues with the language in ways that break some Python 2 code. If you want to install Python on your own computer and use examples from this class, make sure you install Python 2 (or understand how you need to change your code for Python 3).

=======================================
Interpreted vs. Compiled Languages
=======================================

One issue with programming is the fact that the processor on your computer cannot directly interpret a program. Computer processors can only do a limited number of operations (think add, subtract, multiply, divide, and logical operations), so every operation must be translated into a series of instructions of this type.
Fortunately, we have computer programs that are able to do this for us. There are two types: interpreters and compilers.

* **Interpreters** turn a program into instructions one line at a time. The interpreter reads a line of code, checks whether it is valid, creates a set of instructions, and then executes those instructions. It then proceeds to the next line, continuing until it reaches the end of the code to be executed.
* **Compilers** read the entire program and turn the whole thing into executable instructions, saving the instructions as an executable file. The instructions are not executed until the file containing them is run.

These two models each have pros and cons. Interpreters are generally easier to use, because the interpreter has direct access to the part of the code that is currently being executed. If it encounters a problem, it can tell you exactly where in your code it ran into that problem. However, because it is constantly creating instructions and executing them on the fly, an interpreter cannot optimize code as readily as a compiler, so performance can suffer as a result. Compilers tend to create code that runs faster, but if an error occurs while the program is running, it can be much more difficult to track down, since the executable does not necessarily record which part of your code produced the instructions that failed.

All of the languages that we will use in this class are interpreted. This makes them easier to use, but not ideal if performance is critical. However, computers these days tend to be fast enough that you are unlikely to need a compiled language unless you are doing some serious number crunching.

=====================================
Interactive Sessions and Scripts
=====================================

To start up the Python interpreter, open a Unix shell, type ``python``, and press return. You will see some information about the version of Python and then a prompt, ``>>>``. This is an interactive session with the Python interpreter.
You can enter one set of instructions at a time, and the interpreter will execute them as they are typed, printing the results to the shell. This is a nice way to use Python for simple work, as you can check your work as you go by directly seeing the results. To exit, type ``exit()`` at the prompt.

For more complicated work, it is convenient to save all of the code to be executed in a single file, say ``myscript.py``, and then run the entire file through the Python interpreter. This is called running a script, in that the file is a set of commands to be executed (much like a script is a set of lines to be read in a performance). To do this, type ``python myscript.py`` into the Unix shell and press return. This will run the entire script, with any output written to the shell.

Initially, you will probably do your programming interactively in Python until you get the hang of it. After that, as your programs get more complex, you will want to write scripts using a text editor (hopefully you will use this as a chance to practice with a command line editor) and then run them directly from the shell.

==========
Basics
==========

Start up the Python interpreter. I will provide some demonstrations of the following on the screen. You are encouraged to type along with me, as playing around with an interpreter is usually the best way to learn how to do things. I will show you the following:

* **Basic data types:** Integers, strings, and floating point numbers are the most elementary objects in almost any programming language, including Python. **Integers** are entered by simply typing the integer into the interpreter: ``5``, ``-1``, and ``10000`` are valid integers in Python. **Strings** are any sequence of characters enclosed in either single (``'``) or double (``"``) quotes.
  If you want to include quotes in the string, either use the opposite type of quotation mark to enclose the string, or put a backslash (``\``) before the quote inside the string. Finally, you can denote a **floating point number** (i.e. a number with a fractional part) by including a decimal point, as in ``100.`` or ``0.001``, or by writing the number in scientific notation, as in ``1.e-3``.

  You can convert between these data types in Python. To convert to an integer, use ``int( )``; to convert to a float, use ``float( )``; and to convert to a string, use ``str( )``. Some conversions are not possible: for instance, if you have a string that contains non-numeric characters, you will not be able to convert it into a float or an integer. If you convert a float into an integer, Python will drop the fractional part of the number (so ``int(2.7)`` is ``2`` and ``int(-2.7)`` is ``-2``). If you intend to write a program to work with a certain data type, it is a good idea to convert any program input into the correct data type to prevent errors further down the line.

* **Basic operations:** Python can do basic operations on these primitive data items. For instance, to add two integers, you can enter ``2 + 3`` and the interpreter gives you back the result. Built-in operations on integers include addition (``+``), subtraction (``-``), multiplication (``*``), division (``/``), exponentiation (``**``), and the modulus operation (``%``, the remainder after dividing the first number by the second). Note that in Python 2, dividing two integers gives you an integer back, rounded down: ``7 / 2`` is ``3``, not ``3.5``! Operations follow the standard order of operations. To change the order in which operations are carried out, use parentheses to group them: ``2 * 3 + 4`` (which is ``10``) is not the same as ``2 * (3 + 4)`` (which is ``14``).

  The same operations, including modulo, can be applied to floating point numbers. The only difference is that you always get a floating point number back rather than an integer (so ``7. / 2`` is ``3.5``).
  If you combine integers and floats, Python will automatically convert the integer into a float and perform the floating point version of the operation. For strings, operations include addition (which concatenates two strings) and multiplication by an integer (which repeats the string that many times). Note that if you try to do something nonsensical, like adding a string and an integer, Python gives you an error message.

* **Print statements:** To print something out, use the ``print`` statement. Beginning programming books and tutorials often start with a program known as "Hello world!" In Python, this program is very succinct, taking up only a single line: ``print 'Hello world!'``. You can also print the results of more complicated operations. Print statements are useful for debugging, as they let you see intermediate results on the screen so that you can check whether they are correct.

* **Assigning variables:** While you can do basic calculations with Python, a programmable computer isn't particularly useful unless you can store intermediate results to be re-used later. Otherwise, you would have to repeatedly type in complex expressions, which invites mistakes and defeats the purpose of having a computer do the calculations. Assignment is done through the assignment operator ``=``, which binds the name on the left hand side to the value on the right hand side. Variable names can be anything you like, within a set of rules: (1) variable names must start with a letter or an underscore (numbers are allowed, but not in the first position), (2) they cannot contain any special characters (like ``=``, ``+``, ``.``, etc.) that have other meanings in Python, and (3) they cannot be one of Python's reserved keywords (we have not encountered any yet, but we will see examples today such as ``if`` and ``for``).

  Assignment is very handy, as it lets you (1) re-use previous results without repeating the calculation (or having to type it in again!), (2) break up complex statements that are difficult to type into multiple pieces, and (3) choose informative variable names that make your code easier for a human to read (such as me when I am grading your homework, or, most importantly, yourself in 2 weeks when you re-visit a piece of code that you completely forgot about in the meantime).

* **Input:** To read data from the keyboard, we can use ``raw_input( )``. If you provide a string to ``raw_input``, that string will be displayed on the screen as a prompt when asking for input. So, for example, I can use the following code to enter a string into a Python program: ::

      a = raw_input('Enter a string: ')

  The string entered will then be stored in the variable ``a``. The value returned by ``raw_input`` is always a string; if you need a numeric value, use ``int(a)`` or ``float(a)`` to convert the string to the appropriate type.

* **Comments:** All programming languages have characters that tell the program to ignore whatever is written after them. This lets you add documentation that explains why you are doing what you are doing (again, something that is very handy to you in 2 months when you re-visit some code you wrote and forgot about), explain details about how something works, and so on. Documenting code is a *very* important habit to develop, and I will mark you down on homework assignments if you turn in code that isn't documented sufficiently. Comments can either take up a full line or occur inline after other Python code. Comments in Python are designated using the pound character (``#``): everything from the ``#`` to the end of the line is ignored.

  In general, it is a good idea to document *how* and *why* you are doing what you are doing, rather than *what* you are doing.
  Compare the following two comments to see which is more useful: ::

      # npoints is one less than a
      npoints = a - 1

  This is not particularly helpful, as it is obvious from the code itself what you are doing. It is more useful to explain *why* you are doing something: ::

      # decrease a by one to get npoints, as the first point is double counted
      npoints = a - 1

  This is much more helpful: not only do we know that ``npoints`` isn't the same as ``a``, but also *why* they are not the same. If we need to modify the code later, comments explaining why something was done are much more informative.

All of the above are basic features of any programming language, so they are not particular to Python (and we will see them again several times during the semester).

===============================
Conditional Statements (if)
===============================

Above, we saw the fundamental building blocks of any programming language. However, these pieces alone aren't enough to make computers particularly useful. This is because the Python interpreter simply executes every line we enter, one at a time. But what if some future calculation depends on the result of a previous one? We would need to do the intermediate calculation, decide for ourselves what action to take, and only then enter the remaining set of instructions. We can avoid this by using an ``if`` statement. The basic structure of an ``if`` statement is as follows: ::

    if condition1:
        statement1
    elif condition2:
        statement2
    ...
    else:
        statement

Depending on the various conditions, this program will execute different statements. If the boolean ``condition1`` is true, then the program executes ``statement1``. If ``condition1`` is false, then the program checks whether ``condition2`` is true; if so, it executes ``statement2``. You can add as many ``elif`` blocks as you like, and the interpreter will continue checking them one at a time until it encounters one whose condition is true.
If none of the conditions is true, then the ``else`` block is reached and ``statement`` is executed. In an ``if`` statement, the ``elif`` and ``else`` blocks are optional: you do not need an ``else`` block at the end, nor do you need any ``elif`` blocks in the middle. You can therefore tailor the ``if`` statement to whatever specific steps you need for any type of condition.

The ``condition`` parts are what are known as **booleans**: they can only have two values, either ``True`` or ``False``. This is another basic data type found in all programming languages, and you can usually also use ``1`` for true and ``0`` for false. One way to get a boolean is to make a comparison. For instance, if you have an integer variable ``a``, you can check whether that variable is equal to 1 with the expression ``a == 1`` (note that comparisons in Python use a double equals sign, while assignment uses a single equals sign; this is true for most programming languages). Other comparisons include ``>`` (greater than), ``<`` (less than), ``>=`` (greater than or equal to), ``<=`` (less than or equal to), and ``!=`` or ``<>`` (not equal to; the two are equivalent in Python 2, though Python 3 accepts only ``!=``).

Booleans can be combined into larger conditions using the operators ``and`` and ``or``. ``a == b and c == d`` is true only if ``a`` and ``b`` are equal *and* ``c`` and ``d`` are equal; if either of the two comparisons is false, then the whole thing is false. ``or`` only requires that one of the conditions be true. Use parentheses to group multiple combined conditions when you need to specify precedence. If you want the opposite of a given comparison, put ``not`` before it. For instance, ``not 1 == 2`` is ``True``, while ``not 2 > 1`` is ``False``.

One thing you should have noticed above is that all of the statements inside each block of the ``if`` statement were indented from the left by the same amount.
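The comparison operators, the boolean operators, and an indented ``if``/``elif``/``else`` chain can be tried together in a short sketch. The starting value of ``a`` and the name ``category`` are arbitrary choices made for illustration, and each printed value is wrapped in parentheses so that the same lines run under both Python 2 and Python 3 (with a single argument, ``print(x)`` prints the same thing in either version): ::

    # Sketch only: a is an arbitrary test value; change it and re-run
    # to exercise the other branches.
    a = 7

    # Comparisons produce booleans
    print(a == 7)             # True
    print(a <= 5)             # False  (<= means "less than or equal to")

    # Combining booleans with and, or, not
    print(a > 0 and a < 10)   # True: both comparisons hold
    print(a < 0 or a == 7)    # True: only one of them needs to hold
    print(not a == 7)         # False

    # An if/elif/else chain: only the first block whose condition is true runs
    if a < 0:
        category = 'negative'
    elif a == 0:
        category = 'zero'
    elif a % 2 == 0:
        category = 'positive and even'
    else:
        category = 'positive and odd'
    print(category)           # positive and odd

Because the chain stops at the first condition that is true, the order of the tests matters: here the ``a == 0`` test has to come before ``a % 2 == 0``, since 0 is also even and would otherwise be labeled ``'positive and even'``.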

In general, it is a good idea to indent blocks of code that belong together like this, as it makes clear to the reader which code is executed when each of the various conditions is true. You should always indent your code consistently in assignments you turn in for this class, and if I find your code hard to read, I will take off points. However, most programming languages do not *require* you to indent your code this way; instead, they use other characters or keywords to indicate which statements belong to each block. Python, on the other hand, *requires* that you indent your code consistently and actually uses the spacing to interpret the meaning of your program. If you try to run a Python program that is not consistently indented, you will either (a) get an error message back or, worse, (b) your program will run but not give you the right answer. Some people find this aspect of Python annoying, but to me it is a very positive feature (particularly for beginners, as it forces you to develop good habits).

Spend some time experimenting with ``if`` statements. Try some with just an ``if``, others with ``if`` and ``else``, and still others with ``elif`` blocks. Print statements are good things to include in the various blocks, as they let you see that your code ran as you intended.

==============================
Loop Statements (while, for)
==============================

While conditional statements let us automatically treat different cases in different ways within a program, a computer is not a powerful tool for automation unless we are able to do things repeatedly, and repetition requires looping capabilities. Without a loop, the only way to make a program do more computations is to make the program longer, which requires more typing by you (and thus more opportunities for you to make a mistake). Loops, on the other hand, allow you to execute something many times.
This is what makes a computer a powerful tool for analyzing data: we can do the same thing over and over again on different data without doing extra work.

Programming languages have two basic types of loops (though some have additional variations on them): ``while`` loops and ``for`` loops. Let's examine each one separately.

**While loops:** These loops allow you to run a block of code repeatedly as long as a certain condition is true. This is best illustrated with an example: ::

    a = 10
    while a > 0:
        print a
        a = a - 1

When executed, this code will print out the integers from 10 down to 1, one at a time. (Try this for yourself.) How does this work? First, we initialize the variable ``a`` to 10. Then we enter the loop, which executes as long as ``a`` is greater than zero. At the start of the first pass through the loop, the condition is true, so the loop body executes, printing 10 to the screen and then setting the value of ``a`` to ``a - 1``. Note that this statement does not make sense as a mathematical equation (how can ``a = a - 1``?). However, recall that ``=`` is not an equality in Python; it is the assignment operator. Here, we first calculate the value ``a - 1`` and then assign it to the variable ``a``, overwriting the old value of ``a`` in the process. That completes one pass through the loop, so the cycle starts over again: Python checks whether ``a > 0``, and since it is, it executes the loop body again. This continues until ``a`` reaches zero. At that point the condition is no longer true, so Python does not execute the loop body, and we reach the end of our mini program.

There are a few things to be aware of here. First, note that, as with the ``if`` statement, I have indented all of the code in the loop. You should do this in any program you write, even when it is not required, as it gives a visual cue as to which code is executed each time through the loop. In Python, such indentation is necessary.
Try running the program again, but change the indentation: depending on what you change, you will either get an error message, or the program may run forever (for instance, if ``a = a - 1`` is no longer indented, it is no longer part of the loop body).

Second, and most importantly, I made sure to write this loop so that ``a`` decreases every time through the loop. Otherwise, we could not guarantee that the loop would ever finish (a situation known as an infinite loop). Try running your program without the ``a = a - 1`` line: you will see ``10`` printed to the screen in perpetuity (press Ctrl-C to stop it). This is important to keep in mind when writing a program, because if your code contains an infinite loop, it will never run to completion.

One thing to note about ``while`` loops is that we may not easily know in advance how many times the loop will run. In this case it was fairly obvious, but in other cases ``a`` might decrease by a different amount each time through the loop, or it may not decrease every time. Because of this, most programming languages have another kind of loop that executes a fixed number of times, reserving ``while`` loops for cases where the number of repetitions is not known in advance.

**For loops:** To execute a loop a fixed number of times, use a ``for`` loop. Here is a program equivalent to the one above, using ``for`` instead of ``while``: ::

    for i in range(10):
        print 10 - i

To use a ``for`` loop in Python, you need to tell it what specific values to iterate over. One simple way to do this is to use ``range``. If you type ``range(10)``, your loop will repeat for the values 0 through 9, in increments of 1. If you want to start at a value other than 0, ``range(1, 11)`` will start at 1 and end with 10 (note that the last value is one less than the second number). To get regularly spaced numbers with a step other than 1, use ``range(start, stop, step)``. So to get all even numbers from 0 to 20 (inclusive), you could use ``range(0, 21, 2)``.
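To see exactly which values ``range`` produces, you can simply print them. The sketch below wraps each call in ``list( )`` so that the values display the same way in Python 2 (where ``range`` already returns a list) and in Python 3 (where it returns a lazy ``range`` object); the negative step in the last call is an extra illustration beyond the forms described above: ::

    # Sketch: the values produced by the range() forms described above.
    # list() only makes the values display identically in Python 2 and 3.
    print(list(range(10)))          # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(list(range(1, 11)))       # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

    # start at 0, stop before 21, step by 2
    evens = list(range(0, 21, 2))
    print(evens)                    # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

    # A negative step counts down: start at 10 and stop before reaching 0
    countdown = list(range(10, 0, -1))
    print(countdown)                # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

In every form the stop value itself is never included, which is why ``range(0, 21, 2)`` uses 21 rather than 20 in order to reach 20.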

Within the loop, ``i`` takes on each of these values in turn, one value per iteration.

You might notice that an explicit ``for`` loop is not really necessary for simple loops like this, as you can just as easily write one as a ``while`` loop, so ``for`` loops are really only a convenience. This is mostly true, though in Python in particular there are more complex data types for which ``for`` is a much more sensible way than ``while`` to iterate through a collection of values. In general, I find that I use ``for`` loops much more frequently than ``while`` loops, as most of the cases where I need a loop involve a specified number of repetitions.

Sometimes, you may want to stop running a loop in the middle of a series of iterations. A ``while`` loop's condition provides one way to do this, but you can also change the behavior of either ``for`` or ``while`` loops with the ``break`` and ``continue`` statements. A ``break`` statement immediately terminates execution of the loop, while a ``continue`` statement skips any remaining statements in the loop body and proceeds immediately to the next iteration. For instance, our original ``while`` loop could alternatively be written as: ::

    a = 10
    while True:
        if a == 0:
            break
        print a
        a = a - 1

This produces the same output as the original program (try it!). If we wanted to run the same loop but skip over 5 for some reason, we could use a conditional and a ``continue`` statement as follows: ::

    a = 11
    while a > 1:
        a = a - 1
        if a == 5:
            continue
        print a

Looping constructs are central to efficient programming, as they are what let a computer repeat tasks for you. Any time you are doing something over and over, you should get into the habit of figuring out how to turn it into a loop. As you will see, pretty much every type of software we deal with in this class has a way to loop over items (be they individual numbers, individual lines, individual files, etc.).
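As a small illustration of that habit, the sketch below replaces work you would otherwise repeat by hand with three loops: a ``for`` loop for a task with a known number of repetitions, a ``while`` loop for a task whose repetition count is not known in advance, and a ``for`` loop that uses ``break`` to stop early. The specific numbers (1 to 100, 1000, the range 101 to 199, and divisibility by 7) are arbitrary choices for illustration, and single-argument ``print(...)`` keeps it runnable under both Python 2 and Python 3: ::

    # Sketch: three loops replacing work you would otherwise repeat by hand.
    # All specific numbers here are arbitrary illustration values.

    # 1. Known number of repetitions -> for loop: add up 1 + 2 + ... + 100
    total = 0
    for i in range(1, 101):
        total = total + i
    print(total)                # 5050

    # 2. Unknown number of repetitions -> while loop: how many times can
    #    1000.0 be halved before it drops below 1?
    x = 1000.0
    halvings = 0
    while x >= 1.0:
        x = x / 2.0
        halvings = halvings + 1
    print(halvings)             # 10

    # 3. Stopping early with break: find the first multiple of 7 above 100
    first_multiple = None       # None means "not found yet"
    for n in range(101, 200):
        if n % 7 == 0:
            first_multiple = n
            break               # no need to check the remaining values
    print(first_multiple)       # 105

The ``while`` loop is the natural choice in the second case because the number of halvings is the answer we are looking for, not something we know before the loop starts.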

============
Practice
============

Here are some problems to practice programming in Python. Work on writing code either directly in the interpreter or as an external script.

* Write a program that calculates the average of three floating point numbers, printing the result to the screen.
* Write a program that prints out the square of each integer in a sequence of your choosing (use ``range``).
* Modify your previous program to exclude any numbers whose square is between 10 and 50.
* Write a program to print out all integers that are perfect squares less than or equal to 100.
* Write a program to print all positive integers less than 300 that are divisible by 3 but not divisible by 2.
* Write a program to test whether a positive integer is prime. (*Reminder:* An integer greater than 1 is prime if it cannot be divided evenly by any of the integers from 2 up to one less than itself.)