Exploring Bioinformatics with Python

Getting Started with Python

Summary: We explore the basics of the Python programming language, using the Wing 101 interactive development environment (IDE).

About Python

Python is a popular programming language used for a wide variety of purposes, from Web services to bioinformatics. We chose it for three reasons: (1) Python is used increasingly as the language for bioinformatics; (2) Python has a relatively simple syntax that, evidence suggests, novices pick up quickly; and (3) Python is often used in an interactive environment, in which you can immediately see the results of your computations.

Python is primarily an imperative programming language. That is, the model is that you give Python a sequence of instructions and it executes the instructions one by one. Additional control operations let you choose between and repeat operations. Python also provides a rich object-oriented toolkit. And Python even includes some functional components. We'll look at each characteristic throughout the semester.

Beginning Python

We recommend that you use the Wing 101 IDE for Python, although you can also do these labs using a generic Python interpreter and a text editor of your choice.

Traditionally, when you work with Python, you have one shell window open and zero or more program windows. You work interactively in the shell window, and you develop more complex code in the program windows. (Python provides a relatively easy way to load the code from the program windows into the shell window.)

When Wing 101 begins, it opens a shell window in the lower-right-hand corner of the screen, a larger program window at the top, and a search window in the lower-left-hand corner. We recommend that you enlarge the shell window by clicking on its left border and dragging to the left.

We'll start with the shell window. In many ways, the Python shell window acts like a calculator. You can type expressions and it will return results.

Type each of the following expressions in the shell window and see what Python returns.
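The lab's original list of expressions is not reproduced here. As a stand-in, the following sketch shows the kind of arithmetic you might try, assuming Python 3. The particular expressions are our own illustrative choices. In the shell you would type each right-hand side on its own and read the result; the names are there only so that the results can be printed together.

```python
# Illustrative expressions only; these are assumptions, not the lab's own list.
total = 3 + 4            # addition
product = 6 * 7          # multiplication
power = 2 ** 10          # exponentiation
quotient = 17 // 5       # integer (floor) division
remainder = 17 % 5       # remainder after division
grouped = (1 + 2) * 3    # parentheses control the order of evaluation

print(total, product, power, quotient, remainder, grouped)
```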

Using Variables

In programming, it is often useful to name values. We call such named values variables. You give a value to a name by using the assignment operator, variable = expression.

>>> x = 3
>>> x * 4
12

Name pi as 3.14159 and then compute the area and circumference of circles of radius 3, 21, and 82.
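As a hint at the pattern, here is a minimal sketch for a single radius. The radius 10 is an arbitrary choice of ours, not one of the radii in the exercise, and the names area and circumference are illustrative.

```python
# A sketch of the pattern for one radius; the radius 10 is an arbitrary choice.
pi = 3.14159                      # the approximation named in the exercise
radius = 10
area = pi * radius * radius       # area = pi * r^2
circumference = 2 * pi * radius   # circumference = 2 * pi * r

print(area)
print(circumference)
```

Changing the value of radius and repeating the two computations gives the results for the other circles.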

Libraries and Procedures

If you try to do more complex mathematical operations, such as computing a square root or a cosine, you may be puzzled that Python does not seem to immediately provide such functions. In fact, Python puts a huge number of procedures in separate libraries. The use of libraries helps keep Python smaller and faster. It also helps avoid name collisions. (E.g., should the function square square a number or draw a square?)

You can load Python's primary mathematics library with

>>> import math

Once you have imported the library, you have access to a variety of procedures, such as math.sqrt (compute a square root) or math.sinh (compute hyperbolic sine).
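A short sketch of calling library procedures follows. The inputs are our own illustrative choices, picked so that the results are easy to check by hand.

```python
import math

# Library procedures are called with the library name as a prefix.
root = math.sqrt(16)      # square root
cosine = math.cos(0)      # cosine of 0 radians
hyper = math.sinh(0)      # hyperbolic sine of 0

print(root, cosine, hyper)
```

Note that the results are floating-point numbers even when the answer is a whole number.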

Import the math library and then have Python compute the square root of 4, 2, 21, -2, and a number of your choice.

Getting Help

But how do you know what procedures the math library provides? Python sets itself apart from many other languages by integrating a help system within the language shell. You can get help on most procedures (and most libraries, and even most values) using the help function.

Type each of the following expressions and observe what the shell does.
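The lab's original list of help expressions is not reproduced here. The sketch below shows a few calls you might try instead; which procedures to ask about is our own choice, and the name sqrt_doc is illustrative.

```python
import math

# Ask for help on a single procedure from the math library.
help(math.sqrt)

# Ask for help on a method of a built-in type.
help(list.append)

# The description shown by help is also available as the __doc__ attribute.
sqrt_doc = math.sqrt.__doc__
print(sqrt_doc)
```

Calling help(math) works the same way, but prints a much longer description of the whole library.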

Lists

So far, we've been exploring how Python serves as a calculator. While such calculations will be useful in bioinformatics, more frequently we'll want to work with some representation of sequences (e.g., DNA as a sequence of nucleotides, proteins as a sequence of amino acids).

In Python, you will find it easiest to work with sequences using the list data type. Lists are written with square brackets, with each letter in quotation marks and commas between elements. For example,

>>> seq = ['A', 'A', 'C', 'G', 'T', 'A', 'C', 'C']

You can access a single value in a sequence by following the name of the sequence with a left square bracket, then an index, then a right square bracket. (Indices start at 0, rather than 1.) For example,

>>> seq[2]
'C'
>>> seq[3]
'G'
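Because indices start at 0, the last valid index is one less than the length of the list. The following sketch illustrates this with the built-in len procedure. The try/except form, which catches the error raised by an out-of-range index, goes beyond what this lab has covered and is included only to make the example run to completion.

```python
seq = ['A', 'A', 'C', 'G', 'T', 'A', 'C', 'C']

first = seq[0]               # indices start at 0, so this is the first element
count = len(seq)             # number of elements in the list
last = seq[count - 1]        # the last valid index is one less than the length

# Asking for an index past the end raises an IndexError.
try:
    seq[count]
    out_of_range = False
except IndexError:
    out_of_range = True

print(first, count, last, out_of_range)
```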

You can change a sequence by using an assignment statement (that thing with an equals sign), putting the bracketed part on the left.

>>> seq
['A', 'A', 'C', 'G', 'T', 'A', 'C', 'C']
>>> seq[0] = 'G'
>>> seq
['G', 'A', 'C', 'G', 'T', 'A', 'C', 'C']
>>> seq[4] = 'C'
>>> seq
['G', 'A', 'C', 'G', 'C', 'A', 'C', 'C']

There are many other things you can do with sequences. We'll explore those throughout the course.

You will note that it is a bit of a pain to create a longer sequence by listing each element and separating with commas. Fortunately, Python provides a variety of other techniques. One that we like to use is to build a sequence with list('characters').

>>> seq = list('AACGTACAGAATAATTTA')
>>> seq
['A', 'A', 'C', 'G', 'T', 'A', 'C', 'A', 'G', 'A', 'A', 'T', 'A', 'A', 'T', 'T', 'T', 'A']

You can convert back to the short form with the join method of strings. (Python 2 also offered a string.join procedure in the string library; Python 3 removed it, so the method form is the one that works in both.) In particular, we might write something like

>>> short = ''.join(seq)
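A round trip between the two forms can be sketched as follows, using the join method that strings provide in Python 3. The names text and dashed are illustrative.

```python
text = 'AACGTACAGAATAATTTA'

seq = list(text)          # long form: one list element per character
short = ''.join(seq)      # short form: elements joined with nothing between them
dashed = '-'.join(seq)    # the string before .join is placed between elements

print(short == text)
print(dashed)
```

The string on which join is called acts as the separator, which is why the empty string '' recovers the original text.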

Write a sequence of instructions that

Using the help feature, find out some of the things that you can do with a list. (You may want to ignore the procedures whose name begins with two underscores.)
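One possible way to get a shorter overview is sketched below. It uses the built-in dir procedure, which the lab text does not mention, to collect the names that help(list) would describe, skipping those that begin with two underscores. The bracketed expression is a list comprehension, a form not yet covered in this lab.

```python
# help(list) prints the full description of the list type.
# As a shorter alternative, dir(list) returns the names of its procedures;
# here we keep only the names that do not begin with two underscores.
names = [name for name in dir(list) if not name.startswith('__')]
print(names)

# Trying one of them: count reports how often a value appears.
seq = list('AACGTACC')
number_of_a = seq.count('A')
print(number_of_a)
```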

Writing Your Own Procedures

Python, like most programming languages, also lets you write your own procedures. The standard form of a procedure is

def procedure(parameters):
    """Description of procedure."""
    instructions
    return value

For example, here is a procedure that computes the area of a circle. (It uses math.pi, so the math library must be imported first.)

import math

def circle_area(radius):
    """Compute the area of a circle using the standard formula."""
    return math.pi * radius * radius
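Once a procedure is defined, you call it just like a library procedure. The sketch below repeats the definition so that it runs on its own; the radii and the names unit_area and bigger_area are illustrative choices.

```python
import math

def circle_area(radius):
    """Compute the area of a circle using the standard formula."""
    return math.pi * radius * radius

unit_area = circle_area(1)     # a circle of radius 1 has area pi
bigger_area = circle_area(2)   # doubling the radius quadruples the area

print(unit_area)
print(bigger_area)
```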

Similarly, here is a procedure that builds a new sequence by inserting AAA at the beginning of a sequence.

def prefix_aaa(sequence):
    """Create a new sequence by adding three A's to the beginning 
       of sequence."""
    return ['A', 'A', 'A'] + sequence
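A brief usage sketch follows, with an illustrative three-letter input. It also shows that the procedure builds a new list rather than changing the one it is given.

```python
def prefix_aaa(sequence):
    """Create a new sequence by adding three A's to the beginning
       of sequence."""
    return ['A', 'A', 'A'] + sequence

original = list('CGT')
extended = prefix_aaa(original)

print(extended)   # the new, longer list
print(original)   # the original list is left unchanged
```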

There are two ways that you can add your procedures to Python. You can type them into the shell, or you can edit them in the program window and then click the green arrow to run them. Typing procedures in the shell window is error-prone. We recommend that you generally save procedures in files and then load them, but we'll give you an opportunity to try both approaches.

Do the following.


This page was generated by Siteweaver on Wed Aug 31 10:49:16 2011.
The source to the page was last modified on Thu Aug 25 11:22:06 2011.
This page may be found at http://www.cs.grinnell.edu/~rebelsky/ExBioPy/lab-getting-started.html.

Samuel A. Rebelsky
rebelsky@grinnell.edu