Exploring Bioinformatics with Python
Summary: We explore the basics of the Python programming language, using the Wing 101 integrated development environment (IDE).
Python is a popular programming language used for a wide variety of purposes, from Web services to bioinformatics. A few important characteristics of Python led us to choose it: (1) Python is increasingly used as the language for bioinformatics; (2) Python has a relatively simple syntax, one that evidence suggests novices pick up quickly; and (3) Python is often used in an interactive environment, one in which you can immediately see the results of your computations.
Python is primarily an imperative programming language. That is, the model is that you give Python a sequence of instructions and it executes the instructions one by one. Additional control operations let you choose between and repeat operations. Python also provides a rich object-oriented toolkit. And Python even includes some functional components. We'll look at each characteristic throughout the semester.
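To make the three styles concrete, here is a minimal sketch (assuming Python 3) that counts the adenines in a short sequence in an imperative, an object-oriented, and a functional way. The variable names are our own, chosen only for illustration.

```python
# Count the adenines ('A') in a short DNA sequence in three styles.
seq = list('AACGTACC')

# Imperative: step through the elements one by one, updating a counter.
count_imperative = 0
for base in seq:
    if base == 'A':
        count_imperative = count_imperative + 1

# Object-oriented: ask the list object to do the counting itself.
count_object = seq.count('A')

# Functional: map each base to 1 or 0 and sum the results.
count_functional = sum(map(lambda base: 1 if base == 'A' else 0, seq))

print(count_imperative, count_object, count_functional)
```

All three approaches should report the same count; which one reads best depends on the task.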
We recommend that you use the Wing 101 IDE for Python, although you can also do these labs using a generic Python interpreter and a text editor of your choice.
Traditionally, when you work with Python, you have one shell window open and zero or more program windows. You work interactively in the shell window, and you develop more complex code in the program windows. (Python provides a relatively easy way to load the code from the program windows into the shell window.)
When Wing 101 starts, it opens a shell window in the lower-right corner of the screen, a larger program window at the top, and a search window in the lower-left corner. We recommend that you enlarge the shell window by clicking its left border and dragging to the left.
We'll start with the shell window. In many ways, the Python shell window acts like a calculator. You can type expressions and it will return results.
Type each of the following expressions in the shell window and see what Python returns.
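The expressions below are our own illustrations of calculator-style input, written for Python 3; they are a sketch of the kind of thing to try, not a definitive list. In the shell you can type each expression without print and Python will show its value directly.

```python
# Illustrative arithmetic expressions (our own choices).
print(2 + 3)     # addition
print(7 * 6)     # multiplication
print(2 ** 10)   # exponentiation
print(7 / 2)     # division (gives a decimal result in Python 3)
print(7 // 2)    # floor division (discards the fractional part)
print(7 % 2)     # remainder after division
```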
In programming, it is often useful to name values. We call such named values variables. You give a value to a name by using the assignment operator, variable = expression.
>>> x = 3
>>> x * 4
12
Name pi as 3.14159 and then compute the area and
circumference of circles of radius 3, 21, and 82.
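As a sketch of the pattern, here is the same kind of computation for a radius the exercise does not use (radius 2). The names radius, area, and circumference are our own; you could equally well type the expressions directly.

```python
# Name the constant once, then reuse it in later expressions.
pi = 3.14159
radius = 2

area = pi * radius * radius          # area = pi * r squared
circumference = 2 * pi * radius      # circumference = 2 * pi * r

print(area)
print(circumference)
```

Changing the value of radius and re-running the two formulas gives the results for another circle.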
If you try to do more complex mathematical operations, such as computing a
square root or a cosine, you may be puzzled that Python does not seem to
immediately provide such functions. In fact, Python puts a huge number
of procedures in separate libraries. The use of libraries helps
keep Python smaller and faster. It also helps avoid name collisions. (E.g.,
should the function square square a number or draw a square?)
You can load Python's primary mathematics library with
>>> import math
Once you have imported the library, you have access to a variety of procedures,
such as math.sqrt (compute a square root) or
math.sinh (compute hyperbolic sine).
Import the math library and then have Python compute the square root of 4, 2, 21, -2, and a number of your choice.
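One of those inputs may surprise you. The following sketch (assuming Python 3, with values of our own choosing) shows two library calls and one way to see what math.sqrt does when given a negative number.

```python
import math

print(math.sqrt(16))    # square root of a perfect square
print(math.sinh(0))     # hyperbolic sine of zero

# math.sqrt rejects negative inputs by raising ValueError.
try:
    math.sqrt(-1)
except ValueError as error:
    print('math.sqrt(-1) failed:', error)
```

The try/except form is a control structure we have not yet introduced; in the shell you can simply type math.sqrt(-1) and read the error message.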
But how do you know what procedures the math library provides?
Python sets itself apart from many other languages by integrating a help system
within the language shell. You can get help on most procedures (and most
libraries, and even most values) using the help function.
Type each of the following expressions and observe what the shell does.
help(math.sqrt)
help(math)
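If the output of help(math) is more than you want, the built-in dir function and the __doc__ attribute give a shorter view of the same information. This is only a sketch; the variable name names is our own.

```python
import math

# Names the math library provides, skipping the double-underscore ones.
names = [name for name in dir(math) if not name.startswith('__')]
print(names)

# The short description attached to math.sqrt.
print(math.sqrt.__doc__)
```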
So far, we've been exploring how Python serves as a calculator. While such calculations will be useful in bioinformatics, more frequently we'll want to work with some representation of sequences (e.g., DNA as a sequence of nucleotides, proteins as a sequence of amino acids).
In Python, you will find it easiest to work with sequences using the list data type. Lists are written with square brackets around the whole list, each letter in quotation marks, and commas between elements. For example,
>>> seq = ['A', 'A', 'C', 'G', 'T', 'A', 'C', 'C']
You can access a single value in a sequence by following the name of the sequence with a left square bracket, then an index, then a right square bracket. (Indices start at 0, rather than 1.) For example,
>>> seq[2]
'C'
>>> seq[3]
'G'
You can change a sequence by using an assignment statement (that thing with an equals sign), putting the bracketed part on the left.
>>> seq
['A', 'A', 'C', 'G', 'T', 'A', 'C', 'C']
>>> seq[0] = 'G'
>>> seq
['G', 'A', 'C', 'G', 'T', 'A', 'C', 'C']
>>> seq[4] = 'C'
>>> seq
['G', 'A', 'C', 'G', 'C', 'A', 'C', 'C']
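A few related operations are worth trying alongside indexing and assignment. The sketch below (assuming Python 3) uses len and a negative index, and shows what happens when an index is past the end of the list.

```python
seq = ['A', 'A', 'C', 'G', 'T', 'A', 'C', 'C']

print(len(seq))     # number of elements in the list
print(seq[0])       # first element (index 0)
print(seq[-1])      # last element, counting back from the end

# Replace the element at index 4.
seq[4] = 'C'
print(seq)

# Valid indices run from 0 to len(seq) - 1; anything past that fails.
try:
    seq[8]
except IndexError as error:
    print('seq[8] failed:', error)
```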
There are many other things you can do with sequences. We'll explore those throughout the course.
You will note that it is a bit of a pain to create a longer sequence by
listing each element and separating with commas. Fortunately, Python
provides a variety of other techniques. One that we like to use is
to build a sequence with list('characters').
>>> seq = list('AACGTACAGAATAATTTA')
>>> seq
['A', 'A', 'C', 'G', 'T', 'A', 'C', 'A', 'G', 'A', 'A', 'T', 'A', 'A', 'T', 'T', 'T', 'A']
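You can convert back to the short form with the join method of strings: call it on the separator (here, the empty string) and give it the list. (Older Python 2 code used the string.join procedure from the string library for this; that procedure is not available in Python 3.) In particular, we might write something like
>>> short = ''.join(seq)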
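As a quick check that the two forms carry the same information, here is a sketch of the round trip from string to list and back. It assumes Python 3, where join is a method called on the separator string; the name text is our own.

```python
text = 'AACGTACAGAATAATTTA'

seq = list(text)        # long form: one list element per character
short = ''.join(seq)    # short form: glue the elements back together

print(seq)
print(short)
print(short == text)    # the round trip should give back the original
```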
Write a sequence of instructions that
Using the help feature, find out some of the things that you can do with a list. (You may want to ignore the procedures whose name begins with two underscores.)
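If you would like a shorter starting point than help(list), the sketch below lists the procedure names without leading underscores and tries a few of them. The choice of which ones to try is ours; the list has many more.

```python
seq = list('AACGT')

# Names of list procedures, skipping those that begin with two underscores.
methods = [name for name in dir(list) if not name.startswith('__')]
print(methods)

seq.append('T')         # add one element to the end
print(seq.count('A'))   # how many times 'A' appears
print(seq.index('G'))   # position of the first 'G'
seq.reverse()           # reverse the list in place
print(seq)
```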
Python, like most programming languages, also lets you write your own procedures. The standard form of a procedure is
def procedure(parameters):
    """Description of procedure."""
    instructions
    return value
For example, here is a procedure that computes the area of a circle. (It uses math.pi, so the math library must be imported first.)
import math
def circle_area(radius):
    """Compute the area of a circle using the standard formula."""
    return math.pi * radius * radius
Similarly, here is a procedure that builds a new sequence by inserting AAA at the beginning of a sequence.
def prefix_aaa(sequence):
    """Create a new sequence by adding three A's to the beginning
    of sequence."""
    return ['A', 'A', 'A'] + sequence
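Once defined, the two procedures above can be called like any built-in one. The following sketch (assuming Python 3) repeats the definitions so that it runs on its own; the names original and extended are ours.

```python
import math

def circle_area(radius):
    """Compute the area of a circle using the standard formula."""
    return math.pi * radius * radius

def prefix_aaa(sequence):
    """Create a new sequence by adding three A's to the beginning
    of sequence."""
    return ['A', 'A', 'A'] + sequence

print(circle_area(1))       # a circle of radius 1 has area pi

original = list('CGT')
extended = prefix_aaa(original)
print(extended)             # ['A', 'A', 'A', 'C', 'G', 'T']
print(original)             # unchanged: ['C', 'G', 'T']
```

Note that prefix_aaa builds a new list with +, so the list you pass in is left as it was.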
There are two ways that you can add your procedures to Python. You can type them into the shell, or you can edit them in the program window and then click the green arrow to run them. Typing procedures in the shell window is error-prone. We recommend that you generally save procedures in files and then load them, but we'll give you an opportunity to try both approaches.
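Do the following.
Type the definition of circle_area from above in the Python shell window.
Use help(circle_area) to determine what Python knows about your procedure.
Type the definition of prefix_aaa in the program window.
Save the contents of the program window as prefix_aaa.py.
Use help(prefix_aaa) to determine what Python knows about this procedure.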
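What help reports for your own procedures comes largely from the description string you wrote as the first line of the body. As a sketch, the standard inspect library can retrieve that same text directly; here we use its getdoc function (assuming Python 3).

```python
import inspect

def prefix_aaa(sequence):
    """Create a new sequence by adding three A's to the beginning
    of sequence."""
    return ['A', 'A', 'A'] + sequence

# The name Python recorded for the procedure.
print(prefix_aaa.__name__)

# The description string, with its indentation tidied up.
print(inspect.getdoc(prefix_aaa))
```

A procedure written without a description string still works, but help will have much less to tell you about it.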
This page was generated by Siteweaver on Wed Aug 31 10:49:16 2011.
The source to the page was last modified on Thu Aug 25 11:22:06 2011.
This page may be found at http://www.cs.grinnell.edu/~rebelsky/ExBioPy/lab-getting-started.html.
Samuel A. Rebelsky