Exploring Bioinformatics with Python
Basic:
[Skip To Body]
[Front Door]
|
[Reference]
[Labs]
[Projects]
Courses:
[BIO/CSC295.01 2009F]
[BIO/CSC295.01 2011F]
Python:
[python.org]
[biopython.org]
Misc:
[Exploring Bioinformatics site]
Start the Wing 101 IDE.
As you may recall from the Python version of the base-pair algorithm, we start by working with individual nucleotides. Here is the function we developed for this conversion.
def base_pair(nucleotide):
"""Convert a nucleotide (represented as a one-character string) to
the paired nucleotide."""
if nucleotide == 'A':
return 'T'
elif nucleotide == 'C':
return 'G'
elif nucleotide == 'G':
return 'C'
elif nucleotide == 'T':
return 'A'
else:
return '*'
a. Create a new file in Wing 101 and copy this code to the file.
b. Save the file.
c. Click the run/play button (the green triangle) to load the file.
d. Verify that the procedure works as advertised by translating a variety of nucleotides. For example,
>>> base_pair('A')
'T'
e. Make notes of any flaws you observe in base_pair.
Even the best of programmers make mistakes when designing algorithms and when translating those algorithms to code. Let's explore what Python does if we make a few common mistakes in entering code.
a. The line that defines a function is supposed to end with a colon.
Determine what happens if you remove that colon. That is, remove the
colon at the end of base_pair(nucleotide):, save the modified
file, and then click the run button again. After
determining what Python does, repair the error.
b. The double-equals sign, == is used for comparison.
The single equals sign =, is used for assignment. Try using
the single equals sign in the first test. That is, replace
if nucleotide == 'A':
with
if nucleotide = 'A':
Once again, save the modified file and reload it. Next, see if you get an error during loading and, if not, see what the modified code does on a variety of nucleotides.
Once testing the modification, restore the correct comparison operation.
c. As you may have noticed, Python ends every control structure with a colon. See what happens if you remove the colon from
if nucleotide == 'A':
d. As you may recall from the corresponding discussion, we include
the last two lines of base_pair for the case that someone
fails to include a valid nucleotide.
else:
return '*'
Determine by experimentation what happens if you remove those two lines.
e. By custom, programmers use indentation to show the structure of their programs. Python differs from many languages by using the indentation of code to understand the programmer's intended meaning.
Determine what happens if you remove the extra indentation before the return statement in
return 'T'
f. The base_pair function is fairly straightforward in its
use of control structures. Consider the following two procedures:
def bar(x):
"""A sample procedure."""
if (x > 10):
x = x - 10
x = x / 2
return x
def baz(x):
"""A sample procedure."""
if (x > 10):
x = x - 10
x = x / 2
return x
Note that the only differences between these two procedures are their names and their indentation. Identify some values for which the procedures behave differently.
The base_pair procedure converts nucleotides to the
paired nucleotide. Of course, our goal is to convert whole strands
of DNA, rather than individual nucleotides. Hence, we build upon
base_pair to create opposite_strand.
def opposite_strand(strand):
"""Given a single DNA strand, reprented as a sequence, compute the
opposite strand."""
result = []
for nucleotide in strand:
result.append(base_pair(nucleotide))
return result
a. Copy this code to your file.
b. Run the updated file.
c. Verify that the procedure works as advertised by translating a variety of nucleotides. For example,
>>> opposite_strand(['A', 'C', 'T', 'T', 'U', 'G']) ['T', 'G', 'A', 'A', '*', 'C']
d. Observe any potential problems you observe in
opposite_strand.
As we noted in our initial explorations with sequences and strings, it is
often much more convenient to represent a sequence as a string than as a
list. It is, however, impossible to modify a string and it is time-consuming
to extend a string. Because of these difficulties, we wrote
opposite_strand to use lists, rather than strings.
However, it is possible to wrap the procedure in something that
takes a string as input, converts the string to a list, calls
opposite_strand, and converts the result back to a string.
Write that procedure, which we will call
paired_strand.
In your reading of Exploring Bioinformatics, you should have encountered a Perl version of this algorithm.
#!/usr/bin/perl
# paired_strand.pl
# A simple program that asks for a strand of DNA and outputs the
# paired strand.
#
# Taken from /Exploring Bioinformatics: A Project-Based Approach/
# by Caroline St. Clair and Jonathan Visick.
#
# Transcribed and commented by Samuel A. Rebelsky
print "Enter single DNA strand: ";
$dnaseq = <STDIN>;
chomp $dnaseq;
print "\nOpposite strand: ";
for ($i = 0; $i < length($dnaseq); $i++) {
$nucleo = substr($dnaseq,$i,1);
if ($nucleo eq "A") { print "T"; }
elsif ($nucleo eq "C") { print "G"; }
elsif ($nucleo eq "G") { print "C"; }
else { print "A"; }
} # for
print "\n";
a. Save a copy of this file in paired_strand.pl by right-clicking
on the link and choosing .
b. To run this Perl program, open a terminal window and type
perl paired_strand.pl
Verify that the procedure works as advertised.
c. In terms of the readability of the code, what advantages do you see for Python? What advantages do you see for Perl?
d. In terms of the way in which we interact with the code, what advantages do you see for Python? What advantages do you see for Perl?
e. Are there any other differences you observe between Python and Perl?
In observing problems with the Python functions base_pair and
opposite_strand, you may have noted one fairly significant
problem: If one represents the nucleotides using lowercase letters,
they get converted to the unknown
character (*)
rather than to the appropriate nucleotide.
a. Rewrite base_pair so that it handles lowercase as well as
uppercase letters.
Hint: While one way to rewrite base_pair is to add
four more tests, there's a helpful string procedure that can simplify
your code.
b. Verify that once base_pair is rewritten,
opposite_strand and paired_strand handle both
lowercase and uppercase letters.
-->
As St. Clair and Visick mention, in Perl, the straightfoward
for loop is only one way of computing the opposite
strand. As you've already discovered in Python, there are many
ways to solve similar problems. Let's explore a few.
a. The string.replace procedure provides a useful way to build
new strings from old. Rewrite paired_strand so that it
computes the complementary strand by using one or more calls to
string.replace.
b. In essence, the for loop in opposite_strand
is a mechanism for applying base_pair to each element of
the sequence. Those of you with some programming experience may know that
map is the traditional function for applying a procedure
to each element of the sequence. It has the form
map function list
Rewrite opposite_strand using map rather
than the for loop.
c. One difficulty with the string.replace procedure is that
it only replaces one character (although it will do many copies of that
one character). In paired_strand, we really want
to convert a variety of characters. Is there help? Yes! The
string.translate procedure is intended for just this
situation. Consider how you might rewrite paired_strand
to use string.translate.
Note: You only need to design the rewrite, not actually do it.
Note: In the standard numbering scheme, uppercase A is character 65 and lowercase a is character 97. (Letters are also numbered sequentially.)
d. Which of these four techniques do you prefer, and why?
One drawback of opposite_strand and paired_strand
is that you have to remember which representation of a strand you're using.
If you're using a list, you should call opposite_strand. If
you're using a list, you should call paired_strand. Our code
would be more robust and easier to use if we had the procedure check what
representation is being used, did the appropriate conversion (if necessary),
computed the paired/opposite strand, and then converted back (if necessary).
In Python, you can determine the type of a value with type(value). For example, we can check if strand is a string with
if (type(strand) == type('')):
Rewrite one of the procedures so that it works correctly when receiving a list as a parameter and also works correctly when receiving a string as a parameter. Also make it return the same type as it received as a parameter (so if it receives a list, it returns a list, and if it receives a string, it returns a string).
Tuesday, 1 September 2011 [Samuel A. Rebelsky]
This page was generated by
Siteweaver on Thu Sep 1 14:33:30 2011.
The source to the page was last modified on Thu Sep 1 14:33:29 2011.
This page may be found at http://www.cs.grinnell.edu/~rebelsky/ExBioPy/lab-1.4.html.
You may wish to validate this page's HTML
Samuel A. Rebelsky