Exploring Bioinformatics with Python

Followup Exercises for Section 1.4: Fun with Paired DNA Strands

Preparation

Start the Wing 101 IDE.

Exercise 1: Starting with Individual Nucleotides

As you may recall from the Python version of the base-pair algorithm, we start by working with individual nucleotides. Here is the function we developed for this conversion.


def base_pair(nucleotide):
    """Convert a nucleotide (represented as a one-character string) to 
       the paired nucleotide."""
    if nucleotide == 'A':
        return 'T'
    elif nucleotide == 'C':
        return 'G'
    elif nucleotide == 'G':
        return 'C'
    elif nucleotide == 'T':
        return 'A'
    else:
        return '*'

a. Create a new file in Wing 101 and copy this code to the file.

b. Save the file.

c. Click the run/play button (the green triangle) to load the file.

d. Verify that the procedure works as advertised by translating a variety of nucleotides. For example,

>>> base_pair('A')
'T'

e. Make notes of any flaws you observe in base_pair.

Exercise 2: Typical Mistakes

Even the best of programmers make mistakes when designing algorithms and when translating those algorithms to code. Let's explore what Python does if we make a few common mistakes in entering code.

a. The line that defines a function is supposed to end with a colon. Determine what happens if you remove that colon. That is, remove the colon at the end of base_pair(nucleotide):, save the modified file, and then click the run button again. After determining what Python does, repair the error.

b. The double-equals sign, == is used for comparison. The single equals sign =, is used for assignment. Try using the single equals sign in the first test. That is, replace

    if nucleotide == 'A':

with

    if nucleotide = 'A':

Once again, save the modified file and reload it. Next, see if you get an error during loading and, if not, see what the modified code does on a variety of nucleotides.

Once testing the modification, restore the correct comparison operation.

c. As you may have noticed, Python ends every control structure with a colon. See what happens if you remove the colon from

    if nucleotide == 'A':

d. As you may recall from the corresponding discussion, we include the last two lines of base_pair for the case that someone fails to include a valid nucleotide.

    else:
        return '*'

Determine by experimentation what happens if you remove those two lines.

e. By custom, programmers use indentation to show the structure of their programs. Python differs from many languages by using the indentation of code to understand the programmer's intended meaning.

Determine what happens if you remove the extra indentation before the return statement in

        return 'T'

f. The base_pair function is fairly straightforward in its use of control structures. Consider the following two procedures:


def bar(x):
    """A sample procedure."""
    if (x > 10):
        x = x - 10
        x = x / 2
    return x


def baz(x):
    """A sample procedure."""
    if (x > 10):
        x = x - 10
    x = x / 2
    return x

Note that the only differences between these two procedures are their names and their indentation. Identify some values for which the procedures behave differently.

Exercise 3: Finding the Opposite Strand

The base_pair procedure converts nucleotides to the paired nucleotide. Of course, our goal is to convert whole strands of DNA, rather than individual nucleotides. Hence, we build upon base_pair to create opposite_strand.


def opposite_strand(strand):
    """Given a single DNA strand, reprented as a sequence, compute the 
       opposite strand."""
    result = []
    for nucleotide in strand:
        result.append(base_pair(nucleotide))
    return result

a. Copy this code to your file.

b. Run the updated file.

c. Verify that the procedure works as advertised by translating a variety of nucleotides. For example,

>>> opposite_strand(['A', 'C', 'T', 'T', 'U', 'G'])
['T', 'G', 'A', 'A', '*', 'C']

d. Observe any potential problems you observe in opposite_strand.

Exercise 4: Converting Strings

As we noted in our initial explorations with sequences and strings, it is often much more convenient to represent a sequence as a string than as a list. It is, however, impossible to modify a string and it is time-consuming to extend a string. Because of these difficulties, we wrote opposite_strand to use lists, rather than strings.

However, it is possible to wrap the procedure in something that takes a string as input, converts the string to a list, calls opposite_strand, and converts the result back to a string.

Write that procedure, which we will call paired_strand.

Exercise 5: Python vs. Perl

In your reading of Exploring Bioinformatics, you should have encountered a Perl version of this algorithm.

#!/usr/bin/perl

# paired_strand.pl
#  A simple program that asks for a strand of DNA and outputs the
#  paired strand.
#
#  Taken from /Exploring Bioinformatics: A Project-Based Approach/
#  by Caroline St. Clair and Jonathan Visick.  
#
#  Transcribed and commented by Samuel A. Rebelsky
print "Enter single DNA strand: ";
$dnaseq = <STDIN>;
chomp $dnaseq;
print "\nOpposite strand: ";
for ($i = 0; $i < length($dnaseq); $i++) {
  $nucleo = substr($dnaseq,$i,1);
  if ($nucleo eq "A") { print "T"; }
  elsif ($nucleo eq "C") { print "G"; }
  elsif ($nucleo eq "G") { print "C"; }
  else { print "A"; }
} # for
print "\n";

a. Save a copy of this file in paired_strand.pl by right-clicking on the link and choosing Save Link As ....

b. To run this Perl program, open a terminal window and type

perl paired_strand.pl

Verify that the procedure works as advertised.

c. In terms of the readability of the code, what advantages do you see for Python? What advantages do you see for Perl?

d. In terms of the way in which we interact with the code, what advantages do you see for Python? What advantages do you see for Perl?

e. Are there any other differences you observe between Python and Perl?

Exercise 6: The Lowercase Problem

In observing problems with the Python functions base_pair and opposite_strand, you may have noted one fairly significant problem: If one represents the nucleotides using lowercase letters, they get converted to the unknown character (*) rather than to the appropriate nucleotide.

a. Rewrite base_pair so that it handles lowercase as well as uppercase letters.

Hint: While one way to rewrite base_pair is to add four more tests, there's a helpful string procedure that can simplify your code.

b. Verify that once base_pair is rewritten, opposite_strand and paired_strand handle both lowercase and uppercase letters.

-->

Exercise 7: Alternate Techniques

As St. Clair and Visick mention, in Perl, the straightfoward for loop is only one way of computing the opposite strand. As you've already discovered in Python, there are many ways to solve similar problems. Let's explore a few.

a. The string.replace procedure provides a useful way to build new strings from old. Rewrite paired_strand so that it computes the complementary strand by using one or more calls to string.replace.

b. In essence, the for loop in opposite_strand is a mechanism for applying base_pair to each element of the sequence. Those of you with some programming experience may know that map is the traditional function for applying a procedure to each element of the sequence. It has the form

map function list

Rewrite opposite_strand using map rather than the for loop.

c. One difficulty with the string.replace procedure is that it only replaces one character (although it will do many copies of that one character). In paired_strand, we really want to convert a variety of characters. Is there help? Yes! The string.translate procedure is intended for just this situation. Consider how you might rewrite paired_strand to use string.translate.

Note: You only need to design the rewrite, not actually do it.

Note: In the standard numbering scheme, uppercase A is character 65 and lowercase a is character 97. (Letters are also numbered sequentially.)

d. Which of these four techniques do you prefer, and why?

Exercise 8: Robustness and Polymorphism

One drawback of opposite_strand and paired_strand is that you have to remember which representation of a strand you're using. If you're using a list, you should call opposite_strand. If you're using a list, you should call paired_strand. Our code would be more robust and easier to use if we had the procedure check what representation is being used, did the appropriate conversion (if necessary), computed the paired/opposite strand, and then converted back (if necessary).

In Python, you can determine the type of a value with type(value). For example, we can check if strand is a string with

    if (type(strand) == type('')):

Rewrite one of the procedures so that it works correctly when receiving a list as a parameter and also works correctly when receiving a string as a parameter. Also make it return the same type as it received as a parameter (so if it receives a list, it returns a list, and if it receives a string, it returns a string).

History

Tuesday, 1 September 2011 [Samuel A. Rebelsky]


This page was generated by Siteweaver on Thu Sep 1 14:33:30 2011.
The source to the page was last modified on Thu Sep 1 14:33:29 2011.
This page may be found at http://www.cs.grinnell.edu/~rebelsky/ExBioPy/lab-1.4.html.

You may wish to validate this page's HTML

Samuel A. Rebelsky
rebelsky@grinnell.edu