Exploring Bioinformatics with Python
Note: You will be expected to turn in the final version of your algorithm for assessment.
0. Make your own copies of the following files:
   - ChouFasman.py, which includes preliminary code for this assignment.
   - read_fasta.py, the procedure we explored previously for reading FASTA data.
   - NP_005408-fasta.txt, the FASTA file for human SRC.
   - CF_test.txt, the test data provided by our authors.
1. Your first goal is to understand which parts of the Chou-Fasman algorithm are already implemented, and how you can use them.
   - What does the ChouFasman procedure do?
   - What does the for aa in aa_names loop at the start of the code do? (Make sure to look at the body.)
   - What do CF_find_alpha, CF_extend_alpha, and CF_good_alpha do?
   You may want to call each of the procedures to better understand its purpose. For example, you can call CF_find_alpha or ChouFasman on the sequence you get from NP_005408:

   >>> ChouFasman(read_fasta('NP_005408-fasta.txt')[1])
2. Develop a set of test sequences that you think will be useful. Your set should include.
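As one hedged illustration (not the list the assignment has in mind), the sketch below builds named edge-case sequences, including helix-forming stretches at the very start and very end of a sequence, where extension is most likely to go wrong. The residue choices are assumptions: E, A, L, and M are commonly listed as helix formers, V, I, and Y as strand formers, and G, P, and N as helix breakers. Check them against the propensity table in ChouFasman.py. The names HELIX_BLOCK, STRAND_BLOCK, FILLER, and make_test_sequences are hypothetical.

```python
# Hypothetical edge-case test sequences for a Chou-Fasman implementation.
# Residue choices are assumptions; compare them with the table in ChouFasman.py.
HELIX_BLOCK = "EALMEALMEALM"   # mostly helix formers
STRAND_BLOCK = "VIYVIYVIY"     # mostly strand formers
FILLER = "GPNGPNGPNGPN"        # residues unlikely to start a helix or strand


def make_test_sequences():
    """Return a dict mapping a short description to a test sequence."""
    return {
        "empty": "",
        "single residue": "A",
        "helix at start": HELIX_BLOCK + FILLER,
        "helix at end": FILLER + HELIX_BLOCK,
        "helix is whole sequence": HELIX_BLOCK,
        "helix then strand": HELIX_BLOCK + FILLER + STRAND_BLOCK,
        "no structure": FILLER * 2,
    }


if __name__ == "__main__":
    for name, seq in make_test_sequences().items():
        print(f"{name:25} {len(seq):3} {seq}")
```

Each sequence can then be passed to ChouFasman (or to the individual CF_ procedures) and the result checked by hand.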
3. The code to extend a potential alpha helix (step 1b on p. 218) is not yet written. Write that code.
Note: As our book notes, one difficulty with extending a region is that you may hit the beginning or end of the sequence. Be careful about those situations.
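A minimal sketch of extension follows. It assumes a common statement of the rule (extend in both directions until four contiguous residues average P(alpha) below 1.00) and one particular reading of it: a residue is added only if it and the three residues next to it inside the region average at least the threshold. The book's step 1b may word this differently. Regions are half-open index pairs (start, end), and the propensity table is passed in as a dict because the layout of the table in ChouFasman.py is not assumed. The name extend_alpha is hypothetical and is not the actual signature of CF_extend_alpha.

```python
def extend_alpha(seq, start, end, p_alpha, threshold=1.00, window=4):
    """Grow the half-open helix region [start, end) of seq outward.

    Assumption: a residue is added only if the `window` residues at the new
    boundary (the candidate plus its neighbours inside the region) average
    at least `threshold`. Extension stops at either end of the sequence.
    """
    if end - start < window - 1:
        raise ValueError("region too short for the boundary window")

    def avg(lo, hi):
        # Average P(alpha) over seq[lo:hi].
        return sum(p_alpha[aa] for aa in seq[lo:hi]) / (hi - lo)

    # Extend right; the `end < len(seq)` test keeps us from running off the end.
    while end < len(seq) and avg(end - window + 1, end + 1) >= threshold:
        end += 1
    # Extend left; the `start > 0` test keeps us from running off the start.
    while start > 0 and avg(start - 1, start - 1 + window) >= threshold:
        start -= 1
    return start, end


if __name__ == "__main__":
    toy = {"A": 1.5, "G": 0.5}  # made-up values, not Chou-Fasman's table
    print(extend_alpha("GGGGAAAAAAGGGG", 4, 10, toy))  # (2, 12)
```

The two bounds checks in the while conditions are what handle the situations the note describes: when the region reaches the first or last residue, extension in that direction simply stops.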
4. The code for checking whether a range is likely to be an alpha helix (step 1c on p. 218) is incomplete. Complete that code.
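One hedged sketch of the check, assuming the commonly cited conditions: the region has at least six residues, its average P(alpha) exceeds 1.03, and its average P(alpha) exceeds its average P(beta). Compare these thresholds with step 1c in the book before relying on them. good_alpha and its parameter names are hypothetical, and the propensity tables are passed in as dicts.

```python
def good_alpha(seq, start, end, p_alpha, p_beta, min_avg=1.03, min_len=6):
    """Return True if seq[start:end] passes an assumed helix check.

    Assumed conditions: at least min_len residues, average P(alpha) above
    min_avg, and average P(alpha) above average P(beta).
    """
    length = end - start
    if length < min_len:
        return False
    region = seq[start:end]
    avg_alpha = sum(p_alpha[aa] for aa in region) / length
    avg_beta = sum(p_beta[aa] for aa in region) / length
    return avg_alpha > min_avg and avg_alpha > avg_beta


if __name__ == "__main__":
    p_alpha = {"A": 1.5, "V": 1.0}   # made-up values
    p_beta = {"A": 0.75, "V": 1.75}
    print(good_alpha("AAAAAA", 0, 6, p_alpha, p_beta))  # True
    print(good_alpha("AAAVVV", 0, 6, p_alpha, p_beta))  # False (the averages tie)
```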
5. As written, CF_find_alpha does some unnecessary checking and therefore finds duplicate regions. In particular, once it has identified a potential alpha helix in the range (X,Y), it starts again near X+1. However, it need not look for the next alpha helix before position Y+1. Update your code so that the search is more efficient.
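A sketch of the more efficient scan, written as a hypothetical generic find_regions that takes the nucleus test, the extension, and the final check as functions; the real CF_find_alpha may be organized differently. Regions are half-open pairs (start, end), so end corresponds to the Y+1 position in the text, and the loop resumes there instead of at the next position. Whether to also skip past a region that fails the final check is a design choice; this sketch skips only past regions it accepts.

```python
def find_regions(seq, is_nucleus, extend, is_good, nucleus_len=6):
    """Scan seq for regions, never re-examining residues inside an accepted one.

    is_nucleus(seq, start, end) -> bool     tests a candidate window
    extend(seq, start, end) -> (start, end) grows a nucleus into a region
    is_good(seq, start, end) -> bool        final check on the grown region
    """
    regions = []
    i = 0
    while i + nucleus_len <= len(seq):
        if is_nucleus(seq, i, i + nucleus_len):
            start, end = extend(seq, i, i + nucleus_len)
            if is_good(seq, start, end):
                regions.append((start, end))
                # Resume at end (Y + 1 in inclusive terms) instead of i + 1,
                # so the same region is not found again from a later start.
                i = end
                continue
        i += 1
    return regions


if __name__ == "__main__":
    # Toy callbacks for illustration only: a nucleus is six consecutive "A"s,
    # and extension grows it to the whole run of "A"s.
    def all_a(seq, start, end):
        return set(seq[start:end]) == {"A"}

    def grow_run(seq, start, end):
        while start > 0 and seq[start - 1] == "A":
            start -= 1
        while end < len(seq) and seq[end] == "A":
            end += 1
        return start, end

    print(find_regions("AAAAAAAA", all_a, grow_run, lambda s, a, b: True))  # [(0, 8)]
```

Without the jump to end, the toy run of eight "A"s would be reported three times, once from each starting position whose six-residue window is all "A".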
6. There is not yet a procedure to find beta strands. Implement that procedure. (You will find that it is very similar to the one for finding alpha helices.)
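Because the strand search mirrors the helix search, one option is to factor out the nucleation test and call it with different parameters. The sketch below assumes the commonly cited rules (a helix nucleus is at least 4 of 6 residues with P(alpha) > 1.03; a strand nucleus is at least 3 of 5 residues with P(beta) > 1.00); check them against the book. All function names are hypothetical. Extension and the final check would then follow the helix pattern with P(beta) in place of P(alpha), with the final check commonly stated as average P(beta) above 1.05 and above average P(alpha).

```python
def find_nuclei(seq, propensity, size, needed, threshold):
    """Return the start index of every window of `size` residues in which
    at least `needed` residues have propensity above `threshold`."""
    starts = []
    for i in range(len(seq) - size + 1):
        formers = sum(1 for aa in seq[i:i + size] if propensity[aa] > threshold)
        if formers >= needed:
            starts.append(i)
    return starts


def find_alpha_nuclei(seq, p_alpha):
    # Assumed helix rule: at least 4 of 6 residues with P(alpha) > 1.03.
    return find_nuclei(seq, p_alpha, size=6, needed=4, threshold=1.03)


def find_beta_nuclei(seq, p_beta):
    # Assumed strand rule: at least 3 of 5 residues with P(beta) > 1.00.
    return find_nuclei(seq, p_beta, size=5, needed=3, threshold=1.00)


if __name__ == "__main__":
    p_beta = {"V": 1.75, "G": 0.75}  # made-up values
    print(find_beta_nuclei("GGVVVGG", p_beta))  # [0, 1, 2]
```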
7. There is not yet a procedure to find beta turns. Implement that procedure. (You will find that this procedure is a bit different: it does not expand the region, the contribution of an amino acid to the turn probability depends on its position in the region, and turns are just one unit long.)
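A hedged sketch of turn detection, assuming the commonly cited Chou-Fasman turn rule for a tetrapeptide starting at position i: the product f(i) * f(i+1) * f(i+2) * f(i+3) of positional frequencies exceeds 0.000075, the average P(turn) exceeds 1.00, and the average P(turn) exceeds both the average P(alpha) and the average P(beta). Here f maps each residue to a tuple of its four positional frequencies; that layout, the thresholds as defaults, and the names is_turn and find_turns are assumptions rather than the structure of ChouFasman.py.

```python
def is_turn(seq, i, f, p_turn, p_alpha, p_beta,
            min_product=0.000075, min_avg=1.00):
    """Return True if the tetrapeptide seq[i:i+4] passes the assumed turn rule.

    f maps each residue to a 4-tuple of positional frequencies, so f[aa][k]
    is the frequency of residue aa at position k (0-3) of a turn.
    """
    tetra = seq[i:i + 4]
    if len(tetra) < 4:
        return False  # too close to the end of the sequence
    product = 1.0
    for pos, aa in enumerate(tetra):
        product *= f[aa][pos]  # each residue's contribution depends on its position
    avg_turn = sum(p_turn[aa] for aa in tetra) / 4
    avg_alpha = sum(p_alpha[aa] for aa in tetra) / 4
    avg_beta = sum(p_beta[aa] for aa in tetra) / 4
    return (product > min_product and avg_turn > min_avg
            and avg_turn > avg_alpha and avg_turn > avg_beta)


def find_turns(seq, f, p_turn, p_alpha, p_beta):
    """Return the start index of every tetrapeptide predicted to be a turn."""
    return [i for i in range(len(seq) - 3)
            if is_turn(seq, i, f, p_turn, p_alpha, p_beta)]


if __name__ == "__main__":
    # Made-up values for illustration, not Chou-Fasman's tables.
    f = {"N": (0.2, 0.2, 0.2, 0.2), "L": (0.05, 0.05, 0.05, 0.05)}
    p_turn = {"N": 1.5, "L": 0.5}
    p_alpha = {"N": 0.75, "L": 1.25}
    p_beta = {"N": 0.75, "L": 1.25}
    print(find_turns("NNNNLLLL", f, p_turn, p_alpha, p_beta))  # [0, 1]
```

Unlike the helix and strand searches, nothing here is extended: each candidate is a single fixed-length unit, scored as a whole.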
8. The ChouFasman procedure does not yet perform step 4 of the algorithm (finding and handling overlaps, p. 219). Implement that portion of the algorithm.
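One hedged sketch of overlap handling, assuming the common rule that residues claimed by both a helix and a strand go to whichever structure has the higher average propensity over the overlapping residues. It labels each residue H (helix), E (strand), or C (neither), ignores turns, and breaks ties in favor of the strand. The labeling scheme, the tie-breaking, and the name assign_structure are assumptions.

```python
def assign_structure(seq, helices, strands, p_alpha, p_beta):
    """Label each residue 'H', 'E', or 'C' (coil), resolving helix/strand overlaps.

    helices and strands are lists of half-open (start, end) pairs.
    """
    labels = ["C"] * len(seq)
    for start, end in helices:
        labels[start:end] = ["H"] * (end - start)
    for start, end in strands:
        labels[start:end] = ["E"] * (end - start)
    # Give each overlapping stretch to the structure with the higher average propensity.
    for h_start, h_end in helices:
        for s_start, s_end in strands:
            lo, hi = max(h_start, s_start), min(h_end, s_end)
            if lo >= hi:
                continue  # this helix and strand do not overlap
            overlap = seq[lo:hi]
            avg_alpha = sum(p_alpha[aa] for aa in overlap) / len(overlap)
            avg_beta = sum(p_beta[aa] for aa in overlap) / len(overlap)
            winner = "H" if avg_alpha > avg_beta else "E"  # ties go to the strand
            labels[lo:hi] = [winner] * (hi - lo)
    return "".join(labels)


if __name__ == "__main__":
    p_alpha = {"A": 1.5, "V": 1.0}   # made-up values
    p_beta = {"A": 0.75, "V": 1.75}
    print(assign_structure("AAAAAAVV", [(0, 6)], [(3, 8)], p_alpha, p_beta))  # HHHHHHEE
```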
9. Implement any other pieces you consider necessary for the full algorithm.
Pick three proteins whose structures are known. For each protein, run your version of the Chou-Fasman algorithm and analyze how well (or poorly) it predicted the protein's secondary structure.
For example, here are some basic analyses related to alpha helices.
You might also explore how well your algorithm did compared with PSIPRED.
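As a hedged sketch of one basic analysis, the function below compares a predicted structure string with a known one, residue by residue, and reports overall agreement plus the sensitivity and precision for one state (helix by default). It assumes both strings use the same one-letter states (for example H, E, and C) and have the same length; a known annotation from a structure database, or PSIPRED's output, would first need to be reduced to those letters. compare_structures is a hypothetical name.

```python
def compare_structures(predicted, known, state="H"):
    """Compare two per-residue structure strings of equal length.

    Returns overall agreement plus sensitivity and precision for `state`.
    """
    if len(predicted) != len(known):
        raise ValueError("predicted and known must have the same length")
    pairs = list(zip(predicted, known))
    agree = sum(1 for p, k in pairs if p == k)
    tp = sum(1 for p, k in pairs if p == state and k == state)
    fp = sum(1 for p, k in pairs if p == state and k != state)
    fn = sum(1 for p, k in pairs if p != state and k == state)
    return {
        "accuracy": agree / len(pairs) if pairs else 0.0,
        # Fraction of known `state` residues that were predicted as `state`.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Fraction of predicted `state` residues that really are `state`.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


if __name__ == "__main__":
    print(compare_structures("HHHHCCEE", "HHCCCCEE"))
    # {'accuracy': 0.75, 'sensitivity': 1.0, 'precision': 0.5}
```

Running the same comparison on your predictions and on PSIPRED's predictions against one known structure gives directly comparable numbers.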
Explore the literature about Chou-Fasman and describe three possible improvements to the algorithm. (You need not implement these improvements. However, you should describe them in enough detail that your colleagues in the class could understand them.)
Optionally: Implement one of these improvements and analyze the effect it has on the algorithm. (Does it really make it better?)
This page was generated by Siteweaver on Thu Oct 27 12:49:06 2011.
The source to the page was last modified on Thu Oct 27 12:48:55 2011.
This page may be found at http://www.cs.grinnell.edu/~rebelsky/ExBioPy/project-7.5.html.
Samuel A. Rebelsky