Exploring Bioinformatics with Python

Project 7.5: Exploring Structure Prediction with the Chou-Fasman Algorithm

Note: You will be expected to turn in the final version of your algorithm for assessment.

0. Make your own copies of the following files:

1. Your first goal is to understand what parts of the Chou-Fasman algorithm are already implemented, and how you can use them.

You may want to call each of the procedures to better understand its purpose. For example, you can call CF_find_alpha or ChouFasman on the sequence you get from NP_005408.

>>> ChouFasman(read_fasta('NP_005408-fasta.txt')[1])

2. Develop a set of test sequences that you think will be useful. Your set should include.

3. The code to extend a potential alpha helix (step 1b on p. 218) is not yet written. Write that code.

Note: As our book notes, one difficulty with extending a region is that you may hit the beginning or end of the sequence. Be careful about those situations.
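
The boundary concern above can be illustrated with a minimal sketch. It assumes the commonly described extension rule (keep extending while the next window of four residues averages at least 1.0); the rule in step 1b of the book may differ. The names `extend_region` and `score`, the half-open range convention, and the default `window` and `threshold` values are all assumptions made for illustration, not part of the provided code.

```python
def extend_region(seq, start, end, score, window=4, threshold=1.0):
    """Extend the half-open range [start, end) in both directions.

    score(residue) is a hypothetical per-residue propensity function.
    Extension stops when the next window averages below `threshold`
    or when the window would run off either end of the sequence.
    """
    def window_ok(lo, hi):
        # Refuse any window that falls outside the sequence.
        if lo < 0 or hi > len(seq):
            return False
        total = sum(score(residue) for residue in seq[lo:hi])
        return total / (hi - lo) >= threshold

    # Extend left one residue at a time.
    while window_ok(start - 1, start - 1 + window):
        start -= 1
    # Extend right one residue at a time.
    while window_ok(end + 1 - window, end + 1):
        end += 1
    return start, end
```

Checking the bounds inside `window_ok` keeps both loops simple: neither loop needs its own special case for the first or last residue.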

4. The code for checking whether a range is likely to be an alpha helix (step 1c on p. 218) is incomplete. Complete that code.

5. As written, CF_find_alpha does some unnecessary checking, and therefore finds duplicate regions. In particular, once it has identified a potential alpha helix in the range (X,Y), it starts again near X+1. However, it need not look for the next alpha helix before position Y+1. Update your code so that the search is more efficient.
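
The idea of resuming at Y+1 can be sketched as a scanning loop. This is only an illustration of the control flow, not the structure of the provided CF_find_alpha: `find_regions` and the helper `find_region_at` are hypothetical names, and ranges are assumed to be inclusive (start, end) pairs.

```python
def find_regions(seq, find_region_at):
    """Collect regions left to right without rescanning inside a region.

    find_region_at(seq, i) is a hypothetical helper that returns an
    inclusive (start, end) range for a region found at position i,
    or None if no region starts there.
    """
    regions = []
    i = 0
    while i < len(seq):
        region = find_region_at(seq, i)
        if region is None:
            i += 1
        else:
            regions.append(region)
            # Resume just past the region's last position (Y + 1),
            # always advancing by at least one position.
            i = max(i + 1, region[1] + 1)
    return regions
```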

6. There is not yet a procedure to find beta strands. Implement that procedure. (You will find that it is very similar to the one for finding alpha helices.)

7. There is not yet a procedure to find beta turns. Implement that procedure. (You will find that this procedure is a bit different, for three reasons: it does not expand the region, the contribution of an amino acid to turn probability depends on its position in the region, and turns are just one unit long.)
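
The position dependence mentioned above can be sketched as a product of position-specific frequencies over a four-residue window. This is a hedged illustration only: `turn_score`, `candidate_turns`, the layout of the `freqs` table, and the `cutoff` parameter are assumptions, and the real frequencies, cutoff, and any additional conditions should come from the book.

```python
def turn_score(window, freqs):
    """Multiply position-specific frequencies over a four-residue window.

    freqs[residue] is assumed to be a 4-tuple giving the residue's
    turn frequency at window positions 0, 1, 2, and 3.
    """
    product = 1.0
    for position, residue in enumerate(window):
        product *= freqs[residue][position]
    return product


def candidate_turns(seq, freqs, cutoff):
    """Return the start index of every four-residue window whose
    position-specific product exceeds `cutoff`. No extension is done."""
    return [i for i in range(len(seq) - 3)
            if turn_score(seq[i:i + 4], freqs) > cutoff]
```

Because the same residue contributes a different factor at each of the four positions, sliding the window by one can change the score considerably even though three residues are shared.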

8. The ChouFasman procedure currently fails to do step 4 of the algorithm (finding and handling overlaps, p. 219). Implement that portion of the algorithm.
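
A first step toward handling overlaps is simply finding them. The sketch below assumes that predicted helices and strands are each stored as a list of inclusive (start, end) pairs; that representation and the name `find_overlaps` are assumptions. Deciding which structure wins in each shared range is left to the rule in the book.

```python
def find_overlaps(helices, strands):
    """Return (helix, strand, shared) for every overlapping pair.

    Each range is an inclusive (start, end) pair; `shared` is the
    inclusive range of positions that both regions cover.
    """
    found = []
    for helix in helices:
        for strand in strands:
            lo = max(helix[0], strand[0])
            hi = min(helix[1], strand[1])
            if lo <= hi:  # the two ranges share at least one position
                found.append((helix, strand, (lo, hi)))
    return found
```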

9. Implement any other pieces you consider necessary for the full algorithm.

Part 2: Experimentally Analyze the Chou-Fasman Algorithm

Pick three proteins for which there is a known structure. For each protein, run your version of the Chou-Fasman algorithm and analyze how well (or poorly) it predicted the protein's structure.

For example, here are some basic analyses related to alpha helices.

You might also explore how well your algorithm did as compared to PSIPRED.
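
One possible way to quantify such comparisons is to line up per-residue labels and count agreements. The sketch below assumes that both the prediction and the reference (a known structure or PSIPRED output) have been converted to equal-length strings with one label per residue, such as 'H' for helix. The function name `compare_labels` and the label alphabet are assumptions.

```python
def compare_labels(predicted, known, label='H'):
    """Compare two per-residue label strings of equal length.

    Returns the overall fraction of matching positions, plus the
    precision and recall for one label ('H' for helix by default).
    Precision or recall is None when its denominator is zero.
    """
    if len(predicted) != len(known):
        raise ValueError("label strings must have the same length")
    pairs = list(zip(predicted, known))
    true_pos = sum(1 for p, k in pairs if p == label and k == label)
    false_pos = sum(1 for p, k in pairs if p == label and k != label)
    false_neg = sum(1 for p, k in pairs if p != label and k == label)
    agree = sum(1 for p, k in pairs if p == k)
    return {
        'accuracy': agree / len(pairs) if pairs else None,
        'precision': (true_pos / (true_pos + false_pos)
                      if true_pos + false_pos else None),
        'recall': (true_pos / (true_pos + false_neg)
                   if true_pos + false_neg else None),
    }
```

Reporting precision and recall separately helps distinguish an algorithm that predicts too many helices from one that misses them.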

Part 3: Improve the Algorithm

Explore the literature about Chou-Fasman and describe three possible improvements to the algorithm. (You need not implement these improvements. However, you should describe them at a level that your colleagues in the class could understand.)

Optionally: Implement one of these improvements and analyze the effect it has on the algorithm. (Does it really make the algorithm better?)

What to Turn In


This page was generated by Siteweaver on Thu Oct 27 12:49:06 2011.
The source to the page was last modified on Thu Oct 27 12:48:55 2011.
This page may be found at http://www.cs.grinnell.edu/~rebelsky/ExBioPy/project-7.5.html.

Samuel A. Rebelsky
rebelsky@grinnell.edu