BIO/CSC295 2011F, Class 16: Structure Prediction (1)

Overview:

* Lab preview!
* Protein structure: What is it and why care?
* The Chou-Fasman Algorithm
* Wet Lab!

Admin:

* RIP John McCarthy, inventor of LISP.
* About the mid-semester evaluations.
* Don't forget that we have lots of lab this week and next. (In particular, Thursday and next Tuesday are going to be long classes.)
* We plan to return outstanding work NEXT week.
* We're playing with the schedule a bit. The following deadlines are subject to revision.
  * If you have not read Chapter 7 yet, you should read it asap.
  * Due November 1: Web Exploration from Chapter 7. (Further details forthcoming tonight.)
  * Due November 3: Programming Exploration from Chapter 7. (Further details forthcoming tonight.)
  * Due November 8: Response to Krings et al. (Distributed Thursday.)
  * Due November 10: Project Proposals. (Further details forthcoming in class on November 3.)
* EC (and free food) for cooking for the ISO Food Bazaar
  * Form will be distributed via email
* EC for attending ISO Food Bazaar Nov. 12
* EC for any of the Grinnell College Young Innovator for Social Justice events (Meet and greet 4:15 today; Awards ceremony Tuesday night; Coffee break on Wednesday at 2:30; Symposium on Wednesday at 4:15; Symposium Wednesday at 8:00 p.m.; Morris Dees Convo).
* EC for Wednesday's noontime Biology Seminar, Stuart Allison from Knox on "Ecological restoration and environmental change - a geographic comparison of attitudes towards restoration and the practice of restoration."
* EC for Wednesday's afternoon Biology Seminar at 4:15 on "One year Master's programmes in Environmental & Life Sciences at the only 100% graduate university in England" and "'Environmental Futures' A Summer program designed for students completing their junior year at Midstates Math & Science Consortium Colleges."
* No EC for this week's Thursday extra (Discrete Structures and Math Requirements in CS).
* Next Tuesday at noon: Study abroad options in Budapest in CS.

---

Wet Labs! Yay!

* Handout: Slightly revised version of the PCR lab that we've been working on.
* Excellent building design makes it impossible to hear downstairs, so we're going over it here.
* Digest -> Ligation -> PCR (NOW!)
* Nested PCR
  * Set up today
  * Tricky thing
    * You used a specific enzyme, so you must use the correct corresponding primer set.
  * Abusive thing
    * You'll need to come in TOMORROW (Wednesday) and set up the second set of reactions.
    * And once again, use the correct corresponding primer set.
    * It's okay to ask a friend in your group to take your place.
  * We'll have things in boxes for different primers to make identification easier.
  * Ms. Bosse will add the master mix.
* Thursday: Run gels, then TA clone
  * MAY RUN LATE because we have to wait for gels to finish before TA cloning.
* We're doing this quickly because the quality of the material degrades rapidly every day.
* Vida and Ms. Bosse can help you if you have questions. Professor Rebelsky doesn't even know what TA cloning is.

Protein Structure: What is it and why do we care?

* Proteins fold into three-dimensional structures.
* The three-dimensional structure affects what a protein does
  * Interactions with other proteins, etc.
* Four levels
  * Primary: Sequence of amino acids
  * Secondary: Alpha helices and beta sheets
    * Two characteristic folding elements within a protein
  * Tertiary: Full 3D fold
  * Quaternary: Interactions between different peptides
    * Proteins function as complexes
    * Dimers, tetramers, etc.
* How easy is it to get from primary to tertiary or from primary to secondary?
  * Many proteins fold spontaneously, particularly when forming alpha helices.
  * Others require chaperones.
* Alpha helices
  * As they come out of the protein-synthesis machinery, they fold essentially immediately.
  * Very stable, low-energy structures, held together by hydrogen bonds between the amino (N-H) group of one residue and the carbonyl (C=O, from the carboxyl group) of the residue four positions earlier.

          amino           carboxyl
           H3N  --  C  --  COOH
                    |
                    R

  * R groups have specific chemical properties, such as charge or hydrophobicity.
* What are beta sheets?
  * Two strands with hydrogen bonds between them
  * May be parallel or anti-parallel
    * Parallel:
          ---------------------->
          ---------------------->
    * Anti-parallel:
          ---------------------->
          <----------------------
  * The strands can be separated by LOTS of other protein sequence.
* What determines whether we get alpha helices or beta sheets or something else?
  * The particular amino acids, and their order

So, can we figure out the secondary structure from the primary structure? Can we figure out the tertiary structure from the primary structure?

* The game Foldit looks at this.
  * Rules talk about how proteins fold.
  * EC for playing the game? We'll consider it.
* You might think that if we know the primary sequence, we can figure out how the protein will fold.
  * But interactions can be between residues that are very, very far apart in the sequence: e.g., amino acid 42 and amino acid 421 can affect each other.
  * We aren't even great at predicting alpha helices, and they are stable structures that we understand.

Detour: How do you get protein structure if you can't get it from primary sequence?
* X-ray crystallography
  * Crystallize the protein
    * Hard to do, particularly for membrane proteins
  * Solve the crystal structures
    * Hard to do
  * All in all, it is time consuming.

Back to the game

* Goal is to find the lowest-energy fold for the structure.
* Sample rules
  * "Small is good" - Typically tight enough to exclude water
  * "Bury hydrophobic residues" - At least in an aqueous environment
  * "Don't violate charge rules"
* Those few rules are not enough to determine folding.
* /Nature/ paper: Solved an HIV structure
  * Game gave approximate solution
  * Solution verified experimentally
* Should we write algorithms that mimic the gamer strategies?
  * Lots of promise if we can do this sort of stuff

Chou-Fasman

How do we go from amino acid sequence to structure? The Chou and Fasman approach:

* Statistical data can help us predict secondary structure.
* Early 1970s: they had sequence data and crystal structures.
* Made a table: for each amino acid,
  * the number of times the amino acid appeared in the data set;
  * how often it was "internal" in an alpha helix;
  * how often in a beta sheet;
  * how often in a turn (and, for turns, whether it was the initial amino acid, second in the turn, third in the turn, ...).
* Lots of counting... E.g., 60 total, 20 in alpha, 10 in beta, etc.
* Turn these into parameters we can apply to unknown proteins.
  * Turn to the table on page 217 of the text.
* Two-step process
  * Frequency: normalize to appearance, for each amino acid
    * E.g., 120 A's, 800 R's, etc.
    * 80/120 of the A's are in alpha helices; 80/800 of the R's are in alpha helices; etc. for the whole table
  * Normalize across all amino acids
    * E.g., 12000 amino acids total, 6000 of which appear in alpha helices
    * What is the likelihood that a random amino acid will appear in an alpha helix? 6000/12000 = 50%
    * Does a specific amino acid occur more or less frequently than average? Take the ratio: for A, (80/120)/0.5 = 1.33; for R, (80/800)/0.5 = 0.20.
    * So A's are more likely to be in an alpha helix.

Once you have statistical data, you can use it! The algorithm:

* To find an alpha helix, look at 6-amino-acid frames. Is this a candidate?
  * It is if 4 of the 6 amino acids have a higher than normal tendency
    to be alpha-helical (P(alpha) above 1.00, written 100 in the table; e.g., A has P(alpha) = 1.42, written 142).
* Expand the frame until we reach a stretch of 4 amino acids that, on average, are unlikely to be alpha-helical.
* Assess the result
  * Length: a really short region is unlikely to be a helix; it needs to be at least 5 amino acids long.
  * Look at the average P(alpha) across the region; it must be > 103 as a score.
    * E.g., add the scores: 142 + 98 + 142 + 142 + 67 = 591; the overall average is 591 / 5 = 118.2 > 103, so the region is likely a helix.
  * Also assess beta sheet scores.
* You can change some of the parameters.
* For turns, use the basic statistics: where a specific amino acid is more likely to occur in a turn.

How do we know if this algorithm is good?

* Test the code. How do we know it works well?
* Compare with experimental data: test with the training set (as a minimum!), then test with novel proteins and see if it works.
* Two questions (false negatives and false positives):
  * Does it identify a real alpha helix? If you have a real alpha helix, does it find it?
  * If it calls something an alpha helix, is it really an alpha helix?
* On training data, ~60%; on untrained (novel) data, ~40%.

If you don't use Chou-Fasman, what do you do?

* Do a physical simulation... Computationally complex!!!
* Games (like Foldit)
* Neural networks (MAGIC!)
  * Formulas: I have a couple of inputs; manipulate them, weight them differently, etc.; and you get an answer.
  * Does it give a correct answer? Change the weights up/down until you get the right answer.
  * PLUG in data and not know how it works.

Other indicators of helix

* N-terminal capping: Glu, Asp, Pro
* C-terminal capping: His, Lys, Gln, Arg
* Gly and Pro are rarely found within the center of a helix.

Pay attention to your moment of obligation! Most successful companies work in pairs; not enough social justice organizations do that.
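The two-step Chou-Fasman normalization in the notes (per-amino-acid frequency, then divide by the overall frequency) can be sketched in a few lines of Python. This is a minimal illustration, assuming the example counts from the notes (120 A's and 800 R's, 80 of each in helices, 6000 of 12000 residues helical); the function name `propensity` is hypothetical, and a real table is built from counts over a whole set of solved structures.

```python
# Sketch of the Chou-Fasman normalization step, using the illustrative
# counts from the notes (not real database counts).

def propensity(in_state: int, total_for_aa: int,
               state_total: int, grand_total: int) -> float:
    """Return P(state) for one amino acid (hypothetical helper).

    f_aa  = fraction of this amino acid's occurrences found in the state
    f_avg = fraction of all residues found in the state
    P     = f_aa / f_avg   (> 1.0 means "more likely than average")
    """
    f_aa = in_state / total_for_aa
    f_avg = state_total / grand_total
    return f_aa / f_avg


if __name__ == "__main__":
    p_ala = propensity(80, 120, 6000, 12000)  # (80/120) / 0.5
    p_arg = propensity(80, 800, 6000, 12000)  # (80/800) / 0.5
    print(f"P(alpha, A) = {p_ala:.2f}")  # P(alpha, A) = 1.33
    print(f"P(alpha, R) = {p_arg:.2f}")  # P(alpha, R) = 0.20
```

Tables like the one on page 217 usually print these ratios multiplied by 100, so 1.33 would appear as 133.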
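The helix-finding steps from the notes (6-residue frame, expand, assess) can be sketched as follows. Several details are assumptions rather than something the notes pin down: the P(alpha) values are the ones commonly reproduced from Chou and Fasman's table (check them against the table on page 217 of the text); a residue counts as "higher than normal" when P(alpha) > 100; expansion stops at the first stretch of 4 residues whose average P(alpha) is below 100; and predicted helices are not allowed to overlap. The beta-sheet comparison and turn prediction are left out. `P_ALPHA`, `predict_helices`, and its parameter names are hypothetical.

```python
# Chou-Fasman P(alpha) values (x100), as commonly reproduced from their
# table; check against the table on page 217 of the text before relying
# on them.
P_ALPHA = {
    "A": 142, "R": 98, "N": 67, "D": 101, "C": 70,
    "Q": 111, "E": 151, "G": 57, "H": 100, "I": 108,
    "L": 121, "K": 114, "M": 145, "F": 113, "P": 57,
    "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106,
}


def average(values):
    """Arithmetic mean of a non-empty list of scores."""
    return sum(values) / len(values)


def predict_helices(seq, former=100, breaker=100, min_avg=103,
                    min_length=5, frame=6, needed=4):
    """Return predicted helices as half-open (start, end) index pairs.

    Hypothetical sketch: find a candidate frame, expand it, then assess it.
    """
    p = [P_ALPHA[aa] for aa in seq.upper()]
    n = len(p)
    helices = []
    i = 0
    while i + frame <= n:
        # Step 1: is this frame a candidate? Count likely-helical residues.
        formers = sum(1 for v in p[i:i + frame] if v > former)
        if formers < needed:
            i += 1
            continue
        start, end = i, i + frame
        # Step 2a: expand right while the 4 residues ending at the next
        # residue still average at least `breaker`.
        while end < n and average(p[end - 3:end + 1]) >= breaker:
            end += 1
        # Step 2b: expand left the same way (4 residues starting at the next
        # residue to the left), without running into an earlier helix.
        floor = helices[-1][1] if helices else 0
        while start > floor and average(p[start - 1:start + 3]) >= breaker:
            start -= 1
        # Step 3: assess length and average P(alpha) over the region.
        # (A 6-residue frame already passes a length-5 check; it is kept
        # as a parameter because the notes list it.)
        if end - start >= min_length and average(p[start:end]) > min_avg:
            helices.append((start, end))
            i = end  # resume scanning after this helix
        else:
            i += 1
    return helices


if __name__ == "__main__":
    print(predict_helices("GGGGAAAAAAGGGG"))  # [(2, 11)]
    # The five scores from the notes' example match A, R, A, A, N here.
    print(average([P_ALPHA[aa] for aa in "ARAAN"]))  # 118.2
```

For `"GGGGAAAAAAGGGG"`, the first candidate frame is `GGAAAA` (four residues above 100), expansion to the right stops when the stretch `AAGG` averages 99.5, and the region's average (about 113.7) clears 103. The thresholds are parameters because, as the notes say, you can change some of them.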
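The two evaluation questions in the notes correspond to sensitivity (of the real helix residues, what fraction did the method find; misses are false negatives) and precision (of the residues it called helix, what fraction are real; wrong calls are false positives). A minimal per-residue sketch, assuming predictions and experimental assignments are given as equal-length strings with `H` marking helix; `helix_scores` is a hypothetical name, and published accuracy figures may be computed differently (for example, per segment rather than per residue).

```python
def helix_scores(predicted: str, actual: str) -> tuple[float, float]:
    """Compare per-residue labels ('H' = helix, anything else = not helix).

    sensitivity: of the real helix residues, what fraction did we find?
    precision:   of the residues we called helix, what fraction are real?
    """
    if len(predicted) != len(actual):
        raise ValueError("label strings must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == "H" and a == "H")  # true positives
    fp = sum(1 for p, a in pairs if p == "H" and a != "H")  # false positives
    fn = sum(1 for p, a in pairs if p != "H" and a == "H")  # false negatives
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    print(helix_scores("HHHHHHCC", "HHHHCCCC"))
```

Here `helix_scores("HHHHHHCC", "HHHHCCCC")` gives sensitivity 1.0 and precision about 0.67: every real helix residue was found, but two of the six residues called helix are false positives. Reporting both numbers matters, since a method that calls everything a helix would score perfect sensitivity.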
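The neural-network idea in the notes ("weight the inputs differently; change the weights up/down until you get the right answer") can be illustrated with a single artificial neuron trained by the perceptron rule. This is a toy sketch, not how any particular secondary-structure predictor works: real predictors use larger networks and many more inputs. The two input features (average P(alpha)/100 and average P(beta)/100 over a window) and the four training examples are made up for illustration, and `predict`/`train` are hypothetical names.

```python
# Hypothetical training data: (avg P(alpha)/100, avg P(beta)/100) -> 1 if helix.
TRAINING = [
    ((1.4, 0.8), 1),
    ((1.2, 0.9), 1),
    ((0.7, 1.3), 0),
    ((0.6, 1.1), 0),
]


def predict(weights, inputs):
    """Weighted sum of the inputs plus a bias term; answer 1 if positive."""
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else 0


def train(data, rate=0.1, max_epochs=100):
    """Perceptron rule: after a wrong answer, nudge each weight up or down."""
    weights = [0.0] * (len(data[0][0]) + 1)  # bias + one weight per input
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in data:
            error = target - predict(weights, inputs)  # -1, 0, or +1
            if error:
                mistakes += 1
                weights[0] += rate * error
                for k, x in enumerate(inputs, start=1):
                    weights[k] += rate * error * x
        if mistakes == 0:  # every training example answered correctly
            break
    return weights


if __name__ == "__main__":
    w = train(TRAINING)
    print(w)
    print(predict(w, (1.5, 0.7)))  # 1
```

After training, `predict(w, (1.5, 0.7))` answers 1 (helix). The learned weights are just numbers; nothing in them explains why the answer is right, which is the "plug in data and not know how it works" point from the notes.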