BIO/CSC295 2009F, Class 20: Phylogenetics (1)

Admin:
* Food
* Don't forget the Human Rights and the Environment seminar this week.
* Your exams are not yet graded. It's Sam's fault.
* We hope our sick students get better soon.
* Cross-country regional meet this weekend at Oakland Acres
    * Ride a bus from the PEC at 9am
* Swim meet this Friday at 5pm
* EC for giving Vida four tickets to the Int'l student dinner
* How does one report EC?
    * At the end of the semester, you get a list of all the opportunities
    * You check off the ones you've done
    * And correcting my typing doesn't count

Overview:
* Why build phylogenetic trees?
* Bioinformatic approaches.
* Some challenges.
* Primary literature discussion: Neandertals.
* Web exploration (if time)

We are discussing chapter 8. (Only one chapter to go. We should have made you buy a longer book.)

* What's the point?
    * Phylogenetic trees and measures of evolution
    * We've certainly talked a bit about them before
        * E.g., w.r.t. the HIV stuff
    * We'd hope you had an "Ah hah! Now I have a deeper understanding of how this stuff is done."
* Two classes of phylogenetic analysis
    * Cladists: Measure based on biological similarity
    * Pheneticists: Measure based on statistical similarity
* Strategy: Measure rates of change in sequences
    * Need the sequences to be common across species
    * E.g., Hippos vs. Whales using milk protein
* Phenotypes: Changes in wild population
    * But we don't trust phenotypes without molecular data
* Problem/issue: We're using snapshots from today to infer things about the past.
* Was it obvious to you that hippos and whales are closely related?
* How do you choose a sequence for measuring rate of change?
    * Needs to mutate fast enough that you'll see changes
    * But not so fast that you won't be able to see similarities or to infer evolutionary history.
    * So your choice of the sequence/gene will depend on what you're comparing
        * Two similar species: You want something that mutates relatively quickly
        * Two dissimilar species: You want something that's relatively conserved

[DETOUR]
* What is a species?
    * Our definition is as inexact as our definition of gene
    * Classic: "Cross has to produce a viable hybrid"
    * But it's hugely complicated.
        + Mastiffs and Chihuahuas are both dogs
        + Isolated populations that develop very different characteristics, but can still cross-breed.
        + And we see gene transfer across species of bacteria
    * James Watt says "Polar bears and grizzly bears are genetically similar, so there's no reason to have to worry about preserving polar bears".
* As we're analyzing relationships, we need to think carefully about what sequences we use as our "molecular clock"

[DETOUR]
* What is the relationship between genotypic similarity and phenotypic similarity?
    * Sometimes there is no relationship.
        * Sharks and Dolphins
        * Wings on flying mammals vs. wings on birds
    * Vida recommends that you read "The Blind Watchmaker"

[DETOUR FROM THE DETOUR]
* How do you answer the question of "How do really complex forms, like wings, evolve?"
    * You can get complex forms from slight gene changes: E.g., change in number of wings of fruit flies.
    * You can build up a sequence of small advantages: E.g., gliding will help you go faster.
    * "Talk to Professors Brown and Praitis."

Let's consider some algorithmic approaches to building phylogenetic trees

Input: different sequences (molecular clock: similar but not identical)
    Balance: they should be not too far apart but not too close together.

Output: a tree that shows similarity/relationship (likely evolution)
    Two ways: simple trees or...more information: how close are the branches...
    Assess closeness statistically
        - Different statistic
        - Set up relationships; measure statistics of how often they return the same result
    As a biologist: how long ago did they have a common ancestor?

Start: this is a likely tree

What tools might we use from the algorithms we've already written?
    Align sequences as a start
    Compare and score them - the algorithm can score how good the alignment is

What else sounds relevant?
    Tree: sp. 1; also a tree: sp. 2 and 1; also a tree: sp. 1, 2, 3; etc.
    Build groups of groups
    We can do a list of lists in Python

Start by finding the highest scoring match. Group them.
    In our tree, s1 and s3 are most closely related
    Do it again and again until you are down to 1 group
    Do you build a consensus sequence for the group?

Table: Distance

          s1   s2   s3   s4
    s1     0
    s2     4    0
    s3     5    6    0
    s4     8    8    4    0

Second round, group s1, s2 then repeat

What do we do with this grouping?
    Higher order makes sense
    But, new problem: we have a metric for determining closeness of two sequences
    We need a metric for determining closeness of "sets" of sequences
    (metric first - count mismatches...)
    Technique 1: build a consensus sequence...
        Recall that in sequence comparisons, there are letters that indicate either g or c
    Technique 2: average them
    Pick a method for this step; heuristic

What is the metric for determining closeness?
    We will talk when we do the computational work.

Problems:
    1. Evolution isn't constant
    2. Bases can mutate and mutate back
    3. We don't have the ancestor (a lot of approximation)

How do we measure distances?
    Some notion of mutation rate: consider expected and take into account both directions
    Rates are not constant

Campus Climate is Messy
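
Returning to the grouping procedure from earlier in the notes (find the closest pair, group them, repeat until you are down to 1 group), here is a minimal sketch in Python. It is only one way to do it, not the version we will write in class. Assumptions: the function names (`mismatches`, `leaves`, `group_distance`, `build_tree`) are made up for illustration; the tree is a list of lists; the distance between two groups uses Technique 2 (average the pairwise distances); and ties are broken by taking whichever pair is found first. The numbers in `DIST` are the ones from the distance table.

```python
from itertools import combinations


def mismatches(a, b):
    """Count mismatched positions in two sequences assumed to be already aligned."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def leaves(tree):
    """Flatten a list-of-lists tree into a flat list of species names."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def group_distance(a, b, dist):
    """Technique 2: average the distances over all cross-group pairs."""
    pairs = [(x, y) for x in leaves(a) for y in leaves(b)]
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


def build_tree(names, dist):
    """Repeatedly merge the two closest groups until one group is left."""
    groups = list(names)
    while len(groups) > 1:
        # Pick the pair of groups with the smallest average distance.
        # On a tie, min() keeps the first pair it encounters (an arbitrary choice).
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda ij: group_distance(groups[ij[0]], groups[ij[1]], dist),
        )
        merged = [groups[i], groups[j]]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups[0]


# Distances copied from the table in the notes.
NAMES = ["s1", "s2", "s3", "s4"]
DIST = {
    frozenset(("s1", "s2")): 4,
    frozenset(("s1", "s3")): 5,
    frozenset(("s2", "s3")): 6,
    frozenset(("s1", "s4")): 8,
    frozenset(("s2", "s4")): 8,
    frozenset(("s3", "s4")): 4,
}

print(build_tree(NAMES, DIST))  # [['s1', 's2'], ['s3', 's4']]
```

Note that the table has a tie at distance 4 (s1/s2 and s3/s4), so the order of the first two merges depends on the tie-breaking rule. A consensus-sequence approach (Technique 1) would replace `group_distance` with something that rebuilds a sequence for each group and calls `mismatches` on the results.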