BIO/CSC295 2009F, Class 12: Sequence Assembly (1)

Admin:
* We will not be doing a programming project for this chapter.
    + We will discuss the algorithms used for shotgun assembly.
* We're still debating what to do about the protein alignment project.
    + We may return to it on Thursday.
* 10/10 is coming up. Please be responsible. Please take self-governance seriously, and be responsible for each other.
* Fun interesting stuff coming up
    * Stovepipe is this weekend. Go!
    * Rent is coming up in like a month and a half
    * IM Soccer Tournament this weekend. Much better for you than 10/10

Overview:
* Discussion of Markham et al.
* Sequencing DNA
* Reassembling Shotgun Sequences
* Testing Assembly Algorithms

Markham et al.: Patterns of HIV-1 evolution in individuals with differing rates of CD4 T cell decline
* Overall impression?
    * Less math, perhaps easier to read
    * More thorough job explaining the experiment itself, making it easier to follow
    * Useful to have a personal perspective, having explored the sequences ourselves.
        * There's a benefit to having to grapple with the raw data
        * And then seeing what emerges from people who have enough time and resources to explore the deeper questions
* What are the basic models that they are testing?
    * Can we look at sequence changes and figure out what is happening evolutionarily?
    * Model 1: "Best fit" wins, with some variance late
        * Allele conveys a selective advantage, so they "outcompete everyone else"
        * Still see variance because mutation is always happening
        * Examples of bacteria and fruit flies and ...
    * Model 2: Frequency-dependent selection; immune system acts on most prevalent virus, shows diverse population and shift in that diverse population
        * The immune system can't get everyone, so it gets the most abundant thing first.
          (No, the immune system is not sentient; we're describing its apparent response.)
    * Model 3: Broad response (immune system is indiscriminate): Controls diversity, most populous variants survive
* How do they measure this stuff?
    * 15 patients/participants; HIV-positive; IV-drug-users
        * Researchers are likely to have pre-data - most were not positive before the first visit
    * Tracked over twice-yearly visits for four years
        * Gives you the opportunity to look closely at the population.
    * What do they do with these participants?
        * Look at CD4 T cells, which show you how effectively the virus is working.
        * Peripheral blood mononuclear cells (PBMC)
            * Likely to have a preponderance of viral DNA.
            * Gives an accurate snapshot of viral population.
            * Q: Why deal with unstable DNA? "Such DNA has not integrated into the host genome, is unstable, and is capable of persisting only for several days in unactivated T cells" (310).
                * Do you get a bunch of recently replicated stuff that biases your sample?
    * How do we take snapshots?
        * Nested PCR
        * Subclone into a plasmid
        * Sequence
    * PCR
        * Start with viral template
        * Use temperature to separate the two strands
        * Bind primer
        * Replicate
        * Do it again! (Exponential blowup!)
        * How those cool folks on CSI do everything
    * Nested PCR
        * Second primer that is a bit inside of the first
        * Increases specificity
    * Why not sequence immediately after PCR?
        * You'll get some mix
        * Hard to interpret
        * The subcloning lets us pick individual pieces
    * Criticism: This technique misses mutants whose changes are in the primer regions
    * What's the problem with using the plasmids?
        * You're not sure that you get a representative sample
        * So you need to do a lot
        * Criticism: Sample bias!
    * What do they do with this sequence data?
        * Use the MEGA algorithm to produce a phylogenetic tree (chapter 8)
        * Trying to measure diversity and divergence
        * Diversity: Mean number of nucleotide differences
          between any two clones (something *you* could measure)
        * Divergence: From the first visit, try to figure out the infecting virus, and then measure divergence from that ancestral virus
            * Nucleotides/clone that differ from the ancestral sequence
        * Are they the same?
            * Diversity seems a bit like breadth of the tree and divergence more like the depth
        * Technique: Synonymous/non-synonymous mutations: dS/dN
            * Synonymous mutation: Although the codon changes, the amino acid does not change (typically a change in the third base in the codon)
            * Non-synonymous mutation: Changes the amino acid
            * Random mutations: Non-synonymous mutations are more likely
                * (Gross simplification: If the third base were irrelevant, we'd have only 1/3 synonymous mutations. But it's only irrelevant in a few cases.)
            * But survival rates of non-synonymous mutations are lower
            * The bias tells us something about how selection is acting
* "They did all this stuff, what did they figure out?"
    * "Hey, let's look at figure 1"
        * Correlate CD4 T cell count with divergence and diversity
        * Broke into three groups: Rapid progressors, moderate progressors, and non-progressors
            * Non-progressors: Count stayed about the same
            * Rapid progressors: Quick decrease in count
            * Moderate progressors: Somewhere in between
        * The data appear mixed
        * Further analysis: Is there a trend, and is it statistically significant?
    * "Hey, quantitative people, let's look at table 1 and figure 2"
        * Rapid progressors clearly have significantly higher change in diversity and change in divergence and lower dS/dN
        * It all lends support to model 2
    * Criticism: There may be more complex correlations
    * Criticism: Their classification into the three groups is likely to have a big effect

Sequence divergence in HIV from infection to AIDS is similar to the amount of divergence in the human population in 1.6 million years of evolution. (Homo erectus to Homo sapiens sapiens)

Ch. 5: How were the DNA sequence repositories created?
Sanger sequencing
* 300-1000 bp
* Primer + enzyme: make a copy of the strand
* Sanger: Add in dideoxynucleotides (the "dideoxy-thingies")
    * Now, the replication stops at various points, so you have copies of various subsequences
    * Add radioactive tags (or dyes) to the dideoxynucleotides
        * Sanger used radioactive tags, so you had to do four runs (one for each nucleotide)
        * We do dyes now, a different one per nucleotide, so we can simultaneously do each nucleotide
    * If you run the fragments on a gel, they separate by length, so you can read them one at a time
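
The read-out idea above can be sketched in a few lines of Python. This is only a toy model, not how a real sequencer works: it assumes (unrealistically) that the reaction produces exactly one terminated fragment for every possible stopping point, and it ignores the primer itself. The function names `sanger_fragments` and `read_gel` and the example strand are made up for illustration.

```python
import random


def sanger_fragments(copy_strand, seed=0):
    """Return chain-terminated fragments as (length, terminal_base) pairs.

    Assumption: every possible stopping point shows up exactly once.
    A real reaction only approximates this by using many template copies.
    """
    fragments = [(i + 1, base) for i, base in enumerate(copy_strand)]
    # Fragments come out of the tube in no particular order.
    random.Random(seed).shuffle(fragments)
    return fragments


def read_gel(fragments):
    """Sort fragments by length (the gel's job) and read each terminal dye."""
    return "".join(base for _, base in sorted(fragments))


copy_strand = "GATTACAGGT"  # hypothetical strand synthesized from the primer
fragments = sanger_fragments(copy_strand)
print(read_gel(fragments))  # prints GATTACAGGT
```

The point of the sketch: each fragment is identified only by its length and the dye on its last base, and sorting by length is enough to recover the synthesized strand one base at a time.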