BIO/CSC295 2011F, Class 26: Time for work on projects

Overview:
* Guest lecture: Paul Tymann.
* Discussion of presentation evaluations.
* Time to work on projects.

Admin:
* Sam will distribute a checklist for EC next week.
* Kellis papers finally returned. See email with general notes.
* Grade overviews coming soon.
* Upcoming work: Presentations Tuesday; Reports Thursday; Exam 2 p.m. Wednesday of Finals week; Portfolios 2 p.m. Wednesday of Finals week.
* What food do you want for presentations? Brownies, Oreos, hot chocolate with candy canes (please) and mini marshmallows.
* Alternative final times: Thursday at 9 a.m. and 2 p.m. You must fill out a change of final time form to take the final at an alternate time.
* After our guest lecture, we will discuss the evaluation checklist for presentations.
* EC for Swim Meet (Friday 6-9, Saturday all day). Two hours is enough.
* EC for a Sub-free Waltz.
* EC for Steganography CS extra today at 4:30 (snacks at 4:15).
* EC for Friday's Bio seminar: Grinnell Alum 'xx on Radiation Oncology and MD/Ph.D. programs. FREE FOOD IF YOU TALK TO HIM ABOUT THE MD/PHD PROGRAM AFTERWARDS. MOSACB
* EC for Band concert, Saturday, 3 p.m., Herrick.
* EC for Singers Messiah concert, Sunday, 2 p.m., Herrick?

Paul Tymann on Building a Genetic Monitoring Database
* Background about PT: Started college as a bio major, thought he could be a vet. Second year of college: "I can't be in school for the rest of my life." So he went on to CS, which seemed practical. But then he ended up being in school the rest of his life.
* A few years after he started at RIT, they were starting a bioinformatics program, and he was enthusiastic about being a part of it.
* Today: "bread and butter stuff that needs to be done in the lab."
  * Software developed for labs; not so much algorithmic as managing data and workflow.
  * Not as sexy as dynamic programming, but absolutely necessary for a production bioinformatics lab.
* The domain for this work: "A working lab - researchers send samples to get sequenced"; the lab does not do the analysis.
* The lab also supports Harlan, which provides animals (rats and mice) for studies. These animals are supposed to have very specific genetic traits. Need to check for pure strains - want to make sure that there's no cross-contamination.
* Microsatellites are used for screening
  * Short non-coding regions with a short sequence repeated many times
  * The length is strain-specific (and even individual-specific)
  * If you measure the length of a microsatellite, you can check which strain an animal belongs to
* Microsatellite screening:
  * Customer specifies species, strain, markers
  * 10-40 markers on about 1K animals; 10K to 40K results
  * These are small numbers compared to some things you study
* Detour: There is an avalanche of data being generated. Good stuff to analyze. But it's not contextualized well; there's little or no metadata.
* Pre-PT: Lots of stuff done by hand.
* PT's statement: "I think I can automate this."
* There's lots of data, there are lots of sexy algorithms, but in the end, we need systems that help scientists structure and arrange things so that they can understand them.
* Plates: Lots of wells (96 = 8x12; 384 = 16x24)
  * Robots fill the wells
* Used gel electrophoresis to analyze the size of the microsatellites
* AMS90 robot would take the plates and run the gels.
* But the metadata (which sample is associated with which data) are not available.
* Need to map from 384-well plates to 96-well plates
* Also need to map the relationship between extraction plates and PCR plates.
* The plate stamper would load into the database the relationships between extraction plates and PCR plates.
* Changing that to be automatic rather than by hand
* Important task: Parsing text files for data and putting the data in the database.
* ...
* Lots and lots of data.
  E.g., the Fluidigm BioMark gives 96 pieces of data per sample, and we have the good old 96 or 384 samples.
* Can also repurpose the data: E.g., for simple gender calls.
* Conclusions
  * Management of the lab data is just as important as algorithms
  * Not as sexy, but very important
  * In biology there are no constants
  * Every time the novices think they discover a rule ...
  * Use COTS tools whenever possible (COTS = Commercial Off-the-Shelf)
  * CS folks need to understand enough of the lingo that they can talk to biologists.
  * Biology folks need to understand enough about CS that they can talk to a computer scientist and realize that not everything is computable.
  * "Understanding each other is the most important thing."
* Question: How do you validate your results? (Or do you worry about that?)

Final exam questions:
* What was on PT's last slide?
* What was on JG's last slide? (No, not really.)

Proposal Evaluations
* Everyone gets to evaluate everyone else's presentation.
* Here's the form we've come up with for evaluation. Let us know if you think it should be changed. These will go to the groups, and groups will then turn them in to us with their final reports.
  + What is the group’s primary hypothesis or claim?
  + What were the primary biological aspects of the project?
  + What were the primary computational aspects of the project?
  + What do you see as the greatest strengths of the presentation?
  + What do you see as the greatest weaknesses of the presentation?
  + Give at least one recommendation that will help the group write a better report.
  + Add anything else you think the group might find helpful.
* Note: Feedback is to help you, but you can do what you want with it, including ignoring it or using it as a dartboard.
  * You get what you pay for.
* Focus on substance. (Sam will focus on style.)
* Vida's favorite analogy: A good paper is like a (gender-neutral) miniskirt: long enough to cover the subject, short enough to keep it interesting.
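
Side note on the plate-mapping step from PT's talk:
* The notes do not record how wells on a 384-well plate correspond to wells on 96-well plates, so what follows is only a minimal sketch under an assumption.
* Assumption: the common interleaved-quadrant layout, in which each 2x2 block of wells on the 384-well plate takes one well from each of four 96-well plates. The lab's actual stamper may use a different layout.
* The function names and the 1-4 plate numbering are hypothetical, chosen for illustration; none of them come from the talk.

```python
import string

# Plate dimensions as given in the notes: 96 = 8x12, 384 = 16x24.
ROWS_96, COLS_96 = 8, 12
ROWS_384, COLS_384 = 16, 24


def well_name(row, col):
    """Convert zero-based (row, col) to a label such as 'A1' or 'P24'."""
    return f"{string.ascii_uppercase[row]}{col + 1}"


def map_384_to_96(row, col):
    """Map a zero-based 384-well position to (plate, row96, col96).

    ASSUMPTION (not from the talk): interleaved-quadrant layout.
    Plate numbers 1-4 are assigned as
        even row, even col -> 1    even row, odd col -> 2
        odd row,  even col -> 3    odd row,  odd col -> 4
    """
    if not (0 <= row < ROWS_384 and 0 <= col < COLS_384):
        raise ValueError(f"not a 384-well position: {(row, col)}")
    plate = (row % 2) * 2 + (col % 2) + 1
    return plate, row // 2, col // 2


def build_mapping():
    """Return {384-well label: (96-well plate number, 96-well label)}."""
    mapping = {}
    for row in range(ROWS_384):
        for col in range(COLS_384):
            plate, row96, col96 = map_384_to_96(row, col)
            mapping[well_name(row, col)] = (plate, well_name(row96, col96))
    return mapping


if __name__ == "__main__":
    mapping = build_mapping()
    for label in ("A1", "A2", "B1", "B2", "P24"):
        print(label, "->", mapping[label])
```

* A table like this is the kind of metadata the notes say was missing: once it is in the database, a result read from a 384-well position can be traced back to the sample's source plate and well.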