CSC302 2007S, Class 36: SQL (2): SEQUEL Admin: * A favor - Can a few of you try http://language.grinnell.edu/names/ and let me know by early this afternoon what you think about it? * EC for alumna scholar talk by Dr. Nesbitt on Monday. * Presentations next week still require reading responses * (and therefore readings). * I need Wednesday's reading ASAP. * Sam keeps forgetting to mention: Make and Excel are also declarative languages. * Paper distributed for Monday * Good job on reflections! * SEPC? Overview: * Codd's Contributions. * Why Read This Paper? * Thinking About Relations. * Operations on Relations. * Relations as Programming Language. * Describing a Language. * What's Missing from SEQUEL? /From Last Class/ * Thesis: Database design should be based on relations * Because we should separate interface from implementation * Because relations can capture the problems we focus on in databases * Because we can formally discuss the meaning of relations and their operations /What are the Contributions of Codd's 1970 Paper/ * Theoretical groundwork * The relational model * Some key operations on relations * Formalization of this stuff * Redunancy and consitency (probably not new ideas) * Design framework * Separate interface from implementation * Including critique of existing model * Data sublanguage * Protecting users * Introduced language (maybe) * Join, Projection, ... * Normalization * The big contribution: Represent databases (abstractly) as collections of relations * And relational databases are huge. /What is a Relation? How can you think about relations?/ * Two-D Array (Table or Matrix) * Each row is an n-tuple * Rows are distinct * Ordering is unimportant * Ordering of columns is important * Columns are particular domains (types) * Set of ordered n-tuples * Relations are predicates * Given an n-tuple, x, R(x) holds if x is in R * Might want some AND, OR, NOT, etc. computations * Some relations can be thought of as functions * Domain: One of the keys * Range: Tuple (or the rest of the tuple) /What are the natural operations on relations?/ CODD * JOIN - Given two binary relations, make a ternary relation by joining together corresponding elements such that the cdr of the first relation equals the car of the second FOO BAR JOIN(FOO,BAR) JOIN(BAR,FOO) 5 1 1 1 5 1 1 NO ELEMENTS! 6 1 1 2 5 1 2 7 1 2 3 6 1 1 8 2 2 4 6 1 2 9 2 7 1 1 7 1 2 ... * More general JOIN? Given an n-ary relation NOO and an m-ary relation MAR, what is JOIN(NOO,MAR)? * One technique, join based on the last column of NOO and the first column of MAR. * Another (more widely adopted) technique: set product * PROJECTION (superset of PERMUTATION) * Select certain columns of the relation * And provide an order for those columns * Remove duplicates SEQUEL * SELECT _ FROM _ WHERE _ * SELECT _ FROM _ is "Project" * WHERE _ lets you decide whether or not to keep a particular tuple - "selects" elements ; CHOOSE * Design decision: project (SELECT) does not always remove duplicates * If we apply a function to the result of the SELECT, we don't remove duplicates - otherwise, average (or count) would be misleading * Apply arithmetic functions to the results of selects * Complex criteria for choosing rows * GROUP BY - Shoves rows into groups using particular criteria * In some sense, the opposite of normalization * FOCUS ON RETRIEVAL! UPDATE AND ACCESS AND ATOMICITY ARE SEPARATE PROBLEMSq /How do you present a new language?/ N used in the SEQUEL paper * Preparation: Describe the domain and motivation of the language * Examples * Compare examples in the new language to examples from an old language * Formal Syntax - Unambiguous * Gives you a sense of what is available in the language Other possible ones? * Last day of class?