CSC325 2010F, Class 15: Database Design Overview: * Database Design Basics. * Questions on the Reading. * An Exercise: A CD Database. Admin: * Reading for Friday: SQL 4 & 5. * Don't forget questions on the reading! (Preferably by 9pm on Thursday night.) * EC for this week's CS Extra on Drupal. * EC for this week's Convocation on Chemistry. * Today's office hours shifted until 2:00-4:00ish today. * Grading to be done this weekend (I hope). * Questions on Assignment 4? * Warning: The code is disorganized and has no common naming conventions So, you want to design a database (or you have to design a database) * How do you design a good database? * Process - somewhat vague * What is a good database? * Detour: Chicken and egg problem * Should we talk about design before you've had a chance to make mistakes? * Should we let you develop bad habits first? * We'll do both * Process: * Study the problem domain * The kinds of data you will need to represent * How to segment the data depending on importance * What kinds of questiosn are we likely to ask * Analyze the relationships between data * Reduce redundancies in tables * Draw pretty pictures * Prototype! (Agile methodologies can be your friend.) * Three primary kinds of relationships between data (1-1, 1-n, n-n) * One-to-one * Can draw as a line visually * Might be possible to represent as a single table rather than as two tables * But not necessarily logical connections * One-to-one relationships can change - separate tables ease changes * Our examples: Students and the Single Rooms they inhabit * Student primary key: ID * Room primary key: Building x RoomNumber * How do we store the relationship in our database? * In the student table, we add building x roomnumber columns * In the room table, we add studentid for inhabitant * Both * It's redundant. Redundancy is bad. * Can lead to inconsistency: You need to change both ends of the link. * Create a third table of Building x RoomNumber x StudentID * Creating tables for every table makes life cluttered * But seems more balanced * Lets us change from a one-to-one relationship to a one-to-many or even many-to-many * How would you decide whether to prioritize students or rooms? * Prototype rule: Do whatever is easier (adding one column takes less effort)a * Space rule: Do whatever adds less space to the database (adding one column of integers takes less space than stringxstring) * Efficiency rule: Do whatever takes less effort to match. * Analysis of problem domain: Are we more likely to have students and ask for rooms, or have rooms and ask for students? * One-to-many * Suppose many students can be in each room, but no student can be in multiple rooms * How do we represent this in our tables? * Each student stores the foreign key of its room. * Separate table * Probably a worse idea * Many-to-many * Suppose students can have multiple rooms, and rooms can have multiple students * Represent as separate table How do we decide whether a design is good? (Alternately, how do we prioritize one design over another?) * Simplicity/ease of queries * Based on preliminary analysis * What data are you likely to have for queries * (Correctness) or Likelihood of correctness of answers to queries * Efficiency: How much computing power does it take to answer the common queries? * What is likely to impact this issue? * Size of the tables (rows and columns) * Number of tables: If you have to keep drawing from different tables to answer a query, this is likely to take more effort. * Choice of type of key a * Ordering of the table (which is supposed to be hidden, but is influenced by your selection of primary key) * If you choose an automatically generated primary key, your life may be more difficult. * Database implementation: Caches, replication, and other stuff * Guarantees of consistency of database * Minimize unnecessary duplication * Checks on referential integrity * Normal form * Five normal forms