ADMIN

* Cool Convocation tomorrow. Extra credit for attending.
* Homework 1 due Friday!
* Groups for project, phase 1?
* Questions on project, phase 1? Bring more on Friday.
* Missing: Erik, Mark
* Groups
    * Yaw, Jonathan, Daren
    * Adam, Emily, Nathan
    * Aaron, Mark, Erik, (A)nanta

Quiz: When is tokenizing ambiguous in Pascal?

* No, you can't use the answer I gave you Monday.
* Monday's answer: If you include the - sign in number tokens, then 3-5 has two possible interpretations, with the correct one determined only by context (blech). The solution is not to include the minus sign in number tokens.

    REAL     : [0-9]+\.[0-9]+
    INTEGER  : [0-9]+
    ELLIPSES : \.\.

Example

    array[1..5] of integer

After reading "1." the tokenizer cannot yet tell whether it is partway through a REAL or has just finished an INTEGER that is followed by ELLIPSES. It has to look at the next character to decide.

Topics

* Grammars, Revisited
* Context-Free Grammars
* Parse Trees
* Context-Sensitive Grammars
* Ambiguous Grammars
* Removing Ambiguity
* Fun with Conditionals

Reminder: Grammar is Four-Tuple

* T, tokens (building blocks of utterances in the language)
* N, nonterminals
* S in N, a start nonterminal
* Productions: pairs
    * First element: a sequence of terminals and nonterminals containing at least one nonterminal
    * Second element: a sequence of terminals and nonterminals

Grammars describe languages of sequences of tokens. Languages are built by deriving sequences of tokens from S, the start symbol.

Technique: Repeatedly replace the lhs of a pair by the corresponding rhs.

In practice, we can often limit the lhs of each pair to a single nonterminal. Grammars that have this form are called "Context-Free Grammars". (The context is the surrounding stuff.) If we don't limit the lhs, we work with "Context-Sensitive Grammars".

Some nice things about Context-Free Grammars

* Some CFGs are easy to use to build parsers.
* They describe most grammatical issues.
* It is easy to draw diagrams of derivations and parsing.

Simple grammar

    Exp : ID
    Exp : NUM
    Exp : Exp OP Exp
    Exp : LPAREN Exp RPAREN

Input

    ID(A) OP(*) NUM(3) OP(*) NUM(8)

Parse tree

                Exp
              /  |  \
          Exp    OP    Exp
        /  |  \         |
    Exp    OP   Exp    NUM
     |           |
    ID          NUM

This "parse tree" gives a nice visual "proof" that something is in the language.

Parse trees are also nice to compute with

* To evaluate an expression
    (1) If it's Exp OP Exp
        Evaluate left subtree
        Evaluate right subtree
        Apply OP
* To translate an expression to assembly code
    (1) If it's Exp OP Exp

Parse trees are wonderful things, so we will work hard to build them.

Detour: Do we ever need context-sensitive languages?

Example: Very simple programming language: a collection of declarations and assignments.

    Program      : Declarations ENDDEC Assignments ENDPROG
    Declarations : Declaration TERM Declarations
    Declarations :
    Declaration  : TYPE ID
    Assignments  : Assignment TERM Assignments
    Assignments  :
    Assignment   : ID ASSIGN Exp

Problem: You can use identifiers you haven't declared.

Solution: Fix "Assignment" so that you can only use an identifier that's been declared. Note that an identifier has been declared by using the phrase "Declared ID".

    Declared ID0 Assignment : ID0 ASSIGN Exp
    Declaration             : Declare ID
    Declare ID0             : TYPE ID0 Declared ID0

Simple example (END stands for the ENDPROG token of the grammar above)

    TYPE(int) ID(A) TERM ENDDEC ID(A) ASSIGN NUM(5) TERM END

    Program
        Declarations
            Declaration
                Declare ID(A)
                    TYPE(int)
                    ID(A)
                    Declared ID(A)
            TERM
            Declarations
        ENDDEC
        Assignments
        END

Whoops! Need a way to propagate the "Declared ID(A)" further in the sequence.
Add rules that move "Declared ID0" to the right, past TERM and ENDDEC:

    Declared ID0 TERM   : TERM Declared ID0
    Declared ID0 ENDDEC : ENDDEC Declared ID0

Derivation

    Program
    -> Declarations ENDDEC Assignments END
    -> Declaration TERM Declarations ENDDEC Assignments END
    -> Declare ID(A) TERM Declarations ENDDEC Assignments END
    -> TYPE(int) ID(A) Declared ID(A) TERM Declarations ENDDEC Assignments END
    -> TYPE(int) ID(A) Declared ID(A) TERM ENDDEC Assignments END
    -> TYPE(int) ID(A) TERM Declared ID(A) ENDDEC Assignments END
    -> TYPE(int) ID(A) TERM ENDDEC Declared ID(A) Assignments END

For simplicity, just look at the stuff after ENDDEC

    Declared ID(A) Assignments END
    -> Declared ID(A) Assignment TERM Assignments END
    -> Declared ID(A) Assignment TERM END
    -> ID(A) ASSIGN Exp TERM END

New problem: What if there are many declarations and assignments?

    # You don't need to use every declaration
    Declared ID0 Assignment : Assignment Declared ID0

    # If you use a declaration in an assignment, you might
    # still need it in the next assignment
    Declared ID0 Assignment : ID0 ASSIGN Declared ID0 Exp Declared ID0

    # When you reach the end of the program, you don't care what's
    # declared
    Declared ID END : END

-----

FOR NEXT CLASS: Write or find a CFG for "sequences of a's and b's with equal numbers of a's and b's"
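
The two rules above that move "Declared ID0" past TERM and ENDDEC can be tried out mechanically. What follows is only a minimal sketch, assuming a sentential form is stored as a Python list of strings. The function name `propagate` and the string encoding of symbols are made up for illustration and are not part of the notes.

```python
def propagate(form):
    """Apply the two swap rules until neither applies:
         Declared ID0 TERM   -> TERM Declared ID0
         Declared ID0 ENDDEC -> ENDDEC Declared ID0
    `form` is a list of symbol strings (a hypothetical encoding)."""
    form = list(form)  # work on a copy
    changed = True
    while changed:
        changed = False
        for i in range(len(form) - 1):
            left, right = form[i], form[i + 1]
            # A "Declared ..." marker directly followed by TERM or ENDDEC
            # is the left-hand side of one of the two rules.
            if left.startswith("Declared ") and right in ("TERM", "ENDDEC"):
                form[i], form[i + 1] = right, left
                changed = True
    return form


# The sentential form from the derivation, just after Declarations
# has been replaced by the empty sequence.
start = ["TYPE(int)", "ID(A)", "Declared ID(A)", "TERM", "ENDDEC",
         "Assignments", "END"]
print(" ".join(propagate(start)))
# prints: TYPE(int) ID(A) TERM ENDDEC Declared ID(A) Assignments END
```

The result matches the last line of the derivation from Program: the marker ends up directly in front of Assignments, where the rule for "Declared ID0 Assignment" can use it.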
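
Going back to the parse-tree part of these notes: the evaluation recipe (evaluate left subtree, evaluate right subtree, apply OP) might be sketched as below. This is an illustration under assumptions, not the course's code. The tuple encoding of tree nodes, the `OPS` table, and the `env` dictionary that supplies values for identifiers are all hypothetical.

```python
import operator

# Hypothetical node encoding, one shape per production of the simple grammar:
#   ("ID", name)               Exp : ID
#   ("NUM", value)             Exp : NUM
#   ("OP", op, left, right)    Exp : Exp OP Exp
#   ("PAREN", inner)           Exp : LPAREN Exp RPAREN
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a parse tree; `env` maps identifier names to values."""
    kind = tree[0]
    if kind == "NUM":
        return tree[1]
    if kind == "ID":
        return env[tree[1]]
    if kind == "PAREN":
        return evaluate(tree[1], env)
    if kind == "OP":
        _, op, left, right = tree
        left_value = evaluate(left, env)    # evaluate left subtree
        right_value = evaluate(right, env)  # evaluate right subtree
        return OPS[op](left_value, right_value)  # apply OP
    raise ValueError("unknown node kind: %r" % (kind,))


# The tree drawn in the notes for ID(A) OP(*) NUM(3) OP(*) NUM(8).
tree = ("OP", "*", ("OP", "*", ("ID", "A"), ("NUM", 3)), ("NUM", 8))
print(evaluate(tree, {"A": 2}))  # prints 48
```

The recursion follows the shape of the tree, which is one reason parse trees are convenient to compute with.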