Skip to content

From Tokens to Parse Trees

The process of finding the structure in the flat stream of tokens is called parsing, and the module that performs this task is called parser.

Two well-known ways to parse.

  1. top-down
  2. Left-scan, Leftmost derivation (LL)
    • Constructs the parse tree in pre-order
  3. bottom-up
  4. Left-scan, Rightmost derivation in reverse (LR)
    • Constructs the parse tree in post-order

Example Parsing of Micro-English:

1550696507679

Top-down Parsers

Tree is grown from the root (top)

1550696563891

Corresponds to a left derivation

1550696684204

Bottom-up parser

Tree grows from the leaves (bottom) up to the root (top).

Just read right derivations backwards. (Rightmost derivation in reverse)

1550696827666

Top-Down vs. Bottom-Up parsing

1550696910244

Hierachy

1550588097454

Formal Definition of LL(1)

Properties of the grammar that determines if it is LL(1) or not:

A grammar G is LL(1) if for each set of productions M::=X_1|X_2|...|X_n:

  1. \text{first}[X_1], \text{first}[X_2],...,\text{first}[X_n] are all pairwise disjoint
  2. If X_i\Rightarrow^*\lambda then \text{first}[X_j]\cap \text{follow}[X]=Ø, for 1\leq j\leq n. i\neq j

If G is \lambda-free then (1) is sufficient

  • An LL(1) grammar is a grammar which can be parsed with a top-down parser with a lookahead of one token.

First Sets

The set of all terminal symbols that can begin a sentential form derivable from the string \alpha

First(\alpha)=\{a\in\Sigma \mid \alpha \Rightarrow^*a\beta\}

We never include \lambda in First(\alpha) even if \alpha \Rightarrow \lambda

Example

1550698029127

1
2
3
First(Tail)     = { + }
First(Prefix)   = { f }
First(E)        = { v, f, ( }

Algorithm for Computing First-Set

1550698206310

Follow Sets

The set of terminals that can follow a nonterminal A in some sentential form.

​ For A\in N

Follow(A)=\{b\in \Sigma \mid S\Rightarrow^+ \alpha A b \beta\}

The right context associated with A

Example

Using the same example as in First Sets

1
2
3
Follow(Tail)    = { ) }
Follow(Prefix)  = { ( }
Follow(E)       = { $, ) }

Algorithm for Computing Follow(A)

1550698650834

Provable Facts About LL(1) Grammars

  • No left-recursive grammar is LL(1)
  • No ambiguous grammar is LL(1)
  • Some languages have no LL(1) grammar
  • A \lambda-free grammar, where each alternative X_j for N::=X_j begins with a distinct terminal, is a simple LL(1) grammar

LR Grammars

A Grammar is an LR Grammar if it can be parsed by a LR parsing algorithm.

Harder to implement LR than LL parsers.

  • Tools exists though (JavaCUP, Yacc, C#CUP, SableCC)

Can recognize LR(0), LR(1), SLR, LALR grammars.

  • Bigger class of grammars than LL

  • Can handle left recursion!

  • Usually more convenient because less need to rewrite grammar

Most commonly used for automatic tools today (LALR in particular)

Tools for designing CFG's

Udvid

Kig på predictset! Side 176 i Fisher et. al. PDF


Last update: June 16, 2019