From Tokens to Parse Trees¶

The process of finding the structure in the flat stream of tokens is called parsing, and the module that performs this task is called parser.

Two well-known ways to parse.

top-down
Left-scan, Leftmost derivation (LL)
- Constructs the parse tree in pre-order
bottom-up
Left-scan, Rightmost derivation in reverse (LR)
- Constructs the parse tree in post-order

Example Parsing of Micro-English:¶

1550696507679

Top-down Parsers¶

Tree is grown from the root (top)

1550696563891

Corresponds to a left derivation

1550696684204

Bottom-up parser¶

Tree grows from the leaves (bottom) up to the root (top).

Just read right derivations backwards. (Rightmost derivation in reverse)

1550696827666

Top-Down vs. Bottom-Up parsing¶

1550696910244

Hierachy¶

1550588097454

Formal Definition of LL(1)¶

Properties of the grammar that determines if it is LL(1) or not:

A grammar G is LL(1) if for each set of productions $M::=X_1|X_2|...|X_n:$

$\text{first}[X_1], \text{first}[X_2],...,\text{first}[X_n]$ are all pairwise disjoint
If $X_i\Rightarrow^*\lambda$ then $\text{first}[X_j]\cap \text{follow}[X]=Ø$ , for $1\leq j\leq n. i\neq j$

If G is $\lambda$ -free then (1) is sufficient

An LL(1) grammar is a grammar which can be parsed with a top-down parser with a lookahead of one token.

First Sets¶

The set of all terminal symbols that can begin a sentential form derivable from the string $\alpha$

$First(\alpha)=\{a\in\Sigma \mid \alpha \Rightarrow^*a\beta\}$

We never include $\lambda$ in First( $\alpha$ ) even if $\alpha \Rightarrow \lambda$

Example¶

1550698029127

First(Tail)     = { + }
First(Prefix)   = { f }
First(E)        = { v, f, ( }

Algorithm for Computing First-Set¶

1550698206310

Follow Sets¶

The set of terminals that can follow a nonterminal A in some sentential form.

For $A\in N$

$Follow(A)=\{b\in \Sigma \mid S\Rightarrow^+ \alpha A b \beta\}$

The right context associated with A

Example¶

Using the same example as in First Sets

Follow(Tail)    = { ) }
Follow(Prefix)  = { ( }
Follow(E)       = { $, ) }

Algorithm for Computing Follow(A)¶

1550698650834

Provable Facts About LL(1) Grammars¶

No left-recursive grammar is LL(1)
No ambiguous grammar is LL(1)
Some languages have no LL(1) grammar
A $\lambda$ -free grammar, where each alternative $X_j$ for $N::=X_j$ begins with a distinct terminal, is a simple LL(1) grammar

LR Grammars¶

A Grammar is an LR Grammar if it can be parsed by a LR parsing algorithm.

Harder to implement LR than LL parsers.

Tools exists though (JavaCUP, Yacc, C#CUP, SableCC)

Can recognize LR(0), LR(1), SLR, LALR grammars.

Bigger class of grammars than LL
Can handle left recursion!
Usually more convenient because less need to rewrite grammar

Most commonly used for automatic tools today (LALR in particular)

Tools for designing CFG's¶

Udvid

Kig på predictset! Side 176 i Fisher et. al. PDF

Last update: June 16, 2019