What is LR parsing? People usually describe it as some sort of “bottom-up” parsing technique. That true, but doesn’t build intuition.

From complexity theory, we know **context-free grammars** (CFGs) are “more powerful” than *deterministic finite automatas* (DFAs). If you have ever tried (and failed at) parsing HTML with a regex, you’d see what I mean.

All languages describable by a DFA are describable by a CFG, the inverse is not true. For example, this grammar describes balanced brackets:

```
S -> E
E -> 0
E -> ( E )
```

It turns out you can’t write a DFA that can figure out if brackets are balanced or not.

Why is this relevant to us? Well, LR parsing boils down to: “some CFGs are too hard for DFAs to parse. What if we *only* consider the languages that *are* decidable by a DFA?” (DFAs all run in by the way, which is fast enough for us). It’s the equivalent of a student saying, “teacher, it’s not *my* fault I can’t do the homework; it’s *your* fault that the homework is too hard.”

Let’s first look at a specific LR parser: LR(0).

First, we’ll use NFAs instead of DFAs, since they’re equivalent, except NFAs are a lot easier to work with. So suppose we know a CFG can be parsed by a NFA. How would we go about constructing the NFA? Let’s consider a really simple grammar, that only accepts the string “hi”:

```
S -> h i
```

A natural NFA might look like:

```
-> S -----> [h] -----> [i]
h i
```

Where `S`

is the start state and [i] is the accepting state. This works fine, but let’s adopt a different notation:

```
-> [S -> . h i] -----> [S -> h . i] -----> [S -> h i .]
h i
```

So the period (`.`

) represents “where we are in reading the production rule’s right-hand-side sequence”. Note we simply go left-to-right in the production rule (i.e. a transition corresponds to “making progress” in that rule).

This works fine, but grammars usually have multiple rules. Consider cases where we need to expand things:

```
S -> A b
A -> a a
A -> a
```

The language is `{"ab", "aab"}`

. Say we’re at a state `[S -> . A b]`

. We can’t read `A`

directly, so we want to ε-transition to `[A -> . a]`

and `[A -> . a a]`

to read `a`

(equivalent to going down a parse tree). We ε-transition because we don’t know which rule is actually relevant, so we keep both options open.

But after we read a rule i.e. we’re at `[A -> a .]`

, we want to go back to the production rule we came from (equivalent to going up the parse tree), so we ε-transition to `[S -> A . b]`

(we go past `A`

because we just read it!).

That’s LR(0) parsing in a nutshell. The transitions and ε-transitions I described are sufficient to draw a NFA that decides the CFG.