Philipp J. Stolka
Practical Natural Language Processing / Proseminar Künstliche Intelligenz / SS 1998 / Philipp Stolka




3.2.1 - A simple Parser

We now have to devise an algorithm to recover the grammatic structure from the linear word string.
A very intuitive version would be the following: Iterating over all symbols in the string, the algorithm searches the right-hand sides of all rules for possible matching subsequences, replaces the matched symbols with the left-hand side of the applicable rule as a node and appends the substituted symbols as its children. The string in this algorithm is also called a parse forest, as it is treated as a sorted linear list of successive parse trees. For our horse-example the parse trace would look like this:

forest matched subsequence... of rule
The horse grazes. "The" DET - the
DET horse grazes. "horse" NOUN - horse
DET NOUN grazes. DET NOUN DET NOUN - NP
NP grazes. "grazes" VERB - grazes
NP VERB VERB VP - VERB
NP VP NP VP S - NP VP
S    


This returns us the above parse tree for the sentence. However, this method can easily be proven too simple for even slightly more sophisticated grammars. Consider a phrase like
"The Chinese signs are manifold."
The algorithm from above might produce the following trace (of course when supposing that the needed lexicon upgrade was made):

forest matched subsequence... of rule
The Chinese signs are manifold. "The" DET - the
DET Chinese signs are manifold. "Chinese" NOUN - Chinese
DET NOUN signs are manifold. DET NOUN DET NOUN - NP
NP signs are manifold. "signs" VERB - signs
NP VERB are manifold. VERB VP - VERB
NP VP are manifold. NP VP S - NP VP
S are manifold.    


Obviously, this is not a valid parse, because you cannot append "are manifold" to a complete sentence in a grammatically correct way (neither in our formal language (we do not have a "S - S VP" rule) nor in natural English). This is a situation where the parser has committed an error and has to go back (backtrack) in order to undo this and try another possible parse.
In this example, the parser has to throw away over half of his work to return to a valid state ("signs" should have been replaced by NOUN instead of VERB, thus building an NP from DET NOUN NOUN), but more dramatic examples are easy to formulate (see also [AI23]).
Consequently, we have to think about making the algorithm more efficient and/or error-proof.

prev: 3.2 - Parsing And Efficiency
this: 3.2.1 - A Simple Parser
next: 3.2.2 - The Chart Parser

back to main page...

Last modified: Tue Sep 22 21:21:55 MEST 1998