| Practical Natural Language Processing / Proseminar Künstliche Intelligenz / SS 1998 / Philipp Stolka |
We now have to devise an algorithm to recover the grammatic structure from the linear word string.
A very intuitive version would be the following: Iterating over all symbols in the string, the algorithm searches the right-hand sides of all rules for possible matching subsequences, replaces the matched symbols with the left-hand side of the applicable rule as a node and appends the substituted symbols as its children. The string in this algorithm is also called a parse forest, as it is treated as a sorted linear list of successive parse trees. For our horse-example the parse trace would look like this:
forest matched subsequence... of rule The horse grazes. "The" DET - the DET horse grazes. "horse" NOUN - horse DET NOUN grazes. DET NOUN DET NOUN - NP NP grazes. "grazes" VERB - grazes NP VERB VERB VP - VERB NP VP NP VP S - NP VP S
This returns us the above parse tree for the sentence. However, this method can easily be proven too simple for even slightly more sophisticated grammars. Consider a phrase like"The Chinese signs are manifold."The algorithm from above might produce the following trace (of course when supposing that the needed lexicon upgrade was made):
forest matched subsequence... of rule The Chinese signs are manifold. "The" DET - the DET Chinese signs are manifold. "Chinese" NOUN - Chinese DET NOUN signs are manifold. DET NOUN DET NOUN - NP NP signs are manifold. "signs" VERB - signs NP VERB are manifold. VERB VP - VERB NP VP are manifold. NP VP S - NP VP S are manifold.
Obviously, this is not a valid parse, because you cannot append "are manifold" to a complete sentence in a grammatically correct way (neither in our formal language (we do not have a "S - S VP" rule) nor in natural English). This is a situation where the parser has committed an error and has to go back (backtrack) in order to undo this and try another possible parse.
In this example, the parser has to throw away over half of his work to return to a valid state ("signs" should have been replaced by NOUN instead of VERB, thus building an NP from DET NOUN NOUN), but more dramatic examples are easy to formulate (see also [AI23]).
Consequently, we have to think about making the algorithm more efficient and/or error-proof.
| prev: | 3.2 - Parsing And Efficiency |
| this: | 3.2.1 - A Simple Parser |
| next: | 3.2.2 - The Chart Parser |