| Practical Natural Language Processing / Proseminar Künstliche Intelligenz / SS 1998 / Philipp Stolka |
Although computers speaking to humans might be quite exciting, usually it is deemed to be of more use when humans can speak to computers. Supposing that the perception work is done perfectly by the machine, we have to analyze what the string means and how it can be used - whether to execute a command, to add a fact to the knowledge base or to do something else.
Effectively, we have to do the reverse job of generation during the analysis. This delinearization of the input string is called parsing (from Latin "pars orationis" - "part of speech") - finding out which tokens represent which phrase and reconstructing the generation path that led to the perceived sentence.
Naturally, we want to use the parsing result for some purpose, and consequently we need to have a data structure that will store the combinations of words and phrases. Because of the recursive structure of the rewrite rules, a hierarchical structure would be appropriate - thus a tree called parse tree is used in which the leaves, inner nodes and links represent tokens, phrases of any kind, and applied rewrite rules, respectively.
The example below shows the parse tree for the sentence "The horse grazes."As one can easily see, the rules fulfill the requirements for rules of context-free grammars, thus the above (very small, as it only produces one single sentence) grammar is a CFG as a whole.
![]()
Six different rules were applied: 1. S - NP VP
2. NP - DET NOUN
3. VP - VERB
4. DET - "the"
5. NOUN - "horse"
6. VERB - "grazes"
Here a problem arises that is closely connected to possible ambiguities: How are multiple legal interpretations to be handled? We cannot include them in the same parse tree as this would clutter it with incompatible parse traces.
As for now, we will simply assume that the amount of ambiguous phrases is small enough to allow multiple parse trees which are stored parallelly. This restriction is important because of the influence of the ambiguities; each one is leading to exponential growth of the number of possible successive trees. Later we will see how to avoid this difficulty.
| prev: | 3.1 - The Grammar of Formal Languages |
| this: | 3.2 - Parsing And Efficiency |
| next: | 3.2.1 - A Simple Parser |