| Practical Natural Language Processing / Proseminar Künstliche Intelligenz / SS 1998 / Philipp Stolka |
Originally, translation was the main aim of NLP technology development.
| German original | US-Justiz und 20 Staaten klagen gegen Microsoft (neu) | Das US-Justizministerium, Vertreter von 20 US-Bundesstaaten sowie der District of Columbia haben beim Bezirksgericht in Washington separate Klagen gegen Microsoft eingereicht. Die Kläger empfehlen dem Gericht, den Softwarehersteller per einstweiliger Verfügung aufzufordern, Windows 98 entweder ohne Internet Explorer oder in Verbindung mit dem Navigator des Konkurrenten Netscape auszuliefern. Das Gericht soll gleichzeitig Microsoft untersagen, künftig Druck auf Online-Dienste und Computer-Hersteller auszuüben, die ihren Kunden Alternativen zu Microsofts Internet Explorer anbieten. Justizministerin Janet Reno hat die 52seitige Anklageschrift mittlerweile ins Web stellen lassen. (...) |
| English translation | US law and 20 states complain against Microsoft (new) | The US Ministry of Justice, representatives of 20 US Federal States as well as the District OF Columbia submitted separate complaints with the district court in Washington against Microsoft. The plaintiffs recommend to the court to request the software producer by provisional order to deliver Windows 98 either without Internet Explorer or in connection with the navigator of the competitor Netscape. The court is to forbid Microsoft at the same time to exert in the future pressure on ons-line service and computer manufacturers which offer their customer alternatives to Microsofts Internet Explorer. Law minister Janet Reno let place the 52seitige indictment meanwhile in the Web. (...) |
| German back-translation | US-Gesetz und 20 Zustände beschweren sich gegen Microsoft (neu) | Das US-Ministerium von Gerechtigkeit, Repräsentanten von 20 US-Bundeszuständen sowie dem Bezirk der Kolumbien eingelegten unterschiedlichen Beanstandungen mit dem Amtsgericht in Washington gegen Microsoft. Die Zivilkläger empfehlen sich dem Gericht, den Software-Produzenten durch provisorische Ordnung anzufordern, Windows 98 entweder ohne Internet Explorer oder in Zusammenhang mit dem Nautiker des Konkurrenten Netscape zu liefern. Das Gericht soll Microsoft gleichzeitig verbieten, Druck auf Onszeile Service und Computerherstellern zukünftig anzuwenden, die ihre Kunde Alternativen Internet Explorer Microsofts anbieten. Gesetzminister Janet Reno ließ Platz die Anklage 52seitige unterdessen im Web. (...) |
Table 1: Translation from German to English and back with the same program, "Babelfish" at www.altavista.digital.com
From the late 1940s to the 1960s, it was widely believed that translation was not fundamentally different from code breaking, a problem that had already been tackled during World War II, when the Enigma code was attacked. In 1966, however, the US government's ALPAC report led to funding for this field being largely cut off: "There has been no machine translation of general scientific text, and none is in immediate prospect."
Working from the idea that all of a text's meaning lies in the words, directly on the surface or only slightly below it, scientists realized only reluctantly that translation involves more than a mere 1-to-1 mapping of words or phrases. The surface view rested on the assumption of a "common language" that serves as a shared basis for all natural languages. Instead, one has to consider many layers that can hold information (we will come back to this point later, with an interesting twist that would have surprised these researchers).
Parts of a text can depend on one another, and there can be allusions to external objects that cannot be derived explicitly from the text, as well as ambiguities and metaphors that would completely spoil any attempt to replace each word with the corresponding term of the other language. You can never know what the sender meant by putting a sentence together without deep insight into the sender's mental processes, and machines will lack this ability for some time. So they need to extract virtually everything about the context from the text itself, which in turn brings us to the first constraint on automatic translation and every other subfield of NLP: the text needs to be restricted to a limited range of subjects.
One answer is the concept of controlled or restricted languages, such as "Caterpillar English", which allow only a limited range of syntactic constructs and operate on a smaller vocabulary than everyday language. Hence, these (sub-)languages are mainly used in technical applications that involve only relatively narrow linguistic domains. Without a sound base of world knowledge, a system cannot understand certain ambiguous or (in lexical terms) ill-defined sentences. In many cases, it has to access this knowledge base to expand its information about the objects being spoken about.
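The idea of a controlled language can be illustrated with a small checker that accepts a sentence only if it stays within an approved vocabulary and a length limit. This is a minimal sketch: the word list, the 12-word limit, and the name `check_sentence` are assumptions made for illustration and are not taken from any actual controlled-language specification.

```python
import re

# Hypothetical approved vocabulary; a real controlled language
# would define a much larger, carefully curated word list.
ALLOWED_WORDS = {
    "the", "operator", "must", "stop", "engine", "before",
    "opening", "cover", "check", "oil", "level",
}
MAX_WORDS_PER_SENTENCE = 12  # assumed limit, chosen only for illustration


def check_sentence(sentence):
    """Return a list of rule violations (empty if the sentence conforms)."""
    words = re.findall(r"[a-z]+", sentence.lower())
    problems = []
    if len(words) > MAX_WORDS_PER_SENTENCE:
        problems.append(f"too long: {len(words)} words")
    unknown = sorted(set(words) - ALLOWED_WORDS)
    if unknown:
        problems.append("unknown words: " + ", ".join(unknown))
    return problems


print(check_sentence("Stop the engine before opening the cover."))
# → []
print(check_sentence("Kindly extinguish the engine."))
# → ['unknown words: extinguish, kindly']
```

Rejecting text at authoring time, rather than trying to interpret it later, is what keeps the domain narrow enough for automatic processing.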
For example, consider the phrase "in connection with the navigator of the competitor Netscape" in the preceding text sample. It was (obviously incorrectly) translated as "in Zusammenhang mit dem Nautiker des Konkurrenten Netscape". One has to know the specific terms of the computer market (here, "Navigator" is a product name, which the translation system wrongly treated as an ordinary common noun) in order to translate effectively and reliably.
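The "Nautiker" error can be reproduced in miniature with a word-for-word lookup, with and without a list of known product names. This is only a toy sketch under stated assumptions: the three-entry word list, the `PRODUCT_NAMES` table, and the `translate` function are hypothetical and say nothing about how Babelfish actually works.

```python
# Toy English-to-German word list; entries are illustrative only.
WORD_LIST = {"with": "mit", "the": "dem", "navigator": "Nautiker"}

# Assumed domain knowledge: product names that must not be translated.
PRODUCT_NAMES = {"navigator": "Navigator"}


def translate(words, product_names=None):
    """Map each word separately; known product names pass through unchanged."""
    product_names = product_names or {}
    result = []
    for word in words:
        key = word.lower()
        if key in product_names:
            result.append(product_names[key])
        else:
            # Fall back to the original word if it is not in the list.
            result.append(WORD_LIST.get(key, word))
    return " ".join(result)


phrase = "with the navigator".split()
print(translate(phrase))                 # → mit dem Nautiker
print(translate(phrase, PRODUCT_NAMES))  # → mit dem Navigator
```

The lookup itself is unchanged in both calls; only the added domain knowledge prevents the mistranslation.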
Building such a knowledge base is difficult because of both the required size of the database and its format. First, by current estimates, you need about 20,000 to 100,000 distinct words to achieve any useful performance. Second, it is still widely debated how to represent concepts (such as words) for easy computer access. This is closely connected to the problem of interlinking this knowledge: how should the computer recognize the intended usage of a certain word or metaphor (even a one-word metaphor) and, furthermore, translate it accurately into the target language?
Here, another difficulty arises: though single words are easy to translate, the translation often fails to capture all the intended meanings. A term that covers a particular domain in one language often does not cover the same domain in the other, so you need the detailed insight mentioned above into the situation to which the text refers.
Nevertheless, pre-translated texts such as the one above may be a great help to human translators, who can significantly speed up their work with such help at hand.