Yet another fascinating extract from my current masterpiece
II. Measuring Generated Text Quality
Quality assessment is a recurring problem in the translation industry, predating the arrival of computers by many centuries. The Koine Greek translation of the Torah, prepared in the third and second centuries B.C.E., is reputed to have used either divine inspiration or peer review to guarantee quality, depending on which apocryphal account you prefer. Despite these efforts, it is regarded as a quite poor translation overall, and Hebrew scholars brought at least six centuries of correction to it (Catholic Encyclopaedia 1911). Peer review, being somewhat easier to arrange than divine inspiration, is now widely accepted as the best general approach to ensuring translation quality. However, it is no less labour-intensive than translation itself and consequently tends to be reserved for literary and scholarly works. It is generally minimal or non-existent in common commercial practice, except in areas of unusually high liability like medical documentation and legal materials.
Evaluating translation quality has not, on the whole, changed in quite a long time. The SAE J2450 translation quality standard, which was only finalised in 2001, is little more than a way for a human reviewer to assign a number to their evaluation of the translated text. Efforts to evaluate MT quality have largely followed traditional practice by using a human evaluator to assess the quality of the output. These methods, however, have very serious limitations. They are quite labour-intensive, so they are difficult to apply on a large enough scale to establish overall quality. The subjective nature of these evaluations makes uniformity a serious problem. Lastly, in an environment where the machine translation product is not expected to stand alone but is used as an aid to the human translator, it is very difficult to ensure that these quality evaluations genuinely reflect translator labour.
However, using generated texts to write final translations offers us an obvious way to evaluate their quality: we can compare the generated output to the final, human-produced translation. It is by comparison of the two that we can tell how much or how little we are actually assisting the translator. Evaluating machine translation quality cheaply and comprehensively means deploying an algorithm to evaluate the difference between the generated text segment and the final translation.
Fortunately, this problem has been addressed by a group of algorithms usually referred to as edit distance metrics by mathematicians and fuzzy string matching by computer scientists. These kinds of algorithms are already deployed in most translation memory systems under the label of fuzzy matches. However, fuzzy matching has such a poor reputation among translators that we prefer the more precise technical term edit distance to describe it.
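For the curious, here is a minimal sketch of the best-known of these metrics, the Levenshtein edit distance, along with a normalisation into the familiar "fuzzy match" percentage. The character-level formulation and the percentage formula here are illustrative assumptions on my part; real translation memory systems typically compute something like this over words or tokens and use their own scoring conventions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b; we only keep two rows at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]

def match_percentage(a: str, b: str) -> float:
    """One plausible way to turn distance into a 0-100 fuzzy-match score."""
    if not a and not b:
        return 100.0
    return 100.0 * (1 - edit_distance(a, b) / max(len(a), len(b)))
```

Applied to a generated segment and the translator's final version, the distance is a rough count of the keystrokes of correction the generated text cost the translator, which is exactly the labour we want to measure.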
Hey, I managed to get a reference to the Septuagint into a computer science grant proposal. I think I deserve some credit for that. :^)
I've been off-line for a week. Things have been a bit messy lately in real life. Along with a number of other complications in my life, my wife has gone back to the States for a month, and I expect to join her for a week at the end of the month. I also managed to get my old cellphone number back, so that's at least one good thing.
And, I have a new post up on A Fistful of Euros for anyone suffering from withdrawal due to my lack of blogging. :^)
Anyway, I have to make up some new, more technical materials for my company's grant proposal, and since some of my readers are also translators, I thought I might put up the first part - the section which offers no real clue how to clone our work. I'm open to reactions.
The research programme that we are advancing is motivated by a number of practical considerations as well as a particular theoretical model of the translation process.
Of the novel tools that the 1990s introduced to the translation industry, it is apparent that only one has enjoyed genuine success and acceptance by translators: translation memory. We believe that the failure of machine translation to gain acceptance, despite being an older and far more ambitious technology that has absorbed far more time and funding, is substantially the failure of the cognitive models that have driven it.
The promoters of machine translation have traditionally viewed MT not as a labour-saving device for translators but as a partial replacement of them. This sort of thinking continues to permeate discussions of MT within the translation industry, where the term "post-editing" is still used to describe the task of human translation in conjunction with MT systems. In this model, translation is a process driven by the MT system, and the translator is understood as a post-editor who adds value to a machine translated text. Translation memory, in contrast, is a translator-driven system. It is nothing but a database of existing translations, and its contents are entirely determined by translators. It is a genuine labour-saving device, since it minimises the translator's workload by ensuring that any particular segment of text need only be translated once. The translator is neither replaced nor reduced to a lesser role, because every sentence in the translated text is still the work of a translator.
We believe that translation should remain a process driven by translators, who remain the focus of all translation activity. The mechanical aids placed at translators' disposal should not be imagined as doing translation, but as devices designed to enhance the productive power of individuals. We contend that the task of these systems is to offer the translator a packet of information which is easily absorbed and which minimises the cognitive load of composing new translations. In this way, the computational apparatus which surrounds the translator acts as an extension of his or her own cognitive apparatus. Successful automation in the translation industry will be built on gains in machine aided translation, not automated translation, for the foreseeable future.
This distinction between machine-driven and translator-driven work lends itself to a family of models of activity and cognition generally known as distributed cognition. We are using a particular framework called Sociocultural Activity Theory to give our efforts a theoretical basis (Vygotsky 1932/1986, Cole & Engeström 1993). This theory is increasingly important in the software design industry, which has long confronted difficulties in building software that enhances productivity (Nardi et al. 1995, Walenstein 2002). It advances a number of theoretical constructs that are useful in analysing the translation process, but we will only look at two of them here. The first is the idea that artefacts of some sort always stand between people and the objects of their activity. The second is that artefacts in conjunction with human knowledge and abilities can form a single system, termed a "functional organ", in which the tool is adapted to the person and the person to the tool, enabling the whole to function better than the parts.
This sort of analysis suits the translation process quite well. Translating is a very information intensive process which, even in the pre-computer era, made heavy use of tools external to the translator. In the classical context, these were usually printed reference materials, such as dictionaries and glossaries as well as translations of related materials, and mechanical text production devices like typewriters. The typewriter in conjunction with the manual skills of the translator is a functional organ for the production of written texts. In the same sense, reference materials in conjunction with the linguistic capabilities of the translator are a functional organ for transforming information from one language into another. It is primarily this latter functional organ which is the object of our research, and we are largely concerned with the functioning and enhancement of those cognitive supports which are external to the translator.
Translators are human. They have limited memories, limited attention spans and suffer from fatigue and other performance-limiting phenomena. We cannot realistically change this property of human bodies, and some linguists believe that even if we could, our ability to learn and manipulate language might well be damaged rather than enhanced. (See Newport 1990, for example.) Yet, the qualities that we would most like to see in a translation are the very ones that the human translator is least naturally suited to give us: completeness, accuracy and consistency. Thus, the translator is compelled to use cognitive supports like dictionaries and term lists during translation.
This human frailty was a major motivation behind early MT. (Although admittedly the labour-intensive nature of translation was a more important motivator.) Machines are well suited to ensuring completeness, accuracy and consistency. However, despite over fifty years of effort, the core process of uncontrolled natural language translation still cannot be genuinely automated. The form and complexity of the information involved requires an authentically human knowledge of the world. (Bar-Hillel 1960 is the classical source of this claim.) Even if we could construct computers potentially capable of storing and manipulating this encyclopaedic information about the world, it is not clear that there is any way to acquire this data except by embedding the computer in a slowly maturing human body.
Computers are, therefore, not well suited to the very human problem of constructing good translations. Consequently, enhancing the productivity of translators through automation means using computers to create better functional organs for translation. We must pay a great deal more attention to the interface between human translators and the machines that support their activity, and, although the translator must adapt to the machine, it is far more important for us to adapt the machine to the translator.
Machine translation, while it may not offer us much hope of substantially replacing the translator, does offer us the prospect of a very convenient interface between automated systems and the translator. We want, ideally, to generate a text that encapsulates the information that the translator would ordinarily be forced to search out in reference books and previous translations. This sort of comprehensive search and consistent result is the domain in which the computer excels, but where the human translator often fails. By putting the result in the form of a readable text, we minimise the additional cognitive load of interpreting this information. Where the text diverges only slightly from being a correct translation, the work of fixing it is quite simple. Where it diverges sharply, if it remains a readily comprehensible text which has, to the degree possible, used the terminology and usages which we would expect to find in a good translation, we believe that we have still made translators' work much easier by reducing the need to laboriously look up terms and check with previous translations.
Update: They move quickly over at Taccuino di Traduzione, where not only is this post linked to, but there is also a link to an article on machine translation in Italian. Alas, my Italian is not too good, so I used Babelfish as an aid in reading it. However, the stripped-down Systran code that powers Babelfish translated the title as "The bacon of the translator automatic rifle", which, I think, neatly demonstrates the point that the article is trying to make.