Wednesday 12 February 2014

Computer Assisted Translation, Machine Translation and Translation Memories for beginners

I have written this as a brief introduction for clients who are considering having content translated, and may have heard of these systems before but are not sure what they are and how they can be applied to their particular situation.

Basic terminology
First let’s consider Machine Translation (MT). The simplest example of this is Google’s Translate function. This is an entirely software-driven (i.e. “Machine”) translation process; no humans are involved, so the quality of the output varies. At the same time, most MT systems are constantly accumulating master and translated content for inclusion in their massive databases; the issue is that the MT system has no way of verifying the translation (the old programmer's rule "Garbage in = Garbage out" comes to mind). Indeed, Google recently admitted that its translation engine is accumulating its own MT translations as translation data; when someone uses Google to translate their website and publishes the translation online, Google may then access the translation and add it to its database, assuming it is an accurate translation!

Consequently, we do not recommend using any currently publicly available MT for any translation that your business will rely on for decision-making purposes, and definitely not for anything you would put in front of a customer (such as a website or an email).

Next let’s look at Computer Assisted Translation (CAT) systems. This is not Machine Translation. At its simplest, CAT provides a system that allows a segment of text (a phrase, a sentence etc.) to be translated by a translator and stored in a database, matched to the original text. It is important to note that the translation is still done by a human; CAT just manages the process of translation.

Translation Memory (TM) is the output of a CAT system. This is stored as a database of original segments alongside their matching translations. Think of a giant spreadsheet, with the original content in one column, broken into discrete segments, and the translation entered alongside in the next column. Included with the TM may be a Term Base (TB). Term Bases are used to store information on specific terms, such as industry-specific jargon (e.g. “Translation Memory”), an explanation of the term, and the translation of the term into the target language(s).
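
For readers who like to see an idea in code, here is a minimal Python sketch of the spreadsheet analogy above. It is only an illustration under our own assumptions: the names TranslationMemory, TermEntry, store and lookup are made up for this post and do not come from any particular CAT product, and the French sentences are sample data.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TermEntry:
    """One Term Base record: a term, its explanation and its translations."""
    term: str
    explanation: str
    # Maps a target language code (e.g. "fr") to the approved translation.
    translations: Dict[str, str] = field(default_factory=dict)


@dataclass
class TranslationMemory:
    """The 'giant spreadsheet': each source segment maps to its translation."""
    segments: Dict[str, str] = field(default_factory=dict)

    def store(self, source: str, target: str) -> None:
        # Record a translation that a human translator has entered.
        self.segments[source] = target

    def lookup(self, source: str) -> Optional[str]:
        # Return the stored translation, or None if the segment is new.
        return self.segments.get(source)


# Hypothetical sample data (English to French).
tm = TranslationMemory()
tm.store("Press the power button.", "Appuyez sur le bouton d'alimentation.")

tb = {
    "Translation Memory": TermEntry(
        term="Translation Memory",
        explanation="Database of original segments and their translations.",
        translations={"fr": "mémoire de traduction"},
    )
}
```

Note that nothing in the sketch translates anything; it only stores and retrieves what a human has already translated, which is the key difference between CAT and MT.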

So to summarise the jargon:
MT: fully automated translation; while constantly improving, it is still nowhere near good enough to use except as a rough guide
CAT: software that helps the human translator manage the translation process
TM: the memory output of CAT, allowing the storage of segments and their translations
TB: a memory of specific terms/jargon, with an explanation and translation(s) for each

What benefits do CAT and TM bring?
So why would you use CAT to manage your translation and keep a TM of the output? Well, some of the main benefits are as follows:
  • It can speed the translation process up; if a segment appears a second, third or more times (what we call a repetition), the system will recommend the original translation, meaning the translator does not have to re-enter an existing translation.
  • It acts as a double-check that all content is translated. The system will alert the translator to any untranslated segments.
  • Most CAT programs can import a wide range of file formats, meaning the translator may not need the original software your files were created in.
  • A Term Base and a cloud-based TM enforce consistency of terminology and translation, especially when multiple translators are working on a single project.
  • When a master document is updated, this can be loaded and compared to the TM of the previous translation. The CAT system will then alert the translator to any segments that differ from the original, meaning the translator only has to update the new content, rather than translate everything from the beginning again.
  • If you ever need to change translator, the TM is portable. This means the new translator will have access to all previous translations, enabling consistency with earlier translations irrespective of who does the translation.
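
To make the repetition, double-check and update benefits above a little more concrete, here is a small Python sketch. It is a simplification under our own assumptions: the function names pretranslate and untranslated are hypothetical, the memory is a plain dictionary, the sentences are sample data, and only exact matches are considered.

```python
def pretranslate(segments, memory):
    """Pair each segment with its stored translation, or None if the TM has none."""
    return [(segment, memory.get(segment)) for segment in segments]


def untranslated(pairs):
    """List the segments that still need a human translator."""
    return [segment for segment, target in pairs if target is None]


# TM from the previous translation of the document (hypothetical sample data).
memory = {
    "Open the cover.": "Ouvrez le capot.",
    "Close the cover.": "Fermez le capot.",
}

# The updated master document, already broken into segments.
updated_document = [
    "Open the cover.",
    "Replace the filter.",  # new content added in the update
    "Close the cover.",
    "Open the cover.",      # a repetition: the stored translation is reused
]

pairs = pretranslate(updated_document, memory)
todo = untranslated(pairs)
print(todo)  # ['Replace the filter.']
```

Of the four segments, only the new one is flagged for translation; the rest are filled in from the memory.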
Are there any issues with CAT?
Yes, there are a few potential disadvantages. CAT is of most benefit for projects that have a high level of repetition (repeated content throughout a single document, or across multiple documents), a lot of jargon/specialised language, and/or content that is regularly updated (and so translations need to be updated as well).
  • CAT may not bring any benefit on smaller projects. The cost and time involved in setting the project up for translation using CAT can outweigh any benefit in translation productivity.
  • Front-end marketing copy may not be translated well using a CAT system. Because the system recommends the re-use of existing translations, and because the translator is forced to work segment by segment (rather than considering the entire text and its flow), CAT can lead to translations that are somewhat repetitive.
  • If content is unlikely to be updated on a frequent basis, one of the main benefits of CAT is lost.
  • If there is commercially sensitive information involved, then the client needs to consider security issues of having this content stored long term offsite.
  • Editing and proofreading become even more important when using CAT; if there is an error in the initial segment translation, this will then be propagated throughout the entire document. 
  • Some languages, such as Japanese, tend to form very long, multi-phrase sentences. These can become difficult to translate using CAT, as it will depend on how the content is segmented.
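
The segmentation point in the last item can be illustrated with a deliberately naive Python sketch. We assume here that text is split only at sentence-ending punctuation; the function name segment is our own, and this rule is an assumption for the example rather than a description of how any given CAT tool segments text.

```python
import re


def segment(text):
    """Naively split text into segments at ., !, ? or the Japanese full stop."""
    # Each match is a run of non-terminator characters plus any terminators after it.
    parts = re.findall(r"[^.!?。]+[.!?。]*", text)
    return [part.strip() for part in parts if part.strip()]


short = "Open the cover. Close the cover."
long_sentence = (
    "When the light turns green, and only after the fan has stopped, "
    "open the cover, remove the filter, and rinse it under running water."
)

print(len(segment(short)))          # 2
print(len(segment(long_sentence)))  # 1
```

The two short sentences become two reusable segments, while the long multi-phrase sentence remains a single large segment that is much less likely to recur word for word.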
In short, MT should only ever be used to generate personal, non-critical translations and never for front-end business materials; while constantly improving, it still cannot produce natural, error-free translations. CAT and TM can be of great benefit in terms of cost and productivity, especially where there is high volume, high repetition, and frequent updates, but they are not the solution for all projects. If you need to know more, give us a call and we would be happy to discuss whether CAT can be applied to your translation needs.




