Machine Translation in the Language Industry

As the translation industry becomes increasingly digital, customers are better informed and more connected than ever before. They are well aware of machine translation (MT) providers such as Google Translate. But is this technology really useful in a professional environment? Can both client and translator benefit from it? And how does it actually work?

How it works

On a basic level, MT substitutes words in one language for words in another. That alone usually cannot produce a good translation of a text, because whole phrases and their closest counterparts in the target language also need to be recognized. Here’s an example of a particularly bad machine translation from Chinese into English:

Hope our baby can satisfy your princess dream, meanwhile, hope your tired mind in your own private space can be full released. Simple contract is not only a kind of life attitude, is also a brief form of spiritual enjoyment, hint of boreal Europe amorous feelings, it is to cater to the popular new interpretation of the life, make the northern wind is popular all over the world, it is the perfect interpretation of another kind of detachment natural nature.

You might have already guessed it: this is a product description for a bed linen set. Corpus-based statistical techniques and neural techniques help solve this problem. With their assistance, MT has proven useful as a tool for human translators and, in a very limited number of cases, can even produce output that is usable as is (e.g., weather reports or aircraft manuals).
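The limits of plain word substitution described above can be seen in a minimal Python sketch. The three-word French-English dictionary and the function name are illustrative assumptions, not part of any real MT system.

```python
# Toy word-for-word substitution (illustrative only, not a real MT engine).
# The tiny dictionary below is a made-up assumption for this example.
WORD_DICTIONARY = {"le": "the", "chat": "cat", "noir": "black"}


def translate_word_by_word(sentence):
    """Replace each source word with its dictionary entry, keeping word order."""
    words = sentence.lower().split()
    # Unknown words are passed through unchanged.
    return " ".join(WORD_DICTIONARY.get(word, word) for word in words)


print(translate_word_by_word("le chat noir"))  # prints "the cat black"
```

Every word is translated correctly, yet the result is not natural English ("the black cat"), because the word order of the source language is carried over unchanged. This is the gap that phrase-level and corpus-based techniques try to close.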

There are four different types of machine translation:

  1. Statistical
  2. Example-based
  3. Hybrid MT
  4. Neural MT

Statistical machine translation

This type generates translations using statistical methods based on bilingual text corpora, such as the Canadian Hansard corpus (the English-French record of the Canadian Parliament) and EUROPARL (the record of the European Parliament). Statistical translation programs, such as earlier versions of Google Translate, work by detecting patterns in hundreds of millions of documents that have previously been translated by humans and making intelligent guesses based on the findings. Generally, the more human-translated documents are available in a given language, the more likely it is that the translation will be of good quality.
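As a rough illustration of "detecting patterns" in human-translated text, the sketch below estimates word-translation probabilities from a three-sentence parallel corpus. It uses IBM Model 1, a classic textbook word-alignment model trained with expectation-maximization. The toy German-English corpus and the function names are assumptions made for this example, and nothing here is meant to describe how any particular commercial system works.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] = P(target_word | source_word)."""
    src_vocab = {s for src, _ in pairs for s in src}
    tgt_vocab = {t for _, tgt in pairs for t in tgt}
    # Start from a uniform distribution over target words.
    t = {(tw, sw): 1.0 / len(tgt_vocab) for tw in tgt_vocab for sw in src_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-translation counts
        total = defaultdict(float)  # expected counts per source word
        for src, tgt in pairs:
            for tw in tgt:
                # Spread each target word over the source words of its sentence.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    fraction = t[(tw, sw)] / norm
                    count[(tw, sw)] += fraction
                    total[sw] += fraction
        # Re-estimate the probabilities from the expected counts.
        t = {(tw, sw): count[(tw, sw)] / total[sw] for (tw, sw) in t}
    return t


def best_translation(t, src_word):
    """Return the target word with the highest estimated probability."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == src_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy parallel corpus (German -> English).
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
table = train_ibm_model1(corpus)
print(best_translation(table, "das"))   # prints "the"
print(best_translation(table, "buch"))  # prints "book"
```

No dictionary is supplied: the model infers that "das" corresponds to "the" only because the two words keep appearing in the same sentence pairs. With more parallel text the estimates become more reliable, which is why the amount of human-translated material matters so much for quality.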

Example-based

Example-based machine translation is based on the idea of analogy. It relies on a corpus of texts that have already been translated. Given a sentence to translate, the system selects sentences from this corpus that contain similar sub-sentential components. These similar sentences are then used to translate the sub-sentential components of the original sentence into the target language, and the resulting phrases are put together to form a complete translation.
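The recombination step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the example sentences are already split into aligned fragments (in practice that alignment is the hard part), matching is a greedy longest-fragment lookup, and the English-French examples and function names are invented for the sketch.

```python
# Hypothetical example base: each translated sentence is stored as a list of
# aligned (source fragment, target fragment) pairs.
EXAMPLES = [
    [("he buys", "il achète"), ("a book", "un livre")],
    [("she reads", "elle lit"), ("a newspaper", "un journal")],
]


def build_fragment_table(examples):
    """Collect every aligned fragment from the example sentences."""
    table = {}
    for aligned_fragments in examples:
        for source, target in aligned_fragments:
            table[tuple(source.split())] = target
    return table


def translate_by_example(sentence, table):
    """Cover the input with the longest known fragments and join their translations."""
    tokens = sentence.lower().split()
    longest = max(len(fragment) for fragment in table)
    output, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            fragment = tuple(tokens[i:i + n])
            if fragment in table:
                output.append(table[fragment])
                i += n
                break
        else:
            # No example covers this word: mark it instead of guessing.
            output.append(f"<{tokens[i]}>")
            i += 1
    return " ".join(output)


table = build_fragment_table(EXAMPLES)
print(translate_by_example("she reads a book", table))  # prints "elle lit un livre"
```

The input sentence appears in neither example, but each of its parts does, so the translation is assembled by analogy from two different stored sentences. Fragments with no matching example are flagged rather than translated.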

Hybrid MT

Hybrid machine translation (HMT) leverages the strengths of statistical and rule-based translation methodologies. Several MT providers claim to use a hybrid approach that combines rules and statistics. The approaches differ in a number of ways:

  • Rules post-processed by statistics: Translations are performed using a rule-based engine. Statistics are then used in an attempt to adjust or correct the output from the rules engine.
  • Statistics guided by rules: Rules are used to pre-process data in an attempt to better guide the statistical engine. Rules are also used to post-process the statistical output to perform functions such as normalization. This approach offers considerably more power, flexibility and control when translating than the first. It also provides extensive control over the way in which the content is processed during both pre-translation (e.g. markup of content and non-translatable terms) and post-translation (e.g. post-translation corrections and adjustments).
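The "statistics guided by rules" workflow can be sketched as a small Python pipeline: rules protect non-translatable terms before translation, an engine translates, and rules restore the terms and normalize the output afterwards. Everything named here is an assumption for illustration: `toy_engine` is a stand-in for a real statistical engine, the placeholder format is invented, and "AcmeSync Pro" is a hypothetical product name.

```python
import re


def protect_terms(text, terms):
    """Pre-translation rule: swap non-translatable terms for placeholders."""
    mapping = {}
    for index, term in enumerate(terms):
        placeholder = f"__TERM{index}__"
        if term in text:
            text = text.replace(term, placeholder)
            mapping[placeholder] = term
    return text, mapping


def restore_terms(text, mapping):
    """Post-translation rule: put the protected terms back."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text


def normalize(text):
    """Post-translation rule: collapse whitespace, drop spaces before punctuation."""
    text = re.sub(r"\s+", " ", text).strip()
    return re.sub(r"\s+([,.;:!?])", r"\1", text)


def hybrid_translate(text, engine, terms):
    """Run rules, then the engine, then rules again."""
    protected, mapping = protect_terms(text, terms)
    raw_output = engine(protected)
    return normalize(restore_terms(raw_output, mapping))


def toy_engine(text):
    """Stand-in for a statistical engine: a tiny made-up English-French lookup."""
    lookup = {"open": "ouvrez", "now": "maintenant"}
    return " ".join(lookup.get(word.lower(), word) for word in text.split())


result = hybrid_translate("Open  AcmeSync Pro now .", toy_engine, ["AcmeSync Pro"])
print(result)  # prints "ouvrez AcmeSync Pro maintenant."
```

Because the engine only ever sees the placeholder, it cannot alter the product name, and the normalization rule cleans up spacing that the engine left behind. Swapping `toy_engine` for a different engine would leave the rule layer unchanged.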

More recently, with the advent of neural MT (NMT), a new version of hybrid machine translation is emerging that combines the benefits of rule-based, statistical (SMT) and neural machine translation. This approach benefits from pre- and post-processing in a rule-guided workflow as well as from NMT and SMT. The downside is its inherent complexity, which makes the approach suitable only for specific use cases. One proponent of this approach for complex use cases is Omniscien Technologies.