People expect translations to happen at the speed and accuracy of a Star Trek Universal Translator. Unfortunately, that time is not here yet, but we are approaching it quickly.
Acquiring translation memory and retraining machine translation engines are the keys to making this sci-fi notion a reality, but it would take an army of translators an estimated 1,000,000,000,000,000 finely crafted translations before machine translation could statistically match the accuracy of a real human translator.
Seem impossible? Take heart. It is doable, but to achieve this goal we must change our mindset and produce these translations more efficiently, in a cloud-based, collaborative and accessible way.
Today there is no consistency in the way translation memory is generated and stored. Common approaches have analysts align previously translated documents to generate translation memories that might someday benefit a Computer Aided Translation (CAT) tool. Unfortunately, these tactics waste time and provide very little gain down the road unless all content is domain specific.
Machine translation is also a sticking point when generating content for end users. Machine translation engines do not get the message across to customers nearly as well as a degreed linguist trained in the culture and nuances of the target language. This is partly why machine translation alone should be used only for certain types of content, not for every type of content that conveys corporate messages.
How to Fill in the Missing 35 Percent
Think of computer translation as a child learning to write. Even though its capacity doubles every 18 months and it has access to the largest collection of information known to man, the Internet, it is still in its adolescence. It presents its best guess at how to translate something, based on a statistical data set large enough to make the average person's head spin, but it needs professional and novice translators to help it learn.
Content entering the cloud-based system should be broken up into small, manageable parts called segments. Segments make it easier for the computer to make its best guess and to receive help along the way. Each segment is first populated with the computer's estimated translation, which is typically close but still needs work to convey the proper message. Remember, the computer is akin to a child, and children rarely get it right the first time either.
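To make the segmenting step concrete, here is a minimal Python sketch. It assumes sentence-level segments split on `.`, `!` or `?` followed by whitespace (production tools use language-specific segmentation rules), and it treats the machine translation engine as any function from source text to target text. The names `Segment`, `segment_text` and `prefill_with_mt` are hypothetical and do not belong to any particular product.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


# Hypothetical segment record: source text plus its current translation.
@dataclass
class Segment:
    source: str
    target: Optional[str] = None
    origin: str = "none"  # "mt" once the machine's guess has been filled in


# Naive sentence boundary: whitespace that follows ., ! or ?
# (an assumption for illustration; real segmenters are language aware).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def segment_text(text: str) -> List[Segment]:
    """Split content into sentence-sized segments, dropping empty pieces."""
    return [Segment(source=s.strip()) for s in _BOUNDARY.split(text) if s.strip()]


def prefill_with_mt(segments: List[Segment], machine_translate: Callable[[str], str]) -> None:
    """Populate every segment with the engine's best guess."""
    for seg in segments:
        seg.target = machine_translate(seg.source)
        seg.origin = "mt"


# Usage with a stand-in "engine" that just tags the text.
segments = segment_text("Welcome to our store. Orders ship in two days! Questions?")
prefill_with_mt(segments, lambda s: f"[MT] {s}")
for seg in segments:
    print(seg.origin, seg.target)
```

Keeping segments small means each machine guess can be reviewed, corrected and reused independently of the document it came from.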
But children do remember what they have learned, and so does the cloud-based translation management tool. Every segment is checked to see whether it has been translated before. Sources can include a client's aligned documents or thousands of previously completed translations. Wherever the system finds an exact match, it replaces the computer's best guess with that stored translation. In most cases, the document is now 65 percent accurate and conveys the information intended for the client's audience.
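The exact-match lookup described above can be sketched as a dictionary keyed by source segment. This is a simplified illustration under stated assumptions: matching is exact after trimming whitespace (commercial CAT tools also offer fuzzy matching, which is not shown), segments are plain dicts with `source`, `target` and `origin` keys, and `build_memory`, `apply_memory` and `remember` are hypothetical names. Segments with no match keep the machine's guess and are reported back as the ones a translator still needs to review.

```python
from typing import Dict, List, Tuple

# Hypothetical translation memory: exact source segment -> approved translation.
TranslationMemory = Dict[str, str]


def build_memory(pairs: List[Tuple[str, str]]) -> TranslationMemory:
    """Build a memory from aligned (source, target) pairs, e.g. a client's aligned documents."""
    return {source.strip(): target for source, target in pairs}


def apply_memory(segments: List[Dict[str, str]], memory: TranslationMemory) -> List[int]:
    """Swap machine guesses for exact memory matches.

    Returns the indices of segments that still hold only a machine guess,
    i.e. the ones a translator still needs to work on.
    """
    needs_review = []
    for i, seg in enumerate(segments):
        match = memory.get(seg["source"].strip())
        if match is not None:
            seg["target"] = match
            seg["origin"] = "tm"
        else:
            needs_review.append(i)
    return needs_review


def remember(memory: TranslationMemory, source: str, approved_target: str) -> None:
    """Store a translator's approved work so the next exact match reuses it."""
    memory[source.strip()] = approved_target


# Usage: one segment is already in memory, the other is not.
memory = build_memory([("Welcome to our store.", "Bienvenido a nuestra tienda.")])
segments = [
    {"source": "Welcome to our store.", "target": "Bienvenidos a nuestro almacén.", "origin": "mt"},
    {"source": "Orders ship in two days!", "target": "¡Pedidos envían en dos días!", "origin": "mt"},
]
pending = apply_memory(segments, memory)
print(pending)
```

The `remember` step is what lets the system "learn": once a translator approves a segment, every later exact match is filled in automatically.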
Translators are then needed to fill in the missing 35 percent. There is no substitute for bilingual speakers who both have a good handle on the target language and keep the company's messaging in mind when they translate.
Cloud-based translation makes all of this possible, bringing the idea of a Universal Translator closer to reality than most people think.
Title image courtesy of Neung Stocker Photography (Shutterstock)
About the Author
Rob Vandenberg is the President and CEO of Lingotek, a crowdsourcing-based language translation company. Before being named CEO, Vandenberg served as the company's VP of sales and marketing. Prior to Lingotek, Rob was one of the first 20 U.S. employees at INTERSHOP Communications, where he helped build its worldwide business and helped make the INTERSHOP IPO one of the most successful enterprise software IPOs in U.S. history ($10B market cap). Later, Rob co-founded and served as the CEO of LocalVoice, which was acquired by HarrisConnect in 2005. Rob received a bachelor's degree in political economics from UC Berkeley.