I heard about this on a Russian blog. Then on another Russian blog. Then on one more. I thought it was a joke but then I tried it myself and discovered that it was true.
It turns out that when you enter the sentence “Putin will not become the President of Zimbabwe / China / France / Spain, etc.” in Russian, the Google Translator translates it correctly (although it confuses the future tense and the past). When, however, you enter the sentence “Putin will not become the President of Russia” it is translated as “Putin has become the President of Russia.”
Here is how the translation looks:
I translated the sentences “Putin will not become the President of America” and “Putin will not become the President of Russia.” You don’t need to understand Cyrillic characters to see that the only difference between the original sentences is the last word (the name of the country). The translation, though, drops the word “not” when Russia is mentioned. I tried it several times, and Google Translator keeps translating the sentence correctly until you add the word “Russia.” Is that cool or what?
Of course, since I used to be a specialist in machine translation, I know why this happens and it doesn’t scare me. 🙂 Some people have been freaking out, though. Does anybody care to venture a guess as to why this happens?
P.S. I never give links to Russian websites where I find my information about Russia. I don’t do it because I don’t want to get dumped on yet again for providing links to non-English-language sources and it bores me to give endless disclaimers. Are there people who want these links to blogs in RUSSIAN? In any case, here is one of the places where this phenomenon was described (in Russian).
According to WP, Google Translate is a “statistical machine translator.” Statistical machine translators work from huge bilingual corpora: the most common correspondences between language A and language B are used to translate a phrase in language A into language B (ibid). Presumably, it uses the most common correspondence to select the default and offers other common correspondences as alternatives.
My explanation is borne out by the fact that if you click on “Putin has” it offers “Putin did not” as an alternate translation. (For some reason, it does not do that mutatis mutandis for the first sentence.)
Since I don’t speak any Russian, I looked to Wiktionary, which indicated that besides mostly being a negator/negative particle, “не” is also a vague emphatic/exclamatory particle. Hence, I conclude GT has a bad Russian–English corpus: it contains more instances of “не” being used as an emphatic/exclamatory particle (perhaps to emphasize Putin’s first time) than as a negator. Since computers can’t think, GT falls for this statistical quirk.
I tested my theory by substituting several Russian given names for “путин,” and none of them produced the same error. A corpus (of novels, perhaps) presumably contains a number of “X did not become president…” sentences for ordinary names, and fewer emphatic/exclamatory ones, so there would be no such statistical quirk for GT to fall for, which my test bears out.
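The commenter’s hypothesis can be sketched as a toy frequency tally. Everything here (the phrase pairs, the sentences, and their counts) is invented purely for illustration; it only shows how a skewed corpus can make the wrong rendering win by sheer frequency while the correct one survives as an alternative:

```python
from collections import Counter

# Toy "parallel corpus": (Russian phrase, English rendering) pairs.
# The counts are invented to mimic the hypothesized skew for the
# Putin/Russia sentence; this is not real Google Translate data.
corpus = [
    ("не станет президентом Франции", "will not become the President of France"),
    ("не станет президентом Франции", "will not become the President of France"),
    ("не станет президентом России", "has become the President of Russia"),
    ("не станет президентом России", "has become the President of Russia"),
    ("не станет президентом России", "has become the President of Russia"),
    ("не станет президентом России", "will not become the President of Russia"),
]

def translate(phrase):
    """Pick the most frequent English rendering seen for this phrase;
    return the rest as ranked alternatives."""
    candidates = Counter(en for ru, en in corpus if ru == phrase)
    ranked = [en for en, _ in candidates.most_common()]
    return ranked[0], ranked[1:]

best, alts = translate("не станет президентом России")
print(best)   # the skewed counts make the wrong rendering win
print(alts)   # the correct rendering is offered only as an alternative
```

This also mirrors the observation above that clicking on “Putin has” reveals “Putin did not” as a ranked alternative.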
Sorry about the messed up italics :(.
And you are absolutely right!
God, I have smart readers. I never even get a chance to show off my knowledge because they already know everything. 🙂 🙂
This is a very complicated explanation. I still do not understand why the “statistical quirk” happens with the country name “Russia” but not with other country names.
I’ve been dying to share this knowledge, so thanks for asking, Kinjal.
Original machine translation systems were based on algorithms that let a computer parse a sentence and translate it into the target language. It was very hard to write such an algorithm, one that codified every phenomenon in a language to even a marginally acceptable degree. I’ve worked in this area for years and, let me tell you, it was harsh. (But lots of fun.)
Google Translator, however, is a radically different system that has single-handedly destroyed machine translation. Google Translator doesn’t really translate anything; it has no algorithm. Instead, it relies on the fact that Google has access to an incredible amount of written text. When Google Translator is given a text to translate, it doesn’t really translate it. It looks for equivalencies in texts that already exist.
Now, if we consider the entirety of the Internet, which text do you think comes up most frequently containing the words “Putin, become, president”? Obviously, it will be “Putin will become president,” right? So Google Translator just takes it and supplies it as a translation, which, in strict terms, it isn’t. What Google Translator provides is the best possible guess, which, more often than not, is correct because people are predictable. 🙂
And if somebody starts telling me that I simplified this too much, yes, I did, but this is what I’ve been asked to do by a curious reader.
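In the same simplified spirit, the idea of “guessing by frequency” rather than translating can be sketched in a few lines. The mini-“web” of sentences and their counts below is entirely made up; the point is only that the most frequent stored sentence containing the query words wins:

```python
from collections import Counter

# Invented mini-"web": English sentences and how often they occur online.
# These counts are illustrative assumptions, not real data.
web_counts = Counter({
    "Putin will become president": 50,
    "Putin has become the president of Russia": 40,
    "Putin will not become president": 3,
})

def best_guess(query_words):
    """Return the most frequent stored sentence containing all the
    query words: a statistical guess, not a translation."""
    hits = {s: n for s, n in web_counts.items()
            if all(w in s.lower().split() for w in query_words)}
    return max(hits, key=hits.get) if hits else None

guess = best_guess({"putin", "become", "president"})
print(guess)  # the highest-count matching sentence wins
```

With these made-up counts, the frequent affirmative sentence outranks the rare negated one, so the “not” disappears from the guess, exactly the kind of mistake described above.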
Babelfish seems to get the tense and negation right, but “путин” becomes “fishings he.” With proper capitalization it seems to get it about right.
I had to think about it but “fishings he” totally makes sense. 🙂 This is too funny!!! Thank you for telling me. 🙂
I collect these stories about the funny gaffes of machine translation systems because I used to work on figuring out why they happen and correcting them. This was my very first profession.
For Spanish speakers: a machine translation system once translated an airline’s ad “Fly in leather!” as “Vuela en cueros!” (meaning, “fly naked!”) 🙂 🙂
With proper capitalization. Bangs head on desk.
Thank you, Clarissa. Simple explanations are always nice, and I was sure that a teacher like you would appreciate that fact. 🙂
There is nothing I like more than explaining things. Professional deformation. 🙂
So what’s the explanation behind “fishings he”?
This is actually very funny: “putina” in Russian means “fishing season.” So the system sees the word without the feminine ending “a” and attaches “he” to make it masculine. The MT system I helped create doesn’t make such basic mistakes. Of course, the systems available for free online are the most basic ones and make very funny mistakes.
Thanks for the insight – I would have assumed that GT was actually ‘translating.’ While it may be generally correct, I would prefer a real translator.