Generally speaking, I don’t like to point out the linguistic errors of others (or if I do, I at least try to be polite about it). But when the mistake is made by a computer program, I don’t feel compelled to hold back.
I’ve been checking up now and then on Japanese-to-English translation technologies, especially Google Translate, for which I wrote an article analyzing the translation of a short passage in 2015 and then again in late 2016. The quick summary is that the translation quality was pretty bad, and while it may have improved a little bit in 2016, it still has a long way to go.
The other day, when I was on an airplane, I discovered that I was required to pay the (rather expensive) fee to use the internet whenever I tried to visit any website. But there was one exception: Google. I quickly figured out that I could use this loophole to do a few tasks which are key parts of my translation and editing process: looking up word synonyms, definitions, and the commonality of phrases.
To search for the English definition of a word, I simply entered the Japanese word followed by “英語” (‘eigo’ = English) as the search keywords. This worked pretty well, since I could often read the definitions from a bunch of hits. It just so happens that this triggers Google Translate to show a little window at the top of the search results which displays a simple translation.
For example, using the keywords “ご飯 英語” I get “Rice”. While this is clearly not a complete definition (since “ご飯” can also mean “meal” in a more general sense), at least it’s in the right ballpark.
For the most part, I just skimmed over the Google Translate result and looked at the page results below it. However, when translating a chapter of one of my latest translation projects, I came across the word “へたり込む” (hetarikomu), and boy did Google Translate give a horribly wrong (and funny) translation for it:
At the bottom of the above screenshot, you can see the correct definition (“sink down to the floor [ground]”) which can be easily found by searching a Japanese/English dictionary.
So how in the world did Google Translate come up with “Writing or fart”??
Well, to begin with, as you probably know, there are usually no spaces in Japanese writing. It’s quite a bit of work for the translation algorithm to pick apart the words (I know, I’ve written an algorithm for it before from scratch).
Let’s break 「へたり込む」 into three parts and analyze what Google Translate apparently thought:
- へ (he): While this is technically part of the single verb “hetarikomu”, the meaning of “fart” would actually be correct if it were a word on its own.
- たり (tari): Though a grammatically incorrect interpretation here, “tari” can be used with verbs to mean something like “to do … and/or other stuff”. For example, “歩いたり…” (aruitari …) would be “to walk and/or do other stuff”. So I can vaguely see where Google Translate’s “or” interpretation came from.
- 込む (komu): As far as I know, the word “komu” means nothing like “writing”, but entering this word by itself also gives “writing” as the translation, so I guess this is some mistake in the Google Translate dictionary.
Although I said above that breaking a stream of Japanese characters into words can be difficult to do, in this case as long as Google’s dictionary had the word “へたり込む” in it, the process should go pretty smoothly. So this, plus the problem with “komu”, implies they need a better lookup dictionary. In any case, the fact that manual dictionary lookup gives a better translation than a translation engine (that surely took millions of dollars to develop) is pretty sad.
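To make the dictionary point concrete, here is a minimal sketch of greedy longest-match word splitting. To be clear, this is a toy illustration and not how Google Translate (or any real engine) works internally; the function name `segment` and the two tiny dictionaries are made up for this example, and the English glosses are just the ones that appear in this post.

```python
def segment(text, dictionary):
    """Split text into words by greedy longest-match dictionary lookup.

    Hypothetical helper for illustration only. `dictionary` maps a
    Japanese word to an English gloss. Characters that match no entry
    are emitted one at a time with a gloss of None.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                tokens.append((candidate, dictionary[candidate]))
                i += size
                break
        else:
            # No dictionary entry starts here: pass the character through.
            tokens.append((text[i], None))
            i += 1
    return tokens


# Assumed toy dictionary that lacks the full verb (glosses taken from this post).
small_dict = {"へ": "fart", "たり": "or", "込む": "writing"}

# The same dictionary with the full verb added.
full_dict = {**small_dict, "へたり込む": "sink down to the floor"}

print(segment("へたり込む", small_dict))
print(segment("へたり込む", full_dict))
```

With the smaller dictionary the verb falls apart into the same three pieces as in the breakdown above, while adding the single entry for “へたり込む” makes the longest match win and the whole word comes out as one unit.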
In Google’s defense, the word “へたり込む” isn’t that common, at least judging from my experience and from the fact that their own search results show only 146 hits for it (ignoring the result count of 130,000 that initially appears, which is a horribly inaccurate estimate).
It will be interesting to try this again in a few months to see if it ever gets fixed. While I know they have a mechanism for accepting feedback from users, if even a small fraction of the massive number of words require user input to correct mistakes (and require developers to fix the engine or dictionary accordingly), it will be a very long time before Google Translate can have any level of reliability, at least for this language pair.
I guess you can say that you can still use this tool for a rough translation which is then edited and corrected by a professional translator. But in that case, it might be faster to just do the translation yourself without the tool. Nevertheless, if there are cases where just the general gist is needed and the result in the target language doesn’t have to be natural, this tool can still be useful.