17 January 2006

The MT Blues: Would a program get its facts right?

I just came across an article on Slate entitled The Translator's Blues: Will I get replaced by a computer program?, written by Jesse Browner, a human translator. I've no idea how widely read Slate is. And while I think Browner is correct that his job is not (yet) in jeopardy, there are several common misconceptions in the article that really shouldn't be there.

First, the results on his example sentence are atrocious. I see no reason why even the rule-based systems should get this wrong (though I suspect his poor results had largely to do with the difficulty of inputting accented characters). Language Weaver stood out, which is good, because it's a good system. In fact, he reports that it fared well on all the news texts he tried.

And he's right that the technology has limits: it could not translate the first sentence of Don Quixote well. This is the domain adaptation problem: Language Weaver's system was trained on news data and (surprise, surprise) it performs well on news and not on fiction. I think the important point is that the problem is one of domain. IMO, it has nothing to do with context, which is what Browner attributes the problem to. The con/pen example is something that I believe a decent statistical MT system should and would get right; if not now, then in a few years once language models are better. It's unfortunate that Browner doesn't mention that Language Weaver is (largely) a statistical system, because he seems to say that statistical systems suck, yet LW was by far the best system he tried.

My final comment concerns the quote attributed to Mike Collins, who clearly knows much more about this field than most. Most likely Mike was quoted out of context, but Language Weaver's system (and, in fact, most good research systems out there) is based on machine learning. That's what "statistical" means.

(Incidentally, for those less literary-minded, the first sentence of Don Quixote should be translated something like "In a village of La Mancha, the name of which I have no desire to call to mind, there lived not long since one of those gentlemen that keep a lance in the lance-rack, an old buckler, a lean hack, and a greyhound for coursing." I'm not convinced that a professional translator who worked for the UN would find this easy to translate either.)
