Friday, October 31, 2014

The folly of reproducing bugs, recreating errors and coming up with something from nothing

The more I dive into the task of high-quality machine translation, the more I get annoyed by the standards by which machine translations are measured. For those who don't know, the quality of machine translation is measured solely by how well the system can recreate a translation made by human translators. Fair enough: if the machine comes up with the same translation as a professional translator, we can say it has done a marvelous job indeed. Except maybe if the translator made a mistake. But that's not all. The job of a translator, at least a professional one who makes high-quality translations, is to translate content for the target audience so they can read it. This can often mean adding new information to fill the audience in on facts that the source-language audience perhaps knows better; a machine coming up with that will be a smart machine indeed, but I don't see it happening. Human translators can also drop a lot of words when the source is too wordy; e.g., certain ways of saying things in English will almost sound like you're explaining things to a child in Finnish if you translate too literally. Again, a machine smart enough to realise that is not something I foresee in the near future. And then there's a lot of rewording: humans will just know when to translate verbs into nouns and reword the whole sentence, because the original way of expressing things is weird for the target language. A machine that realises this may be plausible; in fact, if we throw enough data at a statistical system it will notice that the sentence is odd, and may even have seen the rewording.

The reason I'm writing this is that I finally, for the first time, took a serious look at the data that is used to measure the quality of machine translation, that is, the Europarl corpus. The vast majority of it is horrifying, to the extent that I as a human cannot even begin to explain how, given this English sentence, you could come up with anything distantly resembling that Finnish sentence, or even matching, say, half of the words in it. If I cannot explain the translation, it is at least obvious that we cannot build a rule-based system to map between the two. But even with statistics: say the text is about TV channels shown in the hotels of members of the European Parliament, specifically a channel named FOO, and the English text gives no details of the channel, yet you are expected to translate "FOO" as "FOO, a Dutch TV channel broadcasting mainly news". What kind of statistics would really give you good evidence for doing that? Yet not getting it right will probably reduce your score for that translation to 0! While that specific example is hopefully not in Europarl, there is a session about TV channels, and there are a lot of cases like that all over. And indeed machine translation research is tasked with finding algorithms that would faithfully reproduce that kind of rewording and addition; the way the systems are evaluated really is a whole lot of nonsense.

So, machine translation IMO is never particularly suited for paraphrasing, rewording, adding information or that sort of task. We should really concentrate on making systems that a) faithfully carry the information across to the reader as it is, and only then b) make it grammatically correct and colloquial in the target language. Trying to optimise systems to produce all these high-quality rephrasings is a foolish goal; rather, we should just make sure that the systems are good at not losing any information or inverting any meanings, which as I see it is the biggest problem with current systems, mainly caused by the fact that they try to solve all of this at once. Like, who cares if English is more likely to say "don't forget to frobble" where Finnish speakers would go "remember to frobble"? But with the current scoring system we get penalised hugely for not getting that right, while the common statistical mistranslation "don't remember to frobble", oh, it gives us more points, of course! So that's what we're optimising our systems for. Sweet.
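The frobble example can be made concrete. Below is a minimal toy sketch of reference-based n-gram scoring in the style of BLEU; it is my own simplified version (single reference, bigrams only, no smoothing), and the sentences are the ones from the paragraph above, not from any real test set. The meaning-inverting output, which happens to share more surface n-grams with the reference, outscores the correct paraphrase.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def toy_bleu(candidate, reference, max_n=2):
    """Toy BLEU: geometric mean of clipped n-gram precisions,
    times a brevity penalty. Not the real metric, but the
    failure mode illustrated here is the same."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped matches: a candidate n-gram counts only as
        # often as it appears in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if matched == 0:
            return 0.0
        log_prec += math.log(matched / sum(cand_counts.values()))
    # Penalise candidates shorter than the reference.
    brevity = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return brevity * math.exp(log_prec / max_n)

reference      = "don't forget to frobble"     # the human translation
paraphrase     = "remember to frobble"         # correct, but reworded
mistranslation = "don't remember to frobble"   # inverts the meaning

print(round(toy_bleu(paraphrase, reference), 3))      # → 0.414
print(round(toy_bleu(mistranslation, reference), 3))  # → 0.5
```

The paraphrase loses twice: it shares fewer n-grams with the reference and it also takes a brevity penalty, while the mistranslation keeps the surface overlap ("don't", "to frobble") intact. Real BLEU with 4-grams and multiple references softens this somewhat, but the incentive points the same way.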

By the way, did I mention that if we scored professional translators by the same measures we use for machine translation, they would usually get scores we deem so low that their work would not be worth publishing? Yeah, ain't that a good measure.

Yeah yeah, so this is not specific to machine translation but to all of computational linguistics in its glory; this madness follows us anywhere we go, it does. It is a good thing that we want to systematically measure how well we're doing, instead of just throwing random things at random ad hoc implementations and writing 5-page essays for conferences describing why things got better. But it is indeed an exercise in futility when it turns into trying to recreate bugs or mistakes just because. This is the case with most things like so-called morphological analysis: there are no good standards or metrics for it, so whoever writes the first "gold standard" sets it. In morphology that will either mean just a dump of some system's output, systematic errors and all, or having human annotators produce the standard, which may be slightly better. Unfortunately the big mistake here is that people doing linguistics don't really understand how things work, e.g., how to actually prove that an analysis is correct with some measurable evidence; human annotators work just by intuition. And so, often enough, we end up with the enjoyable task of reproducing either the bugs of another system or linguistic intuition.
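The gold-standard problem can be sketched in a few lines. Everything here is hypothetical (the word forms, the tag strings and the bug are invented for illustration, not taken from any real corpus or analyzer): if the "gold" annotations are a dump of an old system's output, systematic errors included, an analyzer that gets the hard case linguistically right scores worse than one that just parrots the dump.

```python
# Hypothetical "gold standard", dumped from an old analyzer that
# systematically mis-tags one ambiguous form. Finnish "kuusi" can
# mean 'six' (numeral) or 'spruce' (noun); assume this context
# means 'spruce'.
gold = {
    "taloissa": "talo+N+Pl+Ine",     # correct
    "kissalle": "kissa+N+Sg+All",    # correct
    "kuusi":    "kuusi+Num+Sg+Nom",  # systematic bug from the old system
}

correct_analyzer = {
    "taloissa": "talo+N+Pl+Ine",
    "kissalle": "kissa+N+Sg+All",
    "kuusi":    "kuusi+N+Sg+Nom",    # linguistically right, penalised anyway
}

# This "analyzer" reproduces the old system's output, bug and all.
bug_reproducing_analyzer = dict(gold)

def accuracy(system, gold):
    """Fraction of word forms whose analysis matches the gold tag."""
    return sum(system[w] == tag for w, tag in gold.items()) / len(gold)

print(accuracy(correct_analyzer, gold))          # → 0.666...
print(accuracy(bug_reproducing_analyzer, gold))  # → 1.0
```

The scoring is not wrong as arithmetic; the problem is entirely in what it is measured against, which is the point of the paragraph above.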

In conclusion: of all the millions spent on computational linguistics, most goes into engineering aimed at faithfully reproducing bugs and mistakes. Your money at work, isn't it.

Monday, October 13, 2014

Tense and time-travelling

Today's episode of The Big Bang Theory reminded me of something I wrote years ago about time-travelling and tenses in human languages (I wonder if it's still available somewhere; I had the beginnings of a listing of a thousand and one useful verb forms for time-travellers). That in turn relates to how misguided even most school books about human languages are. One popular version of this is people asking whether your language has a future tense or not; as readers of Language Log are aware, even some respectable publications fell for that fake story about how English has a future tense and languages that don't are economically worse off. The scene on TBBT is about explaining the time-travelling paradox of Back to the Future with reference to alternate timelines, and it is clearly obvious that English has the necessary tense structures for expressing combinations like the past future perfect (or whatever it's called; I'll look it up once the script or subs of the episode are online) as understandably as the present, future, past or pluperfect. So what's the point of school grammar teaching that English has a future tense and Finnish hasn't? There isn't one: Finnish has plenty of auxiliary verbs as capable as English "will" of being explicit about the future; not calling them a future tense is just good for silly false anecdotes about languages. And, by the way, the idea of tenses useful for time travelling is ripped from The Hitchhiker's Guide to the Galaxy.