Monday, 23 June 2014

To end the science festival season: EAMT 2014

With overwhelmingly many conferences and too little to blog about, EAMT was my first MT conference. We were in Dubrovnik: a nice touristy city, too much seafood, quite a bit of rain. Anyway, I would've thought something like EAMT would be relatively large, given that things like EACL are, but it was a single-track event with a manageable number of people. Our PR team already wrote about the official side of things in the CNGL blog post on EAMT 2014 (how weird is it to have someone working on that kind of stuff in academia, eh?).

One thing I said there is, I think, indeed one of the main goals of my research: the end-user applications that we know are quite doable have seemed like science fiction since the '50s, yet we still aren't there, and the reason isn't quite clear. We have our machine translation chains, with or without analysers, plus OCR, speech recognition and TTS. There's no reason we shouldn't have mobiles that can read your foreign food menus and translate them, or listen to the server and you and interpret the whole interaction between you. We have all this, and in reasonable quality as well; it's not sci-fi anymore. I travel a lot, and while I make an effort to learn the basics of languages* as I go, I would much enjoy such an app. So we'll try to build one. The problem is that all machine translation systems are built to understand and translate only documents of the European Parliament, so they fail when they see non-prose texts like food menus, even though translating menus and ingredients should be easier. The same goes for interaction with waiters: no sentence structures, just words and some phrases you know. Should think about building for that, hmm.

*The way I learn the basics of a language is to implement a computational model for translating it. This is a very good method for me: I learn the structures of languages, not the vocabulary, which is what we linguists do. I now know that Croatian has locatives, marked on both nouns and prepositions. Where Croatian uses a locative construction, Finnish uses one of its six locational cases: puutarha+ssa goes to u vrt+u, so puutarha matches vrt, and the inessive +ssa matches both the preposition u (in its locative use) and the locative suffix +u. You can see the end results of all I've learnt in the Apertium incubator repository.
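A toy sketch of that one mapping in Python, just to make the alignment concrete. This is not how Apertium actually encodes it; the `LEXICON` and `CASE_RULES` tables and the `translate` function are hypothetical names made up for illustration, and they cover only the single puutarha/vrt example from above.

```python
# Toy lexicon: only the puutarha -> vrt pair from the text (assumed table).
LEXICON = {"puutarha": "vrt"}

# Finnish inessive +ssa -> Croatian preposition "u" plus locative suffix "+u".
# Only this one case is covered; real suffixes depend on the noun.
CASE_RULES = {"ssa": ("u", "u")}


def translate(analysis: str) -> str:
    """Translate a 'stem+suffix' analysis such as 'puutarha+ssa'."""
    stem, suffix = analysis.split("+")
    preposition, target_suffix = CASE_RULES[suffix]
    return f"{preposition} {LEXICON[stem]}+{target_suffix}"


print(translate("puutarha+ssa"))  # u vrt+u
```

The point is that one Finnish suffix fans out into two Croatian pieces, which is exactly the kind of structure you pick up by writing the rule down.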

Machine translation is all about drawing these connections. In statistical machine translation, everything starts out connected to everything, and then we invent clever methods to make some connections weaker and some stronger until we can find something reasonably strong; usually we can't really, so we just wing it. Rule-based machine translation starts with no connections at all, and linguists take great care to draw only the plausible, beautiful lines, but the end result will have lots of things missing. In the interaction between these two, the so-called hybrid approach, the rules tell the statistics not to bother with connections we know are never valid, while the statistics tie-break where the rules cannot decide and add good guesses where nothing seems plausible. I should really draw some pictures of this, they'd be ever so neat.
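In lieu of pictures, here is a minimal sketch of the hybrid idea in Python, under heavy assumptions: the candidate translations and their scores are invented numbers, the "rule" is a single toy check, and `hybrid_choose` is a hypothetical function, not anything from a real MT system.

```python
from typing import Callable


def hybrid_choose(candidates: dict[str, float],
                  is_valid: Callable[[str], bool]) -> str:
    """Rules prune the candidates; statistics break ties and fill gaps."""
    # Rules first: drop every connection we know is never valid.
    allowed = {c: score for c, score in candidates.items() if is_valid(c)}
    # If the rules leave nothing plausible, fall back to all statistical guesses.
    pool = allowed or candidates
    # Statistics decide among whatever is left.
    return max(pool, key=pool.get)


# Made-up scores: the statistics alone would prefer the bare noun.
scores = {"u vrt": 0.5, "u vrt+u": 0.4}

# Toy rule: after the preposition "u" the noun must carry the locative +u.
def needs_locative(candidate: str) -> bool:
    return candidate.endswith("+u")


print(hybrid_choose(scores, needs_locative))  # u vrt+u
```

When the rule rejects every candidate, the function simply returns the highest-scoring guess, which is the "add good guesses where nothing seems plausible" part.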

I've also made some further advances on last summer's pizzur theory of linguistic comprehension, which I need to write another blog post about.
