Monday, 23 June 2014

To end the science festival season: EAMT 2014

With absolutely overwhelmingly many conferences and too little to blog about, EAMT was my first MT conference. We were in Dubrovnik: nice touristy city, too much seafood, quite a bit of rain. Anyways, I would've thought something like EAMT would be relatively large, given that things like EACL are too, but it was a single-track event with a manageable number of people. Our PR team already wrote about the official side of things in the CNGL blog post of EAMT 2014 (how weird is it to have someone working on that kind of stuff in academia, eh?).

There's the thing I said there that I think is indeed one of the main goals of my research: the end-user applications that we know are quite doable, and that have seemed like science fiction since the '50s, but we still aren't there and the reason isn't quite clear. We have our machine translation chains, with or without analysers, OCR, speech recognition, TTS. There's no reason we shouldn't have mobiles that can read your foreign food menus and translate them, or listen to you and the server and interpret the whole interaction between you. We have all this, and in reasonable quality as well; it's not sci-fi anymore. I know I travel a lot, and while I make an effort to learn the basics of languages* as I go, I would much enjoy such an app. So we'll try to have one. Now the problem with this is that all machine translation systems are built to understand and translate only documents of the European Parliament, so they fail when they see non-prose texts like food menus. Even though translating menus and ingredients should be easier. And interaction with waiters too. No sentence structures, just words and some phrases, you know. Should think about building for that, hmm.

Wednesday, 11 June 2014

Some LREC 2014 ideas

In this continuation of the endless stream of conferences I seem to be having this year, LREC is second to last before the autumn. And LREC is one of the biggest in our field, so even though most of the main conference stuff is pretty basic, it's the only place to distribute language resources and work that's basic engineering and data harvesting, and yeah, a lot of collaboration and social networking goes on. I don't have many interesting insights into the conference content; we presented some basic infra work and that simple lexc optimisation hack I thought of years back. We saw a few resources: Hungarian data is still not available for the most part, Lakota has a lexicon and twol rules and all of 50 words or so. TRmorph has been developed further.