Sunday, 11 May 2014

Bad and worse code in scientific programming

I was just reading a couple of articles about software engineering: The Low Quality of Scientific Code and Why Bad Scientific Code Beats Code Following Best Practices. Some of you may remember me from such presentations as FSCONS 2013, where I talked about the very same thing. I very much agree with the first text, of course: most code I have to look at is rather dreadful, and it is a surprise that it ever works. One of the things I've learnt after moving from HFST (and Apertium) to new projects is that, while there's a lot of bad code in there, it is actually still among the better projects, with all the FLOSS software engineering conventions that actually got implemented and that at least someone cares to follow every once in a while.

Take for example my attempts to learn statistical machine translation. The most prominent project in the field is called Moses. It has no tests, it cannot be installed, and it consists mostly of kludgy scripts that only work occasionally, often as a side effect of some other script that was run earlier in the same directory. When you want to use Moses from another project, you make a note of where you unpacked its source and hope that everyone who uses it will have the same random scripts in the same places, in one of the billion script directories there are.

The thing is, it's not much harder to actually do things properly. You don't need to hire a software engineer to understand that there should be a test that runs your program, gives it some input, checks that it produces the expected output and doesn't crash, and all the simple things like that. It's not an intellectually difficult thing to grasp, and the implementation doesn't take more than a few minutes, which are saved again in every update and debugging cycle. So yeah, I don't have much to rant about now, the blogs already said it all.
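To show how little such a test needs, here is a minimal sketch in Python. The names `PROGRAM`, `run_program` and `smoke_test` are made up for illustration, and the program under test is a hypothetical stand-in that just upper-cases its standard input; a real project would put its own command line there.

```python
import subprocess
import sys

# Hypothetical stand-in for the program under test: it upper-cases stdin.
# In a real project this would be the command line of your own tool.
PROGRAM = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def run_program(command, text):
    """Run the command, feed it text on stdin, return (exit code, stdout)."""
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        timeout=30,  # a hanging program counts as a failure too
    )
    return result.returncode, result.stdout


def smoke_test(command, text, expected):
    """Return True if the program exits cleanly and prints the expected output."""
    code, output = run_program(command, text)
    return code == 0 and output == expected


# One input, one expected output: already enough to catch a crash.
print("PASS" if smoke_test(PROGRAM, "kissa\n", "KISSA\n") else "FAIL")
```

Run something like this from whatever your build already uses for checks, and every update gets at least one sanity check for free.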

Saturday, 3 May 2014

Some EACL ideas

So I went to EACL in Gothenburg without having much of anything to present there. One of the things I usually do when I have to fly around the world with long layovers is update my compling projects. Maybe it’s inspired by Norvig’s spell-checker, who knows. This time I of course spent the time on making apertium-fin-swe. The array of Apertium-based Finnish translators is starting to shape up nicely. This brings up the traditional problem that there’s no reuse of code in NLP, and there are no standards for what the analyses are.