24 July 2010

ACL 2010 Retrospective

ACL 2010 finished up in Sweden a week ago or so. Overall, I enjoyed my time there (the local organization was great, though I think we got hit with unexpected heat, so those of us who didn't feel like booking a room at the Best Western -- hah! why would I have done that?! -- had no A/C, and my room was about 28-30 degrees Celsius every night).

But you don't come here to hear about sweltering nights; you come to hear about papers. My list is actually pretty short this time. I'm not quite sure why that happened. Perhaps NAACL sucked up a lot of the really good stuff, or I went to the wrong sessions, or something. (Though my experience was echoed by a number of people (n=5) I spoke to after the conference.) Anyway, here are the things I found interesting.

  • Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates, by Matthew Gerber and Joyce Chai (this was the Best Long Paper award recipient). This was by far my favorite paper of the conference. For all you students out there (mine included!), pay attention to this one. It was great because they looked at a fairly novel problem, in a fairly novel way, put clear effort into doing something (they annotated a bunch of data by hand), developed features that were significantly more interesting than the usual off-the-shelf set, and got impressive results on what is clearly a very hard problem. Congratulations to Matthew and Joyce -- this was a great paper, and the award is highly deserved.

  • Challenge Paper: The Human Language Project: Building a Universal Corpus of the World’s Languages, by Steven Abney and Steven Bird. Basically this would be awesome if they can pull it off -- a giant structured database with stuff from tons of languages. Even just having tokenization in tons of languages would be useful for me.

  • Extracting Social Networks from Literary Fiction, by David Elson, Nicholas Dames and Kathleen McKeown. (This was the IBM best student paper.) Basically they construct networks of characters from British fiction and try to analyze some literary theories in terms of those networks, and find that there might be holes in the existing theories. (A toy version of the kind of character network they analyze is sketched after this list.) My biggest question, as someone who's not a literary theorist, is why those theories existed in the first place: the analysis was over 80 or so books, and surely literary theorists have read and pondered all of them.

  • Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish, by Reyyan Yeniterzi and Kemal Oflazer. You probably know that I think translating morphology and translating out of English are both interesting topics, so it's perhaps no big surprise that I liked this paper. The other thing I liked about this paper is that they presented things that worked, as well as things that might well have worked but didn't.

  • Learning Common Grammar from Multilingual Corpus, by Tomoharu Iwata, Daichi Mochihashi and Hiroshi Sawada. I wouldn't go so far as to say that I thought this was a great paper, but I would say there is the beginning of something interesting here. They basically learn a coupled PCFG, Jenny Finkel hierarchical-Bayes style, over multiple languages (a toy sketch of the coupling idea appears after this list). The obvious weakness is that languages don't all have the same structure. If only there were an area of linguistics that studies how they differ.... (Along similar lines, see Phylogenetic Grammar Induction, by Taylor Berg-Kirkpatrick and Dan Klein, which has a similar approach/goal.)

  • Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation, by Michael Bloodgood and Chris Callison-Burch. The "trend" referenced in the title is that active learning always asymptotes depressingly early. They have turkers translate bits of sentences in context (i.e., in a whole sentence, translate the highlighted phrase) and get a large bang for the buck. Right now they're looking primarily at out-of-vocabulary stuff, but there's a lot more to do here. (A rough sketch of the cost-focused selection idea is below, after the list.)
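
To make the Elson et al. item a bit more concrete: their actual pipeline attributes quoted speech to characters and links characters who converse, but the toy sketch below (my own simplification, in Python, with made-up character names and a plain same-paragraph co-occurrence criterion) just shows the kind of weighted character network that ends up being analyzed.

    from collections import Counter
    from itertools import combinations

    def build_character_network(paragraphs, characters):
        """Map each pair of characters to the number of paragraphs mentioning both."""
        edges = Counter()
        for para in paragraphs:
            present = sorted(c for c in characters if c in para)
            for pair in combinations(present, 2):
                edges[pair] += 1
        return edges

    # Made-up toy data, just to show the shape of the output.
    paras = ["Elizabeth spoke with Darcy at length.",
             "Darcy wrote to Bingley the next morning."]
    print(build_character_network(paras, ["Elizabeth", "Darcy", "Bingley"]))
    # ('Darcy', 'Elizabeth') and ('Bingley', 'Darcy') each get weight 1

Statistics of such a network (density, number of characters, and so on) are then the sort of thing one would compare against the literary theories.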
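
On the Iwata et al. paper, here is a toy rendering of what I mean by hierarchical-Bayes coupling; this is my own sketch (in Python, with an assumed per-nonterminal count format), not their model or code. The idea is that each language's rule probabilities for a given left-hand side are drawn from a Dirichlet centered on a shared, language-independent distribution, so estimation smooths per-language counts toward the shared mean.

    def coupled_rule_probs(counts_per_lang, alpha=5.0):
        """counts_per_lang: {language: {rule: count}} for one left-hand side.
        Returns per-language rule probabilities smoothed toward a shared mean,
        i.e. a MAP-style estimate under a Dirichlet(alpha * shared) prior."""
        rules = {r for counts in counts_per_lang.values() for r in counts}
        total = sum(sum(c.values()) for c in counts_per_lang.values())
        shared = {r: sum(c.get(r, 0) for c in counts_per_lang.values()) / total
                  for r in rules}
        probs = {}
        for lang, counts in counts_per_lang.items():
            n = sum(counts.values())
            probs[lang] = {r: (counts.get(r, 0) + alpha * shared[r]) / (n + alpha)
                           for r in rules}
        return probs

    # Toy counts: each language's estimates get pulled toward the average.
    counts = {"en": {"S -> NP VP": 90, "S -> VP": 10},
              "ja": {"S -> NP VP": 40, "S -> VP": 60}}
    print(coupled_rule_probs(counts))

The typology complaint above is exactly that this kind of coupling assumes the shared mean is meaningful, which it won't be where languages genuinely differ in structure.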
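
And on the Bloodgood and Callison-Burch paper, a very rough sketch of the cost-focused selection idea as I understood it (the paper's actual scoring and the turker pipeline are more involved, so treat this as an assumption-laden caricature): rank out-of-vocabulary source phrases by how often they occur in the data you care about, divided by the estimated cost of getting them translated, and keep buying translations until the budget runs out.

    def select_phrases_to_translate(oov_phrase_counts, cost_per_word, budget):
        """oov_phrase_counts: {source phrase: frequency in the data we care about}."""
        def cost(phrase):
            return cost_per_word * len(phrase.split())
        # Greedy frequency-per-cost ranking: cheap, frequent gaps first.
        ranked = sorted(oov_phrase_counts,
                        key=lambda p: oov_phrase_counts[p] / cost(p),
                        reverse=True)
        chosen, spent = [], 0.0
        for phrase in ranked:
            if spent + cost(phrase) <= budget:
                chosen.append(phrase)
                spent += cost(phrase)
        return chosen

The selected phrases are what you'd then show to turkers, highlighted inside a full sentence, and feed back into the translation system.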

A few papers that I didn't see, but other people told me good things about:

At any rate, I guess that's a reasonably long list. There were definitely good things, but with a fairly heavy tail. If you have anything you'd like to add, feel free to comment. (As an experiment, I've turned comment moderation on as a way to try to stop the spam... I'm not sure I'll do it indefinitely; I hadn't turned it on before because I always thought/hoped that Google would just start doing spam detection and/or putting hard captchas up or something to try to stop spam, but sadly they don't seem interested.)