26 October 2006

Saving Read Papers, Revisited

So, why am I interested in how you save read papers? Well, I don't want to ruin the surprise yet. First, let's take a look at the (still incoming) results. The most popular method (roughly 60% of the population surveyed) is to save them locally. People have also pointed to some tools for archiving, though my guess is that these are probably underutilized. I'm actually a bit surprised more people don't use delicious, though I don't use it myself, so perhaps I shouldn't be surprised. (Incidentally, I fall into the majority class.)

The reason I'm curious is that I spend a nontrivial amount of time browsing people's web pages to see what papers they put up. Some of this has to do with the fact that I only follow about a dozen conferences with regularity, which means that something that appears in, say, CIKM, often falls off my radar. Moreover, it seems to be increasingly popular to simply put papers up on web pages before they are published formally. Whether this is good or not is a whole separate debate, but it is happening more and more. And I strongly believe that it will continue to increase in popularity. So, I have a dozen or so researchers whose web pages I visit once a month or so to see if they have any new papers out that I care about. And, just like the dozen conferences I follow, there are lots that fall off my radar here.

But this is (almost) exactly the sort of research problem I like to solve: we have too much information and we need it fed to us. I've recently been making a fairly obvious extension to my Bayesian query-focused summarization system that enables one to also account for "prior knowledge" (i.e., I've read such and such news stories -- give me a summary that updates me). I've been thinking about whether to try such a thing out on research articles. The basic idea would be to feed it your directory containing the papers you've read, and then it would routinely go around and find new papers that you should find interesting. Such a thing could probably be hooked into something like delicious, though given the rather sparse showing here, it's unclear that would be worthwhile.
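As a rough illustration of the "feed it the papers you've read, get back new ones to look at" idea, here is a minimal sketch. It is not the Bayesian query-focused summarization model mentioned above; it swaps in plain TF-IDF with cosine similarity as a stand-in. The function names (`rank_candidates` and its helpers) are hypothetical, and the sketch assumes the text of each paper has already been extracted into a string.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def idf_weights(token_lists):
    """Smoothed inverse document frequency for every term in the collection."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector (term -> weight) for one document."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(read_texts, candidates):
    """Rank candidate papers by similarity to the papers already read.

    read_texts: list of strings, one per paper already read.
    candidates: dict mapping a paper name to its text.
    Returns a list of (name, score) pairs, most similar first.
    """
    read_tokens = [tokenize(t) for t in read_texts]
    cand_tokens = {name: tokenize(t) for name, t in candidates.items()}
    idf = idf_weights(read_tokens + list(cand_tokens.values()))

    # The reader's "profile" is the sum of the TF-IDF vectors of read papers.
    profile = Counter()
    for tokens in read_tokens:
        for term, weight in tfidf(tokens, idf).items():
            profile[term] += weight

    scored = [(name, cosine(tfidf(toks, idf), profile))
              for name, toks in cand_tokens.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A real version would also need to crawl the web pages, extract text from PDFs, and (per the "prior knowledge" point) down-weight papers that merely repeat what has already been read, none of which this sketch attempts.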

Of course, it's a nontrivial undertaking to get such a thing actually running beyond my controlled research environment (my desktop), so I wanted to get a sense of whether anyone might actually be interested. Ross's comment really got my attention because it would probably be technologically easier if everything could be done online (so one wouldn't have to worry about cross-platform issues, etc.).

Anyway, this is something I've been thinking about for a while and it seems like a lot of the tools exist out there already.

2 comments:

Kevin Duh said...

Yes, it would be great to have an automatic paper recommender and feed! I think the general area of NLP/IR tools for improved research productivity is a potentially very fruitful area. I'd like a program that recommends papers that are relevant (e.g. based on what papers I've already read), as well as papers that are important (e.g. what's listed in reading groups/lists). It would also be nice to have retrieval over all papers on the web that is not based on terms (the way Google Alerts is): often when I'm doing a literature survey and trying to find as many papers as possible that relate even remotely to what I'm doing, I find that different fields use different terms. How do I know what terms to search for? This is a classic problem in IR, but in the context of research papers there might be tailored solutions.

Anyway, if you invent something, Hal, I'd like to try it! :)

Anonymous said...

Yes, that sounds really interesting