12 April 2017

Humans can still extort more money from me than machines can

Like lots of folks, I wonder sometimes about AI and jobs. I don't believe a catastrophe is coming, but nor do I believe that everything will magically work out and that we aren't entering a world with new forms of inequity.

I did have an experience recently that made me think somewhat differently about what sorts of jobs are at risk.

I was visiting family in LA, and renting a car (because LA). I can't remember which company it was, but they didn't have "live people" at the booth. Instead they had a row of kiosks, each with a monitor, a camera, and an old-school telephone handset that---I kid you not---came complete with curly cord.

So how does this work? You go up to the device, and it auto-dials a real person. This person lives who knows where, though in my case he had a nice painted backdrop of some nature scene.

He asks about my reservation, looks it up, etc., and while looking it up, asks what I'm doing in LA. Oh, I'm visiting my mom. Blah blah blah.

I then have to hold my driver's license up to the camera, and we chat some more about how LA traffic is horrible. Blah blah blah.

Throughout this whole conversation, I'm thinking: this would be so much easier if this thing were automated. I don't mean automated like "chatbot", I mean automated like the kiosks at airports, where I just push a bunch of buttons and then get a ticket (or in this case, car).

And then, as you're all expecting, he asks me if I want to buy insurance.

No machine would ever under any circumstances be able to sell me insurance. It would show a screen, ask me to click if I wanted it or not, and I would immediately press the "No" button.

Now, I'm kind of a pushover, and since this guy was nice, had asked about my mom, had commiserated with me about traffic, and so on, it was much harder to say no when he asked if I wanted insurance. I did say no, though I'm pretty sure that me-five-years-ago might not have.

Whatever company this is, I'm sure they did a study. They looked at how much they'd save by having automated systems rather than paying folks in, presumably, a part of the country/world with a much lower cost of living than LA. The main trade-off was almost certainly that a machine isn't going to be nearly as successful at upselling insurance or a nicer car model or whatever as a person is. And clearly, at the end of the day, they decided that automation wasn't worth it here.

(Insert all sorts of analogies to other jobs/areas of life.)

EDIT 11am Eastern 12 Apr 2017: It occurred to me after I hit "post" that it's worth highlighting a major implicit caveat: the fact that the alleged study showed it was worth putting money into human interaction is probably largely due to whatever the car rental company knew about, for instance, the economic status of their average customer. Where this needle falls---for instance, if few customers will choose to, or can afford to, be upsold---will have a significant impact on the results of such a study, and will therefore vary across industries.


03 April 2017

Structured prediction is *not* RL

It's really strange to look back now over the past ten to fifteen years and see a very small pendulum that no one really cares about swing around. I've spent the last ten years trying to convince you that structured prediction is RL; now I'm going to tell you that was a lie :).

Short Personal History

Back in 2005, John Langford, Daniel Marcu and I had a workshop paper at NIPS on relating structured prediction to reinforcement learning. Basically the message of that paper was: you can cast structured prediction as RL, then use off-the-shelf RL techniques like conservative policy iteration to solve it, and this works pretty well. This view arose, for me, mainly from Collins and Roark's incremental structured perceptron model, but I'm sure it dates back much further than that (almost certainly there's work from neural nets land in the 80s; I'd certainly appreciate pointers!). This led eventually to Searn, and then in the early 2010s, I went around a bunch of places (ACL, AAAI, ICML, etc.) espousing connections between structured prediction and (inverse) reinforcement learning (slides).

This kinda sorta caught on in a very small subcommunity.

One thing I should have realized looking back ten years is that you should not try to sell a hammer to hammer manufacturers, but rather to people with weird nails. I spent too much time and energy trying to convince the "traditional structured prediction" crowd (the CRF and M3N and SVMstruct folks) that the "RL" view of structured prediction was awesome. That was a losing strategy, unfortunately. (Although I learned a few nights ago at dinner that this might be changing now!)

But now everything has changed. To some degree, Ross, Gordon and Bagnell's DAgger is a better, more successful successor to Searn, and for years I have gone around telling everyone that DAgger is my favorite algorithm ever (it consistently outperforms Searn in their---and my---experiments, is really really easy to implement, and has stronger-ish theory). And then DAgger (or more precisely DAD) got renamed/rebranded as "scheduled sampling" (though you should read Marc'Aurelio's comment, which is very on point), and now these ideas are everywhere, particularly in sequence-to-sequence neural transduction models.

Nowadays

In the past year or two, there's been a flurry of work applying not just imitation learning algorithms like DAgger to neural models for structured prediction problems, but also straight-up RL algorithms (like REINFORCE, policy gradient, or actor-critic). The important point is that while people have tried to do things like neural CRFs, the basic sequence-to-sequence style model naturally fits a search-based structured prediction (aka RL-ish) view.
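To make "straight-up RL" concrete, here's a minimal sketch of a REINFORCE-style loss for sequence prediction. It's PyTorch-flavored, and the decoder interface (init_state, step, EOS) and reward_fn are hypothetical stand-ins, not any particular toolkit's API; I've also omitted a baseline for simplicity:

```python
import torch
import torch.nn.functional as F

def reinforce_loss(model, x, reward_fn, max_len=50):
    # Sample an output sequence from the model's own distribution,
    # then weight the log-likelihood of the sampled tokens by the
    # terminal reward (e.g. BLEU, or 0/1 exact match).
    state = model.init_state(x)                    # encode the input x
    step_log_probs, tokens = [], []
    for _ in range(max_len):
        logits, state = model.step(state, tokens)      # scores over vocab
        log_p = F.log_softmax(logits, dim=-1)
        tok = torch.multinomial(log_p.exp(), 1).item() # sample an "action"
        step_log_probs.append(log_p[tok])
        tokens.append(tok)
        if tok == model.EOS:
            break
    r = reward_fn(tokens)  # reward only arrives at the end of the sequence
    # negate because optimizers minimize; the whole trajectory gets credit r
    return -r * torch.stack(step_log_probs).sum()
```

In practice you'd subtract a baseline from r to reduce variance (actor-critic, in effect), but the shape of the computation is the same: one sampled trajectory per example, scored only at the end.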

But these tasks are not the same and, in fact, structured prediction is much simpler, and I think we need to develop algorithms that take that into account.

The biggest difference is that in (all, or at least almost all) structured prediction problems, conditioned on the input x, the world is known, deterministic, and therefore reversible and/or fully explorable (modulo limits of computation). This is generally not true in RL: one of the biggest challenges there is that once you take an action, you cannot un-take it, and you cannot try out the alternatives.

That is to say: computation aside, in structured prediction, conditioned on x, you can build out the entire search tree and do whatever the heck you want with it. (Of course "computation aside" makes no sense in an SP setting, because the whole difficulty of SP is computation.) In fact, one of the big advantages of things like CRFs is effectively that they do build out the entire search tree, at least implicitly, which is possible precisely because of the limited expressivity of their features.
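A toy sketch of what "build out the entire search tree" means: because transitions are deterministic given x, we can enumerate every complete output and score it. Everything here (score_step, vocab) is a hypothetical stand-in, and the enumeration is exponential in the output length; the point is that it's possible at all, not that it's practical:

```python
def best_output(score_step, vocab, x, max_len):
    # Exhaustively build the search tree for input x. This relies on
    # exactly the property above: given x, the "world" is deterministic
    # and resettable, so we can try every action from every state,
    # un-take it, and try the next one. In true RL, this is impossible.
    def expand(prefix, score):
        if len(prefix) == max_len:
            yield (score, prefix)
            return
        for tok in vocab:
            # score_step(x, prefix, tok): hypothetical local scoring fn
            yield from expand(prefix + [tok],
                              score + score_step(x, prefix, tok))
    return max(expand([], 0.0))
```

A CRF's dynamic program is, in effect, a clever way of doing this same enumeration in polynomial time, which is the "build the entire search tree, implicitly" point above.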

This observation is probably perfectly obvious to most NLP folks. In a sense, the semantic parsing crowd has been doing something reinforcement-learning-like for quite some time: you produce a semantic parse, run it against a database, and check whether you get the correct answer. If so, positive reward; if not, negative. But no one (as far as I know) produces just one parse: you produce a beam of parses and try them all. This is definitely not something you can do in standard RL, but it is something you can do in structured prediction.
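A sketch of that semantic-parsing loop, to make the contrast explicit (parser.beam_decode, execute, and parser.update_toward are hypothetical interfaces, not any particular system's API):

```python
def train_step(parser, execute, x, gold_answer, beam_size=10):
    # Decode a *beam* of candidate parses for the same input...
    candidates = parser.beam_decode(x, beam_size)
    # ...and execute every one against the database. Replaying K
    # different "trajectories" from the same state is exactly what a
    # standard RL agent cannot do.
    good = [p for p in candidates if execute(p) == gold_answer]
    if good:
        parser.update_toward(good)  # e.g. a perceptron- or MML-style step
```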

In my mind, this was (and continues to be) one of the major weaknesses of the Searn/DAgger approach to structured prediction, and it remains a problem in applying standard RL algorithms to structured prediction. In a real sense, I think this is something that the incremental perceptron, broken LaSO and not-broken LaSO-BST, and seq2seqLaSO got right that the more RL-ish approaches got wrong. (This continues to be true in bandit structured prediction, which will get a separate post in the maybe-near future.) One approach that blends the two to some degree is Vieira and Eisner's recent paper that uses dynamic programming and change propagation within LOLS (which is effectively a variant of AggreVaTe, a follow-on to DAgger) to learn to prune (though it's not obvious to me yet how to generalize this to other tasks).

Why the Gap?

I don't think this gap is an accident, and I think there are essentially two reasons it exists.

First, as suggested above, is the question of computation. If you're willing to say "I don't care about computation," then you might as well put yourself in CRF world, where life is (relatively) easy, at least if you want to do analysis. If you do care about computation, then doing the purely greedy thing is very natural: you can say "well, I know I'm computationally efficient because I'm greedy, and now I can focus entirely on the statistical problem." Once you're willing to spend a bit more computation, you have to figure out how to "charge" yourself properly for it. That is, you enter a world where there's a trade-off between statistics and computation (though not in the usual sense), and it's not at all clear how to balance it. It's also hard to convince myself that it's better to spend ten times longer on a single structured example than to do ten different examples. This is a question I've been interested in since my dissertation (p44) but have made basically zero progress on:

[quoted passage from the dissertation omitted]

(Note that I don't currently agree with the entirety of this passage---in particular, the complexity argument is somewhat broken---but I think the basic idea is right.)

Second, I think there's a bit of a looking-under-the-lamppost effect that's hard to avoid. Here, the lamppost is mainly a computational-efficiency lamppost, and secondarily a convenience lamppost. Greedy search is really fast. Even compared to beam search with a beam size of K, greedy is often much more than K times faster, because you don't pay the bookkeeping overhead. And it's way easier to implement greedy solutions than non-greedy ones, especially in neural land if you want things to be efficient on a GPU. And often toolkits have greedy already implemented for you. None of this is insurmountable, but it runs into the problem of: if I can do 50 sentences greedily on my GPU in the time it would take to do beam-10 search on one of them, is the extra computation really going to come out in my favor?
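For concreteness, here's roughly where that overhead comes from (a pure-Python sketch; step_scores is a hypothetical function mapping a prefix to a dict of next-token scores):

```python
import heapq

def greedy_decode(step_scores, start, max_len):
    # One path, one expansion per position: no bookkeeping at all.
    seq = [start]
    for _ in range(max_len):
        scores = step_scores(seq)
        seq.append(max(scores, key=scores.get))
    return seq

def beam_decode(step_scores, start, max_len, K=10):
    # K paths, plus sorting and sequence-copying at every position;
    # this extra bookkeeping is why greedy is often much more than
    # K times faster in practice.
    beam = [(0.0, [start])]
    for _ in range(max_len):
        expanded = [(s + sc, seq + [tok])
                    for s, seq in beam
                    for tok, sc in step_scores(seq).items()]
        beam = heapq.nlargest(K, expanded, key=lambda e: e[0])
    return max(beam, key=lambda e: e[0])[1]
```

Greedy does one expansion per position; beam-K does K expansions plus the top-K selection and list copies, which is where the "much more than K times" comes from.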

As always, I'd appreciate pointers to work that I don't know about that addresses any of these challenges!