March 13, 2009

Trying to Exploit a Small Mistake Can Be a Mistake

In a recent post comparing exploitative play and equilibrium play, I commented that being slightly wrong about the strategy of an opponent you're trying to exploit can cost you profit and open you up to counter-exploitation, particularly when your best response function is discontinuous. I thought I'd expand on that a little bit with an example.


Consider again the Shortstack Game (yes, Pinky, this is more shameless intrablog pimpage). Recall that in the equilibrium we found, player 2 reshoves 1/3 of the time over player 1's open. Say you're player 1 and you think that player 2 is reshoving only 25% of the time (just for fun, we'll make it the top 25% as defined by PokerStove), whereas in actuality he is playing equilibrium and shoving 1/3 of the time. You exploit this by best responding to the strategy you think player 2 is employing, i.e., reshoving 25% of the time. In Part 3 of the Shortstack Game we found that if player 2 is folding more than 2/3 of the time, player 1 should raise every time. A player who reshoves 25% of the time folds 75% of the time, which is more than 2/3, so your best response is to open every hand. This is the discontinuity of player 1's best response function. If player 2 folds exactly 2/3 of the time, player 1 opens 56% of the time. Anything more and he opens 100% of the time.

This miscalculation costs player 1 some expected payoff. In equilibrium (i.e., when he is best responding to player 2's equilibrium strategy), his expected payoff is .42. Under the mistaken read, he opens 100% of the time and calls shoves only when he has the required 40.9% equity against top 25% hands, which is only 26% of all hands. Player 1's expected payoff from this strategy is then (2/3)(1.5) + (1/3)(.74*(-3) + .26*(.509*21.5-.491*20)) = .36. That's less than the equilibrium payoff, but not too bad.
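If you want to check that arithmetic yourself, here's a quick Python sketch. The function and parameter names are mine, and the labels on the stakes (1.5 when player 2 folds, -3 when player 1 folds to a shove, +21.5/-20 when he calls and wins/loses) are just my reading of the terms in the formula above, so treat them as assumptions rather than anything official from the Shortstack Game posts.

```python
def open_every_hand_ev(shove_freq, call_freq, win_equity,
                       steal_gain=1.5, fold_loss=3.0,
                       win_gain=21.5, lose_loss=20.0):
    """Expected payoff for player 1 when he opens 100% of hands.

    All names and default stakes are assumptions read off the formula
    in the post; they are not taken from any library or solver.
    """
    # Player 2 folds to the open: player 1 picks up the pot.
    fold_branch = (1 - shove_freq) * steal_gain
    # Player 1 calls the shove: win or lose the all-in.
    call_ev = win_equity * win_gain - (1 - win_equity) * lose_loss
    # Player 2 shoves: player 1 either folds (loses his open) or calls.
    shove_branch = shove_freq * ((1 - call_freq) * -fold_loss
                                 + call_freq * call_ev)
    return fold_branch + shove_branch


# Player 2 actually shoves 1/3; player 1 calls with 26% of hands
# and wins 50.9% of the time when he does.
ev = open_every_hand_ev(shove_freq=1/3, call_freq=0.26, win_equity=0.509)
print(round(ev, 2))  # 0.36
```

Changing `shove_freq` or `call_freq` lets you see how sensitive the result is to the read, though the calling range and equity would really need to be recomputed for each new assumption.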

What about the opposite thought experiment - what if an equilibrium player ran into a non-equilibrium player and didn't know it? What if player 1 thought player 2 were shoving 1/3 of the time as he should in equilibrium, but in actuality he's only shoving 25% of the time? Well, we know that player 1 makes .42 in equilibrium, and we know that this is a zero-sum game. That is, any loss for player 2 is a gain for player 1. Finally, player 2 shoving 25% of the time instead of 1/3 of the time as he should results in a lower expected payoff for him (otherwise shoving 1/3 of the time couldn't be an equilibrium strategy). So player 1 makes even MORE than .42 against this player (it turns out to be around .5). If player 1 were playing exploitatively vs. player 2, he would make .65 on average. Definitely a big improvement over the equilibrium strategy.

But what if player 2 notices that you are raising every time now, instead of 56% as equilibrium would dictate, and starts reshoving 50% of the time, instead of 25%? It would likely take you a while to realize this (longer than it would take your opponent to realize you are raising 100% instead of 56%). And in the meantime, your expected payoff would plummet all the way to .08. Dagger.
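To see the tradeoff in one place, here's a small Python sketch that just lines up the payoffs quoted in this post. The strategy labels and variable names are my own shorthand, and the .5 figure is the "around .5" estimate, so the differences are approximate.

```python
# Player 1's expected payoffs quoted in this post, keyed by
# (player 1 strategy, player 2 strategy). Labels are my own shorthand.
payoffs = {
    ("equilibrium", "equilibrium"): 0.42,
    ("equilibrium", "shoves 25%"): 0.50,   # "around .5" in the post
    ("open 100%", "equilibrium"): 0.36,
    ("open 100%", "shoves 25%"): 0.65,
    ("open 100%", "shoves 50%"): 0.08,
}

# What exploiting adds when the read (25% shoves) is correct.
upside = payoffs[("open 100%", "shoves 25%")] - payoffs[("equilibrium", "shoves 25%")]

# What exploiting costs once player 2 counter-adjusts to 50% shoves,
# measured against the .42 that equilibrium play secures on average.
downside = payoffs[("equilibrium", "equilibrium")] - payoffs[("open 100%", "shoves 50%")]

print(round(upside, 2), round(downside, 2))
```

On these numbers, the extra you pick up when the read is right is less than half of what you give back once player 2 adjusts.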

Obviously, this is a somewhat contrived example, and there are situations where an opponent is clearly making a mistake that it would be foolish not to exploit. But there are many situations in poker where best responses are discontinuous (e.g., a small change in an opponent's bluffing frequency changes the best response from mixing between calling and folding to always calling or always folding). In those situations, when you don't have a large enough sample to have a good idea of an opponent's strategy, going with an equilibrium strategy is usually best.

-BRUECHIPS
