
No, CFR is mainly just a way of computing Nash equilibria. Although it is in some sense an online, iterative algorithm, it would typically be used to precompute Nash strategies, not to update them in real time. Real poker-playing systems augment the CFR strategies with some real-time solving, but only to get even closer to Nash at the end of a hand.

On top of this, you could think about augmenting these systems to exploit weaknesses in opponent strategies. There is some work on this, but I don't think it's done much. The famous systems that played against professionals don't use it; they just try to get as close to GTO as possible and wait for opponents to screw up.



Hmm, I see, thanks for the reply. My mistake - I watched an interview (that I can't find now, ugh) with a poker player who played against one of the top CFRM bots and claimed that it felt like it was adapting to his playstyle.

But it sounds like that must have been either a misunderstanding or some other part of the bot's algorithm, I guess.


...in case you would be willing to share some knowledge - what exactly is a GTO play in poker? Does it mean a Nash equilibrium strategy? Something else entirely?

Whenever I search this stuff I get practical poker strategy guides, but none of them seem to define the term haha


Two-player poker is a zero-sum game, where GTO play is very well-defined as just playing a Nash equilibrium strategy. The solvers try to get as close as they can to that.
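To make "solvers try to get as close as they can" concrete, here is a minimal sketch of regret matching, the update rule that CFR applies at each decision point. It is not a poker solver: it uses rock-paper-scissors as a stand-in two-player zero-sum game, and the names (`PAYOFF`, `regret_matching`, `solve_rps`) and the non-uniform starting regrets are my own choices for illustration.

```python
# Payoff to player 1 for (player 1 action, player 2 action).
# Action order: rock, paper, scissors. Player 2 receives the negative.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def solve_rps(iterations=100_000):
    """Self-play with regret matching; returns both players' average strategies."""
    n = 3
    # Assumption: arbitrary non-uniform starting regrets, so the run does not
    # begin exactly at the equilibrium.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        s1 = regret_matching(regrets[0])
        s2 = regret_matching(regrets[1])
        # Expected payoff of each pure action against the opponent's current mix.
        u1 = [sum(PAYOFF[a][b] * s2[b] for b in range(n)) for a in range(n)]
        u2 = [sum(-PAYOFF[a][b] * s1[a] for a in range(n)) for b in range(n)]
        ev1 = sum(s1[a] * u1[a] for a in range(n))
        ev2 = sum(s2[b] * u2[b] for b in range(n))
        for a in range(n):
            # Regret: how much better the action would have done than the mix played.
            regrets[0][a] += u1[a] - ev1
            regrets[1][a] += u2[a] - ev2
            strategy_sums[0][a] += s1[a]
            strategy_sums[1][a] += s2[a]
    return [[x / iterations for x in sums] for sums in strategy_sums]


if __name__ == "__main__":
    avg1, avg2 = solve_rps()
    print("player 1 average strategy:", [round(p, 3) for p in avg1])
    print("player 2 average strategy:", [round(p, 3) for p in avg2])
```

The per-iteration strategies keep cycling; it is the average strategy that approaches the equilibrium (roughly one third on each action here). That is also why this kind of algorithm is usually run ahead of time rather than during play.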

Life is a lot more complicated in multiplayer poker. There are Nash equilibria, but potentially many with different payoffs, and you can't force your opponents to choose the one you're aiming for. So in that case, it's not so obvious what "optimal" means.

As for CFR adapting to opponent play: CFR could bias its compute resources towards really finely optimizing strategies for the most likely scenarios facing certain players, and it seems like this has been done during poker tournaments.

But within those situations, it would still be trying to approximate the Nash strategy more precisely, as opposed to more experimental approaches that actually choose a different strategy to exploit opponent weaknesses.
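A toy illustration of that difference, again with rock-paper-scissors standing in for poker. The opponent's 50% rock frequency is a made-up number, and the helper names are hypothetical; this is only a sketch of the trade-off, not how any real bot models opponents.

```python
# Payoff to us for (our action, opponent action).
# Action order: rock, paper, scissors.
ACTIONS = ["rock", "paper", "scissors"]
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

NASH = [1 / 3, 1 / 3, 1 / 3]
# Assumption: a hypothetical opponent who throws rock too often.
OPPONENT = [0.5, 0.25, 0.25]


def action_values(opponent):
    """Expected payoff of each of our pure actions against a fixed opponent mix."""
    return [sum(PAYOFF[a][b] * opponent[b] for b in range(3)) for a in range(3)]


def expected_value(mine, opponent):
    """Expected payoff of our mixed strategy against the opponent's mix."""
    return sum(p * v for p, v in zip(mine, action_values(opponent)))


def best_response(opponent):
    """Index of the pure action that earns the most against this opponent."""
    values = action_values(opponent)
    return max(range(3), key=lambda a: values[a])


def worst_case(mine):
    """Our payoff if the opponent picks the action that hurts our strategy most."""
    return min(sum(mine[a] * PAYOFF[a][b] for a in range(3)) for b in range(3))


if __name__ == "__main__":
    br = best_response(OPPONENT)
    exploit = [1.0 if a == br else 0.0 for a in range(3)]
    print("exploitative choice:", ACTIONS[br])
    print("EV of Nash vs opponent:", expected_value(NASH, OPPONENT))
    print("EV of exploit vs opponent:", expected_value(exploit, OPPONENT))
    print("worst case for Nash:", worst_case(NASH))
    print("worst case for exploit:", worst_case(exploit))
```

The Nash mix breaks even against this opponent and cannot be beaten by anyone. Always playing paper wins 0.25 per round against the rock-heavy opponent, but loses every round to someone who notices and switches to scissors. That risk is the reason the "wait for opponents to screw up" approach is attractive.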


GTO poker = whatever solvers say.

A common expression is "deviate from GTO", where you know what the solver would do but decide to play differently.


I see, thanks, so I guess it depends on what the solver is actually doing.



