AI smokes 5 poker champions at a time in no-limit Hold'em with 'relentless consistency'
Machines have already shown their superiority in one-on-one games like poker and chess. But in the complicated multiplayer version of the card game, humans have maintained their edge... until now. A new evolution of the AI agent that flummoxed poker pros one-on-one is now winning championship-style six-player games.
In a paper published today in Science, the CMU/Facebook team behind the agent, which they call Pluribus, shows that it beat five professional poker players at once in a single game, and also won when a lone pro played against five copies of the bot. It is a huge leap forward in machine capability and is, astonishingly, far more computationally efficient than previous agents.
One-on-one poker is a strange game, and not an easy one, but its zero-sum nature (whatever you lose, your opponent gains) makes it open to strategies a computer can use to its advantage. Adding four more players makes things far more complicated.
With six players, the possible outcomes, hands and bets multiply enormously. It's impossible to cover them all, especially when you only have a few minutes to decide. It would be like trying to account for every grain of sand on a beach between waves.
Yet Pluribus managed to come out ahead over more than 10,000 hands against champion players, keeping up a steady income while exposing no weaknesses or habits its opponents could exploit. Its secret? Consistent randomness.
Even computers make mistakes
Pluribus was not trained by studying how humans play but by playing against copies of itself. It's a bit like watching kids (or me) learn poker for the first time: the AI and the children both make mistakes, and both learn from them.
The method used to train it, Monte Carlo counterfactual regret minimization, sounds like something you do over whiskey after losing your shirt at a casino, and in a way it is, machine-learning style.
Regret minimization means just what it sounds like: when the system finishes a hand, win or lose, it plays that hand over again, exploring what might have happened had it raised instead of called, folded instead of raised, and so on. It's counterfactual because those alternative plays never really happened.
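To make the idea concrete, here is a minimal, illustrative sketch of regret matching in Python. It is not the Pluribus code; the action names and payoff numbers are invented, and the real system tracks regrets across an enormous number of decision points rather than just one.

```python
class RegretMatcher:
    """Tracks cumulative regret for each action at one decision point and
    derives a mixed strategy proportional to positive regret."""

    def __init__(self, actions):
        self.actions = actions
        self.cumulative_regret = {a: 0.0 for a in actions}

    def strategy(self):
        # Play each action in proportion to how much we "regret" not having
        # taken it in the past; if nothing is regretted, play uniformly.
        positive = {a: max(r, 0.0) for a, r in self.cumulative_regret.items()}
        total = sum(positive.values())
        if total > 0:
            return {a: positive[a] / total for a in self.actions}
        return {a: 1.0 / len(self.actions) for a in self.actions}

    def update(self, action_values, strategy_value):
        # Counterfactual regret of an action = how much better it would have
        # done than what the current strategy actually earned.
        for a in self.actions:
            self.cumulative_regret[a] += action_values[a] - strategy_value


# Toy example: we called and broke even, but raising would have won 1.0
# and folding would have lost 0.5 (made-up numbers).
rm = RegretMatcher(["fold", "call", "raise"])
rm.update({"fold": -0.5, "call": 0.0, "raise": 1.0}, strategy_value=0.0)
print(rm.strategy())  # all weight shifts to "raise", the action we regret skipping
```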
Monte Carlo trees are a way to organize and evaluate many possibilities, a bit like climbing a tree branch by branch and noting the quality of each leaf, then picking the best one once you've climbed every branch.
If you can plan ahead (as in chess), that lets you find the best move. Combined with the regret function, it lets the system survey the possible outcomes of a hand and judge which would have been best.
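As a rough illustration of the Monte Carlo part, the sketch below samples random playouts through a toy game tree and averages the leaf payoffs to rank the branches. The tree, action names and payoffs are invented for the example.

```python
import random

# Hypothetical toy game tree: each branch leads to more branches or to a
# numeric payoff at a leaf (payoffs are made up).
TREE = {
    "raise": {"opponent folds": 1.0, "opponent calls": {"win": 2.0, "lose": -2.0}},
    "call": {"win": 1.0, "lose": -1.0},
    "fold": -0.5,
}

def sample_leaf(node):
    """Follow one random path down to a leaf instead of enumerating
    every outcome; this sampling is the 'Monte Carlo' part."""
    if isinstance(node, (int, float)):
        return node
    return sample_leaf(random.choice(list(node.values())))

def estimate_branch_values(tree, samples=10_000):
    """Average many sampled playouts per branch to judge its quality."""
    return {
        action: sum(sample_leaf(child) for _ in range(samples)) / samples
        for action, child in tree.items()
    }

values = estimate_branch_values(TREE)
print(values, "-> best branch:", max(values, key=values.get))
```

A real solver would weight the playouts by each player's strategy rather than sampling uniformly, but the shape of the computation is the same: sample, average, compare.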
Monte Carlo counterfactual regret minimization, then, is simply a systematic way of investigating what might have happened had the machine acted differently, and adjusting its model of how it should play accordingly.
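Putting the two pieces together, a single, heavily simplified training iteration might look like the sketch below: sample what each alternative action would have earned, compare that to what the current strategy expected, and fold the difference back into the regrets. Again, the actions and payoffs are invented; the real trainer traverses a full sampled game tree.

```python
import random

ACTIONS = ["fold", "call", "raise"]
regret = {a: 0.0 for a in ACTIONS}  # cumulative regret at one decision point

def current_strategy():
    """Mix actions in proportion to positive cumulative regret."""
    positive = {a: max(r, 0.0) for a, r in regret.items()}
    total = sum(positive.values())
    return {a: (positive[a] / total if total else 1.0 / len(ACTIONS)) for a in ACTIONS}

def sampled_payoff(action):
    """Stand-in for playing the rest of the hand out after `action`;
    a real trainer would traverse the sampled game tree here."""
    mean = {"fold": -0.5, "call": 0.0, "raise": 0.5}[action]
    return mean + random.gauss(0, 1)  # noisy outcome of the rest of the hand

for _ in range(100_000):
    strategy = current_strategy()
    # Counterfactual values: what each alternative action would have earned.
    values = {a: sampled_payoff(a) for a in ACTIONS}
    expected = sum(strategy[a] * values[a] for a in ACTIONS)
    # Regret = how much better each action did than the strategy's expectation.
    for a in ACTIONS:
        regret[a] += values[a] - expected

print(current_strategy())  # drifts toward mostly "raise" in this toy setup
```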