|AlphaZero - Stockfish (2017)|
On December 4th, 2017, Google Headquarters in London applied their DeepMind AI project to the game of chess. The event was more of an experiment than a chess exhibition, and the results are groundbreaking in both the fields of computing and chess.
Rather than relying on the classic "alpha-beta algorithm" common to conventional chess software, AlphaZero uses a deep neural network and is trained solely by reinforcement learning from games of self-play. It searches only about 80,000 positions per second in chess, compared to Stockfish's 70 million. AlphaZero played Stockfish 100 games, winning 28 and drawing the rest.(1) A subset of the match, 10 games that AlphaZero won, was released to the public.
(1) Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm - https://arxiv.org/pdf/1712.01815.pdf
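As a rough guide to what the match score means, the result can be converted into a rating gap. The sketch below assumes the standard logistic Elo model; the function name `elo_difference` is illustrative, and the only inputs are the 28 wins and 72 draws reported above.

```python
import math


def elo_difference(score_fraction: float) -> float:
    """Elo gap implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))


# Match result as reported: 28 wins, 72 draws, 0 losses for AlphaZero.
wins, draws, losses = 28, 72, 0
games = wins + draws + losses
score = (wins + 0.5 * draws) / games  # a draw counts as half a point

gap = elo_difference(score)
print(f"score = {score:.2f}, implied Elo gap = {gap:.0f}")
```

A 64% score corresponds to a gap of roughly 100 Elo points under this model. With only 100 games the uncertainty around that figure is considerable.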
|Dec-11-17|| ||The Kings Domain: Fascinating. Love to see a human whip these machines. :-)|
|Dec-11-17|| ||WorstPlayerEver: <alexmagnus>
Yes, sorry but I got a little annoyed.
Because it went -as explained by A0 team- through billions of moves until it had their opening book established.
Therefore it would be useful to see how it approaches other lines than the QI with White.
Because it's easy to pick an opening which was developed recently and let a powerful computer analyse for a few hours.
|Dec-11-17|| ||dunkenchess: That's better.|
|Dec-11-17|| ||AylerKupp: <<not not> All that has been said already, notably by SF's author>|
Please post a link to the source of what SF's author said. I would like to see what else he might have said that I missed.
|Dec-11-17|| ||morfishine: I don't use opening books which is why my score vs 2300+ players (at either classical or Chess960) remains stuck at Alpha Zero|
|Dec-11-17|| ||keypusher: <Ayler Kupp>
<The match results by themselves are not particularly meaningful because of the rather strange choice of time controls and Stockfish parameter settings: The games were played at a fixed time of 1 minute/move, which means that Stockfish has no use of its time management heuristics (lot of effort has been put into making Stockfish identify critical points in the game and decide when to spend some extra time on a move; at a fixed time per move, the strength will suffer significantly). The version of Stockfish used is one year old, was playing with far more search threads than has ever received any significant amount of testing, and had way too small hash tables for the number of threads. I believe the percentage of draws would have been much higher in a match with more normal conditions.
...[~AlphaZero could also be improved]...
But in any case, Stockfish vs AlphaZero is very much a comparison of apples to orangutans. One is a conventional chess program running on ordinary computers, the other uses fundamentally different techniques and is running on custom designed hardware that is not available for purchase (and would be way out of the budget of ordinary users if it were).
For chess players using computer chess programs as a tool, this breakthrough is unlikely to have a great impact, at least in the short term, because of the lack of suitable hardware for affordable prices.
For chess engine programmers -- and for programmers in many other interesting domains -- the emergence of machine learning techniques that require massive hardware resources in order to be effective is a little disheartening. ...>
Courtesy of zanzibar.
|Dec-11-17|| ||markz: <s4life: A aws p3.16xlarge instance which rents for ~ 10 bucks an hour has 8 GPUs each of which is almost as strong as a last generation TPU2 and 4 times stronger than a first generation TPU1.>|
The standard price of a p3.16xlarge instance is $25/hour, including 8 V100s. Each V100 costs almost $10000, speed is 30 TFLOPS (FP16). The TPU2 has 4 TPU chips, each chip is 45 TFLOPS, total speed is 180 TFLOPS. One p3.16xlarge is 240 TFLOPS, only slightly faster than one TPU2. Therefore, each TPU2 should be around $15/hour.
< the 5000 TPU1s were used to generate self-played games. The actual training takes 8 aws p3.16xlarge instances running for 4 hours = 8 * 10 * 4 = $320.. fine it's $300-$400 bucks to train the model.>
The 5000 TPU1s generate the training data (games), which is a part of the training.
<What's the point of arguing how much hardware a model uses for training anyways???? GPUs will become commodity in no time if they are not already. TPUs are just a market gimmick, there's nothing fundamentally different in a TPU vs a GPU>
A $5000 GPU is totally different from a normal $50 ~ $200 GPU. It is cheating that a supercomputer A0 with 4 TPUs plays against a PC with 1GB hash (stockfish).
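The arithmetic in the exchange above can be checked with a short sketch. Every figure is taken as quoted by the two posters and is not a verified specification. The linear price-per-TFLOPS scaling and the helper name `training_cost` are assumptions made only for illustration.

```python
# Figures as quoted in the posts above (not verified hardware specs).
V100_TFLOPS_FP16 = 30.0          # per GPU
GPUS_PER_INSTANCE = 8            # one p3.16xlarge
TPU2_CHIP_TFLOPS = 45.0          # per chip
CHIPS_PER_TPU2 = 4
INSTANCE_PRICE_PER_HOUR = 25.0   # USD, markz's figure

instance_tflops = V100_TFLOPS_FP16 * GPUS_PER_INSTANCE
tpu2_tflops = TPU2_CHIP_TFLOPS * CHIPS_PER_TPU2

# Assumption: hourly price scales linearly with peak throughput.
tpu2_price_per_hour = INSTANCE_PRICE_PER_HOUR * tpu2_tflops / instance_tflops


def training_cost(instances: int, hours: float, price_per_hour: float) -> float:
    """Total rental cost in USD for a fixed number of instances and hours."""
    return instances * hours * price_per_hour


# 8 instances for 4 hours, at s4life's $10/hour and at markz's $25/hour.
cost_at_10 = training_cost(8, 4, 10.0)
cost_at_25 = training_cost(8, 4, 25.0)

print(f"p3.16xlarge: {instance_tflops:.0f} TFLOPS, TPU2: {tpu2_tflops:.0f} TFLOPS")
print(f"TPU2 price under linear scaling: ${tpu2_price_per_hour:.2f}/hour")
print(f"training cost: ${cost_at_10:.0f} at $10/hour, ${cost_at_25:.0f} at $25/hour")
```

Under the linear-scaling assumption the TPU2 estimate comes out nearer $19/hour than $15/hour. Neither figure covers the cost of generating the self-play games, which is the point of disagreement between the two posters.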
|Dec-12-17|| ||frogbert: <Tord Romstad, one of the creators of Stockfish, knows exactly what he is talking about. It is clear that the match was not fair for Stockfish.>|
Tord (who used to play for my chess club back in the days) makes more or less the same points as I did here on chessgames.com right after the publication of the AlphaZero paper. The authors' claims of being at Stockfish level after 4 hours are simply marketing nonsense.
Still, as a programmer with master level courses in AI, I also agree with <Monocle>: technologically AlphaZero is extremely interesting. Neither I nor most other knowledgeable ppl I know with combined insights in chess and programming thought that it would be possible to use more or less the same approach as was used to create AlphaGo to produce a really strong chess playing program.
The creators of AlphaZero don't care about chess per se. Their field of interest is AI. Still, I'm slightly put off by the dishonest marketing. Maybe I should blame that on Google.
|Dec-12-17|| ||alexmagnus: <frogbert> Even if the four hour claim is exaggerated - unlike Stockfish, AlphaZero can improve without doing anything about its own cod, just by keeping training its neural network. It reached that "crippled" SF level after 4 hours, and I see no reason not to assume it would beat the best possible configuration of SF after, say, four days. And we don't even know if AlphaZero's neural network itself is optimal (most probably it isn't, after all humans made it). In ten years the fairness of this March will be just as irrelevant as the fairness of the Deep Blue vs Kasparov match was in 2007 - neural networks will be far beyond conventional engines without a doubt.|
|Dec-12-17|| ||alexmagnus: Of this match, not of this March)|
|Dec-12-17|| ||frogbert: I think you meant 1997, too. ;)|
|Dec-12-17|| ||frogbert: Btw, if I remember correctly, the version that played the 100 games 1 min per move match, had trained for 3 days, not 4 hours.|
The 4 hour equal result was achieved using 1 sec per move - and knowing what you do about depth first search with alpha-beta pruning in the chess domain, you should acknowledge that Stockfish isn't 3000+ strength at all at that «time control», as stated in the paper. And certainly not without opening book and endgame tables. Even I occasionally beat Stockfish at 1 second per move (of course taking significantly longer for *my* moves).
|Dec-12-17|| ||alexmagnus: I meant 2007, "ten years from..."|
|Dec-12-17|| ||frogbert: Aha - read too fast there. Sorry! :)|
|Dec-12-17|| ||frogbert: <As for opening book, as I already said, they don't use opening books in TCEC either.>|
But they do use a set of pre-selected, forced openings, don't they?
|Dec-12-17|| ||alexmagnus: Yes, but they are not allowed to use own books for those preselected openings.|
|Dec-12-17|| ||AylerKupp: <keypusher> Thanks for the link to the article. I'm glad to see that Tord Romstad made many of the same comments I did, so apparently I'm not the only one to raise questions. I had also wondered (but neglected to mention) about the use of 64 <threads> mentioned in the original article since one typically indicates the number of <cores> that the chess engine is executing on, not the number of threads. 64 threads could involve 32 cores with hyperthreading (which most chess engine developers indicate is counterproductive) or – at the extreme – a single core machine running 64 threads (which would run slower than a single thread engine running on a single core). Too many questions.|
One possible way to compare the effect of having or not having use of an opening book and tablebase support is to run a 100-game match at 1 minute/move between two configurations of the same version of Stockfish 8, one with opening book and tablebases enabled and one without either enabled. If (as I suspect) the scores of the two instances are similar, then that would demonstrate AlphaZero's superiority a lot better. But, if the configuration with opening book and tablebase support enabled scores significantly higher, that would indicate that the configuration used in the AlphaZero match was significantly weaker than the "normal" Stockfish and raise some doubts.
And, of course, the issue of hardware must be addressed. Until a match can be arranged using the same hardware (I'm not sure if AlphaZero could currently run on a typical PC without the custom hardware), the issue of one engine's superiority over another cannot be determined.
One thing I don't agree with is what several have claimed, that the availability of special high performance hardware to conduct the training of AlphaZero's neural network gives it an unfair advantage. To me that would be the same as counting the hours spent by the top engine's developers to tweak its evaluation function or the time used to generate the endgame tablebases. I think that it's only the hardware configuration that is involved in playing the match that's important.
|Dec-12-17|| ||njchess: The achievement is remarkable, though it is more about the progress of machine learning than it is about chess.|
From my understanding, the AlphaZero team started their testing with the game Go because it is a complex, symmetrical environment. By symmetrical, I mean the same rules of play apply to all pieces. They gave their program time to learn the game through self play, and then tested it against an opponent to determine their level of success.
Their results were so positive that they decided to test their program in a complex asymmetrical environment, namely chess. They repeated the process with similar positive results.
You could question the test conditions, the processing power of the various platforms, but regardless, the results are impressive in terms of machine learning. The fact that a program that was entirely self taught in FOUR hours could compete, let alone dominate, under any circumstances against anyone is amazing.
By comparison, in 1997 IBM spent nearly a year programming Deep Blue to play Kasparov. And, during the course of the match, they were continually updating Deep Blue in order to beat him.
From a programming/machine learning perspective, this represents a significant achievement. I think the implications for organizations looking to invest in this technology are potentially enormous.
|Dec-12-17|| ||AylerKupp: <<alexmagnus> unlike Stockfish, AlphaZero can improve without doing anything about its own cod[e], just by keeping training its neural network.>|
I'm not so sure about that. Did you read the original paper? Its Figure 1 shows the Elo rating of AlphaZero for Chess, Shogi, and Go as a function of the thousands of steps. AlphaZero's Elo rating after about 150K steps was approximately the same as after 700K steps, and the graphs for both Shogi and Go showed similar shapes, although not as pronounced. So maybe the playing strength achieved by AlphaZero using its approach has limits that much larger training times cannot overcome. Then again, this perceived playing strength limit may simply be a function of the current implementation; increasing AlphaZero's chess playing strength may simply be a case of additional nodes and layers plus increased training time. I just don't know.
|Dec-12-17|| ||john barleycorn: <AylerKupp: <<alexmagnus> unlike Stockfish, AlphaZero can improve without doing anything about its own cod[e], just by keeping training its neural network.>|
I'm not so sure about that. Did you read the original paper? ...>
I agree <AylerKupp>. In the case of Alphawhatever "Learning" is definitely restricted to a particular instance. I would like to see how the "learnings" from a game like Go were applied in chess much as humans in a sense would do. Right now there is no "learning" but a superior handling of a gigantic database, imo.
|Dec-12-17|| ||nok: <AlphaZero can improve without doing anything about its own cod, just by keeping training its neural network.> That's indeed misleading. After relative equilibrium has been reached the network would have to be reworked.|
|Dec-12-17|| ||alexmagnus: <. AlphaZero's Elo rating after about 150K steps was approximately the same as after 700K steps, and the graphs for both Shogi and Go showed similar shapes, although not as pronounced.>|
Improving is of course slowing down, just as for a human it is easier to go from 1000 to 1500 than from 2000 to 2500. I actually expected a log(x) like curve - which isn't quite fitting there but still is "somewhat" in the order. But note that log(x), while growing painfully slow, still grows infinitely.
|Dec-12-17|| ||s4life: <nok: <AlphaZero can improve without doing anything about its own cod, just by keeping training its neural network.> That's indeed misleading. After relative equilibrium has been reached the network would have to be reworked.>|
Well, yes and no. Typically playing with hyper-parameters is what consumes the most time when training a DNN. However, having more training data cannot really hurt performance.
|Dec-13-17|| ||ahmadov: Another proof of the fact that chess is not exhausted.|
|Dec-13-17|| ||dannygjk: SF still manages its time well at various time controls. By the way, time management is a non-factor at fixed time/move controls.
The book: Even when Stockfish followed theory in these published games, AZ outplayed SF after the opening.
EGTB: Stockfish had lost positions before the EGTB would be of any use.
Hash (transposition table) size: Try it yourself. Give SF a big hash and see how long it takes SF to see that AZ's sacs were sound.
Based on what I have seen in the published games my theory why AZ outplayed SF is that AZ has vastly superior move ordering. This is supported by the fact that SF was doing 70,000,000 nps while AZ was doing only 80,000 nps. Even if SF has a huge transposition hash table that won't be enough to compensate for much inferior move ordering. Inferior move ordering results in too much time wasted on pointless variations. SF will miss crucial variations because of that.|
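The move-ordering point above can be illustrated for alpha-beta search on a toy game tree. This is a minimal sketch, not Stockfish's or AlphaZero's actual search (AlphaZero does not use alpha-beta at all). The random tree, the depth and branching values, and the "oracle" ordering that sorts moves by their exact values are all assumptions chosen to show the effect in isolation.

```python
import random


def build_tree(depth, branching, rng):
    """Random game tree; a leaf is an integer score for the side to move."""
    if depth == 0:
        return rng.randint(-100, 100)
    return [build_tree(depth - 1, branching, rng) for _ in range(branching)]


def negamax(node):
    """Exact value of a node with no pruning."""
    if isinstance(node, int):
        return node
    return max(-negamax(child) for child in node)


def order_children(node, best_first):
    """Copy of the tree with every node's moves sorted by exact value.

    A child's value is from the opponent's point of view, so the child
    with the lowest value is the best move for the side to move.
    """
    if isinstance(node, int):
        return node
    children = [order_children(child, best_first) for child in node]
    children.sort(key=negamax, reverse=not best_first)
    return children


def alphabeta(node, alpha, beta, counter):
    """Negamax alpha-beta; counter[0] accumulates the nodes visited."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    best = -float("inf")
    for child in node:
        value = -alphabeta(child, -beta, -alpha, counter)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # cutoff: the opponent will avoid this line
    return best


def search(tree):
    """Return (value, nodes visited) for a full-window alpha-beta search."""
    counter = [0]
    value = alphabeta(tree, -float("inf"), float("inf"), counter)
    return value, counter[0]


rng = random.Random(0)  # fixed seed so the run is reproducible
tree = build_tree(depth=6, branching=4, rng=rng)
exact = negamax(tree)

results = {
    "best-first": search(order_children(tree, best_first=True)),
    "unordered": search(tree),
    "worst-first": search(order_children(tree, best_first=False)),
}
for label, (value, nodes) in results.items():
    print(f"{label:12s} value={value:4d} nodes visited={nodes}")
```

All three searches return the same value; only the amount of work differs. The oracle ordering is an idealization, since a real engine has to guess the ordering with heuristics, and a better guess lets the same node budget reach deeper.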