< Earlier Kibitzing · PAGE 7 OF 7 ·
|Apr-15-18|| ||morfishine: Cyber World, What is it, really?
|Apr-15-18|| ||ChessHigherCat: <morf> To be sung to the tune of https://www.youtube.com/watch?v=ztZ...? |
Anyway, you forgot the one.
One + Nothing = Alpha Zero
|Apr-15-18|| ||AylerKupp: <<SChesshevsky> So does AlphaZero just take whatever principles or concepts it learned, assess the position based on these, then calculate possibilities, then put a number on the results just like other computer programs do? Or does it have some totally different method for picking a continuation? How exactly does AlphaZero choose its moves?>|
The way that AlphaZero chooses its moves is summarized in the original paper by the Deep Mind team, https://arxiv.org/pdf/1712.01815.pdf, so it's hardly a secret, although (of course!) many of the details are left out. It goes as follows:
(1) For every position being evaluated AlphaZero calculates a set of possible candidate moves, each with a probability of the move being played and the expected result from that position following each candidate move based on a series of simulated games played by AlphaZero. The expected result is determined by the position evaluation parameters it discovered over its training phase. The results go into its search tree.
(2) AlphaZero then searches its search tree in the minimax sense to determine the sequence of moves that maximizes the desired result; i.e. the highest expected gain from playing that move. The move with the highest expected gain is the move played.
This approach differs from a "conventional" chess engine like Stockfish, Houdini, Komodo, etc. in two main ways:
(1) In a conventional engine the factors and weights that go into its evaluation function are determined by its developers based on their knowledge of chess (many engine developers have GMs on their staff) and calibrated based on the results of engine vs. engine competition; sometimes with other engines but most often among the same or different versions of the same engine. The evaluation function differs from engine to engine. In AlphaZero the evaluation is based on the results of playing several simulated games from that position and is based on the probability that the desired result (i.e. a win) is reached as a result of these games.
(2) A conventional chess engine uses alpha-beta pruning plus heuristics to discard the less promising branches of its search tree and so reduce the number of positions that need to be evaluated. The alpha-beta pruning algorithm is well known and has been in use for a long time, and the heuristics are once again based on the chess knowledge of the engine developers and differ from engine to engine.
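To make the pruning idea concrete, here is a minimal sketch of alpha-beta on a toy game tree. The tree and its leaf scores are invented for illustration, and no real engine is this simple; real engines add move ordering and many heuristics on top.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, stats=None):
    """Minimax value of a toy game tree, skipping branches that cannot matter.

    A node is either a number (a leaf evaluation) or a list of child nodes.
    """
    if stats is not None:
        stats["visited"] += 1
    if not isinstance(node, list):
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, stats))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the opponent would never allow this line: prune the rest
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, stats))
        beta = min(beta, best)
        if alpha >= beta:
            break  # we already have a better option elsewhere: prune the rest
    return best

# Hypothetical 2-ply tree: three candidate moves, each with three replies.
TREE = [[3, 5, 6], [2, 9, 8], [1, 7, 4]]
stats = {"visited": 0}
value = alphabeta(TREE, stats=stats)
print(value, stats["visited"])
```

The full tree has 13 nodes; the search stops looking at a candidate move as soon as one reply shows it to be worse than a move already examined, which is why fewer nodes are visited.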
In contrast, AlphaZero uses a Monte-Carlo tree search (MCTS) algorithm that calculates the probability distribution of the success of each candidate move; the move with the highest peak in its probability distribution is the move selected to be played.
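As a rough sketch of how such a search decides which candidate to look at next, the function below uses the PUCT-style rule described in the AlphaGo Zero and AlphaZero papers, as I understand it: average value so far plus an exploration bonus weighted by the network's move probability. The move names, priors, visit counts, values and the constant `c_puct` are made-up numbers for illustration, not anything taken from AlphaZero itself.

```python
import math

def puct_select(priors, visit_counts, total_values, c_puct=1.5):
    """Pick the child to explore next during one tree-search simulation."""
    total_visits = sum(visit_counts.values())

    def score(move):
        n = visit_counts[move]
        q = total_values[move] / n if n else 0.0  # average result so far
        u = c_puct * priors[move] * math.sqrt(total_visits) / (1 + n)  # exploration bonus
        return q + u

    return max(priors, key=score)

# Hypothetical statistics at the root after 12 simulations.
priors = {"e4": 0.5, "d4": 0.4, "a4": 0.1}        # stand-in for network move probabilities
visit_counts = {"e4": 10, "d4": 2, "a4": 0}
total_values = {"e4": 6.0, "d4": 1.0, "a4": 0.0}  # summed evaluations per move

next_to_explore = puct_select(priors, visit_counts, total_values)
most_visited = max(visit_counts, key=visit_counts.get)
print(next_to_explore, most_visited)
```

With these numbers the search would spend its next simulation on the less-visited 1.d4, while the move it would actually play, if chosen greedily by visit count, is still 1.e4.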
Here is a video by Deep Mind's Dr. David Silver, the leader of Deep Mind's reinforcement learning research group, of how AlphaGo, AlphaGoZero, and AlphaZero work in general. So don't take my word for how AlphaZero works, take his. About 43.5 minutes.
|Apr-15-18|| ||SChesshevsky: <AylerKupp> Thanks for the very informative answer. |
Based on the 2 ways AlphaZero is different than standard computer chess engines, it would seem that AlphaZero does basically take its entire history of games played and picks a continuation with what seemed to have worked before, then narrows based on probability.
Since AlphaZero doesn't appear to use any chess principle applications at all, I'm assuming that it would have to be able to reference an extremely large database to cover every eventuality.
For instance, say to save space you eliminate all games that started a4 and b4 with White figuring you're never going to see those. Would a data space saving AlphaZero just freeze if faced with a4 then b4?
So then assuming that AlphaZero would need every game it has played on hand somewhere (and I believe that it's billions of games) and probably very large computing power needed to parse the collection quickly and often, that might appear to make it unlikely to be very practical for general human use or maybe even computer versus computer due to its significant hardware requirements. Any way around this for AlphaZero?
It also looks likely that far from playing more human like or more intelligently, AlphaZero plays just with more computer brute strength. In fact, it could be with unlimited processing speed and database memory, AlphaZero wouldn't even have to "learn" chess.
Given its move choice process, couldn't AlphaZero with enough power and memory theoretically just, move by move, play every combination and narrow down the ones that work and don't work from game one, move one, over the board?
And if it could do that it could even wipe out the memory from game one and use the same process for game two, three, etc.
Wouldn't the result be a machine totally ignorant of chess (except for the basic rules), yet playing a good, likely a very good chess game by simply running through every scenario and choosing the continuation that statistically works the best, and then after the game going back to being totally ignorant of chess?
If that's the case, the chess program might play a great game but appears far from human like or intelligent.
|Apr-15-18|| ||AylerKupp: <<SChesshevsky> Since AlphaZero doesn't appear to use any chess principle applications at all, I'm assuming that it would have to be able to reference an extremely large database to cover every eventuality.>|
No! That's the beauty of it. It is self-trained, it uses no database and no information about chess games other than the rules of chess. It discovers what works and what doesn't work by playing games against itself, initially effectively making random moves and seeing what works and what doesn't work. After a while, it collects the knowledge it has acquired in its neural network to play ever-improving chess. And this information is retained so that it can benefit from its acquired knowledge in subsequent games; it certainly does not "forget" everything that it learned when it plays a new game.
To me that's the most significant part. Other neural network-based chess engines learned by being fed human master-level games and, based on these, learned what works and what doesn't. But to me that means that they also learn human "prejudices" and so are unlikely to discover new concepts.
But, if the neural network is not constrained to learn from what human players have played, then it is free to discover new approaches and principles that had not been used before. Apparently this has happened with AlphaGo Zero; in its matches against top level Go players it played moves based on concepts that apparently had not been discovered in the last 1000 years.
The reason that AlphaZero needs specialized hardware is not so much the playing part but its learning part. The nodes of a neural network are based on perceptrons, a model of human neurons. In a neural network like AlphaZero's, the perceptrons are assigned weights (importance of factors) based on iterative results with an algorithm called backpropagation. Backpropagation is computationally very intensive, and Google's proprietary Tensor Processing Units (TPUs) were designed to speed up the operations involved in algorithms such as backpropagation. According to their paper, Deep Mind used 5,000 first generation TPUs to generate the self-play games used in its training and 64 second generation TPUs to train the neural network. In its match with Stockfish it used only 4 TPUs (presumably second generation), so that shows the relative computational power needed to train AlphaZero compared to what was needed to actually play games against Stockfish.
But the reason that I'm not as convinced about AlphaZero's superiority over Stockfish is that I estimated the computing power of those 4 TPUs (per Google's data about them) to be over 100X the computing power of the hardware that was used by Deep Mind to run Stockfish. If Stockfish had 100X the computing power that it had when it played the match, and it wasn't crippled by disabling its opening book and its tablebase support, the results could very well have been completely different. We'll just never know.
|Apr-15-18|| ||SChesshevsky: <AylerKupp...It discovers what works and what doesn't work by playing games against itself, initially effectively making random moves and seeing what works and what doesn't work. After a while, it collects the knowledge it has acquired in its neural network to play ever-improving chess...>|
Thanks again for the information. This is what I'm confused about. What is the knowledge that AlphaZero acquired?
If its knowledge is basically just chess principles that it learned by playing through billions of games, seemingly most or all of them the same ones humans currently know, then the only apparent difference between it and current chess engines is that it may have learned some new chess principles. I guess that may have happened, though it didn't seem apparent from the games presented.
This is what I don't get. If AlphaZero isn't much different from current engines from a practical perspective and, in fact, could be either weaker or stronger, what's the big deal then?
Unfortunately, AZ's presented games do little to clarify if or how its capabilities differ. The poor play by Stockfish seemed to make the assessment of AZ being anything more than a very, very good chess player impossible.
If, with all the experience and effort AZ and its people went through, the program really didn't find any new principles in chess, couldn't it be viewed more a disappointment than an achievement? I mean the goal should be new knowledge.
Worse, publicizing its victories over Stockfish versus acknowledging its discoveries or lack of discoveries in chess seems at a minimum misleading.
While the ability of AZ to play billions of games in a relatively short time and derive excellent principles is impressive, it seems impressive mostly from a technological point of view. I'd say unless we have real tests of AZ's knowledge, we don't know whether its AI is very intelligent, moderately intelligent or not that intelligent at all.
For me, I'm kind of disappointed over the whole AZ experiment.
|Apr-16-18|| ||AylerKupp: <SChesshevsky> The knowledge that AlphaZero acquired is pretty much what you said (or maybe what I said); what works and what doesn't work. And, yes, most of it is probably the same as what humans learned after several hundred years of playing modern chess, plus some new ones. But neural network-based software is not amenable to communicating what it has learned in terms that us humans understand, and even its developers say that they don't really know how it works (or, more accurately, why it does what it does). |
Which is unfortunate. Another branch of AI, expert systems, worked by accepting from humans decision-making rules and the conditions under which those rules applied, and then it was up to the software to determine which rules applied and which didn't, based on its inputs. They were all the rage many years ago although you don't hear much about them recently. But one of the things they excelled at was not only reaching conclusions but telling you <why> they reached that conclusion, based on the conditions they encountered and the rules they used.
I have thought for a while that current chess engines, with their explicitly programmed rules on evaluating a position and their explicitly programmed rules for how they pruned their search trees (i.e. what branches of the search tree they discarded and why), could be modified to tell us in human-understandable form <why> they selected their top moves. What a great chess training tool that would be! But, alas, no chess engine developer has produced such an add-on.
But what is important in AlphaZero's approach and others like it is not <what> they accomplished but <how> they accomplished it. The main obstacle to both expert systems and current chess engines (dare I call them "classical" chess engines?) is capturing the knowledge of experts in their particular domain. But if techniques such as those used in AlphaZero can accomplish similar results without much human expertise input, then the future seems very promising. And, even if they didn't find any new knowledge, I don't think that the results could be called disappointing. Remember, this self-learning approach is in its relative infancy and can likely be substantially improved, particularly as ever more computing power becomes available at a reasonable cost.
I can also say that, to some extent, I'm <partly> disappointed in the AlphaZero experiment, but more in the way that it was conducted and how it was presented. I will grant that disabling Stockfish's opening book and tablebase support has nothing to do (well, not much) with how it determines what moves to play based on its evaluation of the position, but doing so suggests to me that the developers had an end result in mind and wanted to use whatever means to achieve it, including far more powerful computational hardware. Better (in my opinion) to have AlphaZero and a full-featured Stockfish run on comparably powerful hardware and accept whatever the results would be. Even if AlphaZero lost such a match against Stockfish it would demonstrate how much progress had been accomplished in the field since previous neural network-based chess engines like Giraffe had played at the human master level and no better.
But that's the engineer in me talking and I'm not a marketeer. I realize that in order to obtain additional resources to continue the work one must produce seemingly impressive results, even if those results were obtained with loaded dice or rigged roulette wheels. That's just real life.
Apropos AlphaZero, I came across this article, https://web.stanford.edu/~surag/pos..., called "A Simple Alpha(Go) Zero Tutorial". Don't let the "Simple" in the title fool you, it's somewhat challenging to follow, particularly if you are mathematically challenged. But you can get the gist of the information by reading the text without trying to decipher the equations. And I love it when the author says "And that's it! Somewhat magically, the network improves almost every iteration and learns to play the game better.". Which is, however, the crux of the approach. And it brings to mind Arthur C. Clarke's 3rd law that "Any sufficiently advanced technology is indistinguishable from magic."
|Apr-16-18|| ||theagenbiteofinwit: We have no proof that AlphaZero won't just go ahead and resign if you play 1.d4 against it.|
|Apr-16-18|| ||morfishine: <theagenbiteofinwit: We have no proof that AlphaZero won't just go ahead and resign if you play 1.d4 against it> lol|
A learning algorithm or an actual chess engine never gives up.
|Apr-17-18|| ||alexmagnus: An old version of Chessmaster, which I still have somewhere in my cellar, always hung up (just hung up, didn't resign) if you played 1. e4 e6 2. d4 d5 3. Bb5+ against it. IIRC the crash happened in both the full version and the "personalities". I still wonder what caused it.|
|Apr-17-18|| ||WorstPlayerEver: The whole story 'it learnt itself' is a hoax. Otherwise it would determine a difference between -for instance- 1. e4 and 1. d4|
But A0 plays both 1. e4 and 1. d4, which makes no sense if it figured out how to play openings all by itself.
Furthermore I'd like to point out chess is a different game than go or draughts. Two games which are comparable because the move options are pretty limited.
|Apr-17-18|| ||alexmagnus: WPE, neural networks don't always play the same just because they learned by themselves. I once wrote a neural network that taught itself to play a simplified version of tetris. Did it always play the same game after the training finished? No! |
And limited number of moves in go?? Go has actually a higher branching factor than chess, which is why pre-AlphaGo there was no decent Go engine.
|Apr-17-18|| ||WorstPlayerEver: <alexmagnus>
Yes, even SF can end up in completely different variations from a given position.
Therefore I question the approach; there must be a formula which covers the game. That formula must determine which subformula has priority in a given position. Or how the subformulae compare when it comes to finding the best -logical- move in a position.
I'm pretty sure calculations based on 'probability' or 'evaluation of position and strength of the pieces etc.' will only go so far.
|Apr-17-18|| ||alexmagnus: To teach a neural network you don't need to give it anything, no formula, nothing. In some games not even the rules (in chess it's possible too, but who wants to waste resources on it learning the rules by itself?). |
The only thing it needs is some kind of "reward" for each win and "punishment" for each loss, at least in the case of chess (my tetris engine rewarded itself depending on the achieved score). And that is it.
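A toy illustration of learning from nothing but a win/loss signal: the sketch below teaches itself a small game of Nim (take 1 to 3 stones, whoever takes the last stone wins) by self-play. It is not a neural network; a plain lookup table stands in for the network, and the game, the function name `train_nim` and all parameter values are assumptions chosen for illustration.

```python
import random

def train_nim(episodes=20000, max_pile=10, epsilon=0.2, lr=0.05, seed=0):
    """Self-play learner that is told only who won each game."""
    rng = random.Random(seed)
    q = {}  # (stones_left, stones_taken) -> estimated outcome for the player moving

    def moves(stones):
        return [take for take in (1, 2, 3) if take <= stones]

    def greedy(stones):
        return max(moves(stones), key=lambda take: q.get((stones, take), 0.0))

    for _ in range(episodes):
        stones = rng.randint(1, max_pile)
        history = []  # moves of both sides, in the order they were played
        while stones > 0:
            if rng.random() < epsilon:
                take = rng.choice(moves(stones))  # occasionally try something random
            else:
                take = greedy(stones)             # otherwise play what has worked so far
            history.append((stones, take))
            stones -= take
        outcome = 1.0  # "reward": the side that took the last stone won
        for key in reversed(history):
            old = q.get(key, 0.0)
            q[key] = old + lr * (outcome - old)
            outcome = -outcome  # "punishment" for the other side's preceding move
    return {stones: greedy(stones) for stones in range(1, max_pile + 1)}

policy = train_nim()
print(policy)
```

Nothing in the code mentions strategy, yet the learned table tends toward leaving the opponent a multiple of four stones, which is the known winning idea in this version of Nim.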
|Apr-17-18|| ||WorstPlayerEver: <alexmagnus>
Hmm.. I really think it's about basic patterns and how to find the right combination between those patterns.
It has to be something fairly simple when it comes to math IMO. It's just we can't figure out a logical connection between the elements which make up chess.
|Apr-17-18|| ||AylerKupp: <<WorstPlayerEver> The whole story 'it learnt itself' is a hoax. Otherwise it would determine a difference between -for instance- 1. e4 and 1. d4>|
Unless you're trying to be funny, you should read a little bit about how neural networks learn before you make statements like that, even in jest. And as far as chess engines playing different moves, since their original days they have always varied their openings, both with and without using opening books.
And modern multi-core chess engines are notoriously non-deterministic. If you were to analyze the same position, on the same computer, with the same chess engine, and run the analysis to the same search depth, successive analyses will yield different evaluations, different "best" moves, and different lines. Not <may>, <will>. Guaranteed. If you don't believe me, just try it yourself. And the calculations in a chess engine, other than possibly its search tree pruning heuristics, have nothing to do with probability.
<Furthermore I'd like to point out chess is a different game than go or draughts. Two games which are comparable because the move options are pretty limited.>
I can't see how you can possibly say that go and draughts are comparable because the move options are pretty limited. For example, the number of possible moves in the initial chess position is 20, in the initial draughts position it is 9, and in the initial go position it is 361. And there are many measures of game complexity; see, for example, https://en.wikipedia.org/wiki/Game_.... And by all measures Complexity(draughts) < Complexity(chess) < Complexity(go). That's why a computer program (Chinook) defeated the world's best checkers player, Marion Tinsley, in 1994 (although it was by default: after the first 6 games of their match were drawn, Tinsley had to withdraw as a result of pancreatic cancer, and he died a few months later), and why draughts was solved in 2007 in the sense that the result of a game can be predicted from any position. I suspect we all know what happened in 1997 in the chess match between Garry Kasparov and Deep Blue, and in 2016 in the Go match between Lee Sedol and AlphaGo. And neither chess nor Go has been solved in the sense listed above, although most believe that if it were ever to happen, chess would be solved first because its complexity is less than Go's.
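A back-of-the-envelope way to see why the number of available moves matters so much: if the number of options stayed fixed at its opening value (which it does not; this is a deliberately crude assumption), the number of move sequences after a given number of plies is simply that value raised to the number of plies. The function name `naive_tree_size` is hypothetical; the three starting numbers are the ones quoted above.

```python
# Number of legal first moves, as quoted in the post above.
FIRST_MOVE_OPTIONS = {"draughts": 9, "chess": 20, "go": 361}

def naive_tree_size(branching, plies):
    """Move sequences of the given length if every position had `branching` moves."""
    return branching ** plies

# Crude comparison after 4 plies (two moves by each side).
sizes = {game: naive_tree_size(b, 4) for game, b in FIRST_MOVE_OPTIONS.items()}
for game, size in sizes.items():
    print(f"{game:9s} {size:>15,d}")
```

Real branching factors change from move to move, so these are not true position counts, but the ordering and the size of the gap between the games are the point.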
|Apr-17-18|| ||AylerKupp: <<alexmagnus> The only thing it needs is some kind of "reward" for each win and "punishment" for each loss.>|
Actually, as I've recently found out as a result of discussions with <zborris8> above and additional reading, with neural nets that are completely unsupervised (i.e. no reward), the networks are not given a goal, they process their inputs and try to find relationships between the data. So initially you don't even know what it is that you're trying to figure out. Sort of like the X-files where "The truth is out there". Except that in the case of unsupervised neural nets, "The truth is in there".
I suspect that you probably don't need it but, for those who are interested and not aware of the differences between supervised, unsupervised, and reinforcement training (I needed it!), here is a short article on the subject: https://medium.com/@machadogj/ml-ba.... And there are additional categories of machine learning, although from my limited reading I suspect that these are the main ones. See, for example, https://www.youtube.com/watch?v=NKp....
|Apr-17-18|| ||AylerKupp: <<WorstPlayerEver> Hmm.. I really think it's about basic patterns and how to find the right combination between those patterns.>|
Congratulations. You have just "discovered" unsupervised machine learning. See the first link in my response to <alexmagnus> above.
|Apr-18-18|| ||Jambow: One thing that becomes apparent is that A0 likes the bishop, and often the bishop and rook combination. Not sure I have ever read anything describing the power of this duo. It may be that against other engines this isn't the case and we might be seeing an anomaly because of Stockfish, as AlphaZero wasn't programmed for anything but the rules of chess by carbon based units. If you jump to the final positions, in only 1 game is there not a bishop left on the board. Never is a queen present at the end either. I would like to see if this pattern persists against other engines so we could determine whether there is some superiority not yet understood, or whether this is simply the best resource against Stockfish. Not sure if I made this observation in isolation or if it has any value whatsoever, but there it is.|
|Apr-18-18|| ||WorstPlayerEver: <AylerKupp>
Sorry, I expressed myself somewhat clumsy. What I meant is go and draughts only have one piece. Chess has 6 different pieces. As in draughts chess also has the option of promotion.
|Apr-18-18|| ||AylerKupp: <<WorstPlayerEver> Sorry, I expressed myself somewhat clumsy.>|
That's OK, I usually express myself clumsily also. Or at least that's what other posters keep telling me. :-)
One beneficial side effect in trying to respond to posts like yours with facts (which is what I try to always do, although I don't always succeed) is that it motivates me to do some research. And, as a result, I learned about various game complexity measures, which was interesting and informative. Another site I ran into (to which I did not post a link) was a site which indicated which games have been completely solved by computers, in the sense of being able to predict the outcome from any position. Which seems to be a reasonable definition of "solved" as it applies to games.
So I hope that you and others keep "motivating" me. The down side is that you have to (well, you don't <have> to) read my clumsily expressed ideas in my posts.
|Apr-19-18|| ||WorstPlayerEver: <AylerKupp>
I can't speak for other -complex- games, but every time I have to read about go, how many possible positions it has.
But no one ever mentions that even the best chess players can miss a mate in two sometimes. I guess this counts for draughts and go as well?
|Apr-19-18|| ||nok: Open a page for Leela, cg.
I beat her a week ago but she got better it seems.
|Apr-19-18|| ||WorstPlayerEver: <nok>
Thanks! Going to give it a try :)
|Apr-19-18|| ||AylerKupp: <<WorstPlayerEver> But no one ever mentions that even the best chess players can miss a mate in two sometimes. I guess this counts for draughts and go as well?>|
Oh, it's mentioned occasionally. And the best chess players sometimes even miss mates in one, particularly (and understandably) in time pressure. Here's a famous missed mate in two in Szabo vs Reshevsky, 1953 after 20...Bxf6:
[Diagram: position after 20...Bxf6]
White wins with 21.Qxg6+ and then either 21...Kh8 22.Bxf6# or 21...Bg7 22.Qxg7#.
When this was pointed out to Szabo after the game he supposedly said "Of course I missed it! You don't look for a mate in two against Reshevsky!". And maybe that's the reason top players miss mates in two or even one: one doesn't look for something that one can't imagine being there.
I can't speak for go or draughts either, but I suspect that something similar might apply. After all, at least for now, we are all human.
And even chess engines are not immune, although in that case it's usually due to the horizon effect. I was once looking at a line where a chess engine evaluated the position as slightly favorable for White even though Black had a mate in one next move. The engine had reached the end of its search tree so it was 100% blind to what could happen afterwards. Unfortunately, I didn't save the position.
But I did save this one. I was analyzing a position that could have been reached in Team White vs Team Black, 2015
Komodo 9.2, the strongest chess engine at the time, produced the following evaluation of its 3rd best line: [+0.16], d=30: 27...Ra7 28.Be1 Qd7 29.Qd3 Rg7 30.Kh1 Kh8 31.Bf1 Ne8 32.Ne2 Qc8 33.Qb3 Bh4 34.Bc3 gxf3 35.gxf3 Bf2 36.a6 Rxa6 37.Rxb7 Rxb7 38.Qxb7 Qxb7 39.Rxb7 Ra8 40.Bh3 Nh4 41.Nxf4 Bd4 42.Ne2 Bxc3 43.Nxc3 Nxf3 44.Bf5 Nf6 45.Nb5, reaching the following position:
[Diagram: position after 45.Nb5]
Komodo had reached the limits of its search tree and after it played 45.Nb5 it had no idea of the potential of Black's position. And here Black has a forced mate in 5 after 45...Ra2 (a quiet move so a quiescent search extension was not likely to be used, but now mate is unavoidable) 46.Rb8+ Kg7 47.Rb7+ Kh6 48.Rh7+ (an attempt to delay the inevitable, another example of the horizon effect at work) 48...Nxh7 49.Bxh7 Rxh2#
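The horizon effect can be reproduced on a toy tree. In the sketch below each node carries a made-up static evaluation plus its replies, and the search trusts the static evaluation once it runs out of depth. The tree, the scores and the `MATED` value are invented for illustration and have nothing to do with Komodo's actual search.

```python
MATED = -1000  # hypothetical score for "the side to move at the root gets mated"

def minimax(node, depth, maximizing):
    """Search `depth` plies, then fall back on the node's static evaluation."""
    static_eval, children = node
    if depth == 0 or not children:
        return static_eval
    values = [minimax(child, depth - 1, not maximizing) for child in children]
    return max(values) if maximizing else min(values)

# Line A looks slightly better at first sight, but allows a quiet mating reply.
line_a = (0.16, [(0.10, []), (MATED, [])])
# Line B looks dull and stays dull.
line_b = (0.00, [(0.00, []), (-0.05, [])])
root = (0.05, [line_a, line_b])

shallow = minimax(root, 1, True)  # stops right before the opponent's reply
deep = minimax(root, 2, True)     # sees one ply further
print(shallow, deep)
```

At depth 1 the search reports +0.16 and prefers line A; one ply deeper it sees the mating reply, rejects line A and settles for line B at -0.05.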
I don't know if AlphaZero is susceptible to the horizon effect since it bases its position evaluations on the statistical results of simulated games played to completion. If that's the case, then that's another significant achievement for it.