Apr-06-19
  AylerKupp: <Personal stuff – continued> <There's a psychological approach which I very much agree with, it's called I'm OK – You're OK.> I remember that book. But sometimes someone <is> right and someone <is> wrong, and it's important to determine which is which. Hopefully it can be done constructively and with civility, but it must be done regardless. And there should not be any reason to get personal about it. The 4 interactions you mentioned reminded me of a chart that I used to present at kickoff meetings for new projects. It was titled "If you do what I say" and it went something like this:
1. If you do what I say and it turns out well, we'll both be praised for our good judgment.
2. If you do what I say and it doesn't turn out well, I'll be blamed for my poor judgment.
3. If you don't do what I say and it turns out well, you'll be praised for your good judgment.
4. If you don't do what I say and it doesn't turn out well, you're screwed.
<I've never considered ignoring you as I've learned from your posts. I think you add a lot to many discussions and topics. I apologize if I gave the wrong impression. I'd like to restate that we have agreed on most of the points we've discussed and that I respect your point of view on the things we don't. In any case, thanks for the debate.> Again, suggesting that people put me on their ignore list is another way of trying to terminate what I consider a less than productive discussion and move on to other things. Besides, since the number of people you can put on your ignore list is limited, I look forward to the quandary those people will eventually face if they try to add someone else to their ignore list and find out that in order to do so they must delete someone from it. Decisions, decisions. :) And no, I don't think that you gave me the wrong impression. The main point I was trying to make was that there are times when you want to know the best implementation (hardware + software) of a solution to a problem and there are times when all you want to know is the best software implementation. That's all. And after all, our motivations and interest for preferring one over the other may change in the near future; "Indecision is the key to flexibility" (same story as the dead cat's story).

Apr-08-19   MrMelad: <AylerKupp> I appreciate the time you invest in your comments, and also your positive thinking and optimism. I value your many contributions to this site on many topics. Regarding our discussion on the AlphaZero page, I think it has reached a point where both of us have made our best case and we should either wait for others to chip in or just let it be.

Apr-15-19
  AylerKupp: <<BOSTER> What is the difference between usual tree Search for engine (Kotov) and Monte Carlo Tree Search, based for the AZ?> First let's make sure we're talking about the same thing. Here is what I think is a summary of the differences between search tree expansion and pruning in classic engines (per Shannon's original 1949 paper "Programming a Computer for Playing Chess", https://vision.unipv.it/IA1/aa2009... , as used by Stockfish, Komodo, etc.) and the MCTS-like search tree expansion and pruning used by AlphaZero ("Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm", https://arxiv.org/pdf/1712.01815.pdf (not much on MCTS); "A general reinforcement learning algorithm that masters chess, shogi and Go through self-play", https://deepmind.com/documents/260/... (better); the "Game Changer" book by Matthew Sadler and Natasha Regan (the easiest to follow, but ...); and "Technical Explanation of Leela Chess Zero" by Andy Olsen, https://github.com/LeelaChessZero/l... ). Both AlphaZero articles were written by members of the DeepMind team, and they assisted the authors of "Game Changer". Be aware that there are contradictions on how MCTS is implemented in AlphaZero (if at all!) in all 3 papers and the book, so I'm not really sure how AlphaZero implements MCTS or its apparently close derivative, PUCT. I'm aiming to highlight the similarities between the two approaches (of which, at a high level, there are many) and the differences, although of course there are many details I've left out, mainly because, sadly, I don't know them. :(

Part 1 of 3: <Classic>

1. Start with the current game position (the search tree root) and identify all legal candidate next moves.

2. Select the "most promising" candidate next moves based on heuristics (which differ from engine to engine). The heuristics represent "educated guesses" as to which moves will be the roots of the branches of the search tree that are most likely to contain the best moves.

3. Evaluate the resulting position for each of the most promising next moves by means of a hand-crafted evaluation function which returns an evaluation in centipawns.

4. Expand the tree by repeating the process for all the most promising candidate moves at the next level, using the alpha-beta pruning algorithm to delete branches from the search tree when at least one move is found to be worse than a previously examined move; such moves need not be evaluated further. I call alpha-beta pruning an algorithm because, when applied to a standard minimax tree, it returns the same move as minimax would, but prunes away branches that cannot possibly influence the final decision.

5. Ignoring quiescence search and extensions (which continue expansion of the subtree for a particular move if a response is either literally forced, such as a response to a check, or practically forced, as when a recapture is needed to avoid overwhelming material disadvantage), continue to expand the tree until all the most promising moves have been evaluated.

6. Propagate the results upwards to the original root of the tree by selecting the move that, at each level, results in the alternating best (maximum) and worst (minimum) evaluation along each branch of the root tree (hence the name minimax). The intent is to select the branch of the search tree (the Principal Variation) that contains the best moves for each player at each level of the root tree. This assumes that each player will, in turn, select the move that results in the most advantageous position for them.

7. Repeat the process by increasing the depth of the search tree by one ply for the best moves (typically a small number, on the order of 5 or so) selected at each ply until the time management function of the chess engine indicates that it's time to make a move. (A bare-bones sketch of this loop follows below.)
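A bare-bones Python sketch of the minimax/alpha-beta loop in steps 2-6 may make it concrete. This is only an illustration, not any engine's actual code: legal_moves(), make_move(), evaluate(), and the position object with its is_game_over() method are hypothetical helpers, and a real engine layers move-ordering heuristics, quiescence search, extensions, iterative deepening (step 7), and time management on top of this skeleton.

    def alphabeta(position, depth, alpha, beta, maximizing):
        # Depth exhausted or game over: score the leaf (step 3).
        if depth == 0 or position.is_game_over():
            return evaluate(position)          # hand-crafted evaluation, in centipawns
        if maximizing:
            best = -float("inf")
            for move in legal_moves(position):     # heuristics would order these (step 2)
                best = max(best, alphabeta(make_move(position, move),
                                           depth - 1, alpha, beta, False))
                alpha = max(alpha, best)
                if alpha >= beta:   # cutoff: this branch cannot influence the final
                    break           # minimax choice, so the rest of it is pruned (step 4)
            return best
        else:
            best = float("inf")
            for move in legal_moves(position):
                best = min(best, alphabeta(make_move(position, move),
                                           depth - 1, alpha, beta, True))
                beta = min(beta, best)
                if alpha >= beta:
                    break
            return best

A root call such as alphabeta(root, 5, -float("inf"), float("inf"), True), repeated with increasing depth until time runs out, corresponds to step 7.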

Apr-15-19
  AylerKupp: Part 2 of 3 <MCTS (and its close relative, PUCT)>:

1. Start with the current game position and identify all legal candidate next moves.

2. Select the "most promising" candidate next moves based on the training of its neural network.

3. "Evaluate" the resulting position by conducting a series of game simulations (playouts) until a result (win, draw, loss) is reached, and determine the scoring percentage (number of wins + number of draws/2) of that move as well as other statistics such as the number of wins and the probability that the move being examined will be selected (based on the training of its neural network). At least that's the traditional definition of MCTS. The "A general reinforcement learning ..." and "Mastering Chess and Shogi by Self-Play ..." articles indicate that they obtain the game result probabilities by doing playouts; the "Technical Explanation of Leela Chess Zero" article indicates that AlphaZero (as well as Leela Chess Zero) uses the PUCT algorithm; and "Game Changer" explains the process reasonably well but never actually says that it uses MCTS and is not clear on how the game result probabilities are obtained, other than "the percentage represents the prior move probability" without saying how that's calculated. Then again, the focus of the book is to provide examples of AlphaZero's play and not to fully describe how AlphaZero works. But all the articles and the book basically refer to an MCTS-type tree generation and branch evaluation and selection. To make matters worse, MCTS-Minimax hybrids have been developed (see "MCTS-Minimax Hybrids with State Evaluations" and "MCTS-Minimax Hybrids with State Evaluations (Extended Abstract)"; you must Google them and download them directly) which, of course, claim that combining the two methods yields the best results, at least in some domains. And, also of course, there seems to be some controversy about the suitability of "pure" MCTS for chess applications (see http://talkchess.com/forum3/viewtop...).

4. Expand the tree by repeating the process until a position is found that has not been found and evaluated (examined) before, or a position that ends the game.

5. Propagate the results upwards to the original root of the tree, recalculating the scoring percentage and the other statistics for each branch of the tree.

6. Repeat the process by increasing the depth of the subtree of the move being examined by one ply for those moves in the search tree (again, probably a small number, but I don't know the range of that number) that have the highest expected scoring percentage. Again, the engine's time management function determines when it's time to make a move. (A sketch of one MCTS iteration follows below.)
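For comparison with the classic sketch above, here is a similarly bare-bones Python sketch of one MCTS iteration (selection, expansion, playout, backpropagation) using the classic UCT selection formula. It illustrates "traditional" MCTS only, not AlphaZero's actual code: AlphaZero's PUCT variant weights the exploration term by the network's move probabilities and replaces the random playout with a value-network evaluation. legal_moves(), make_move(), random_playout(), and the position object are hypothetical helpers.

    import math, random

    class Node:
        def __init__(self, position, parent=None):
            self.position, self.parent = position, parent
            self.children, self.visits = [], 0
            self.score = 0.0                   # wins + draws/2 accumulated through this node

        def uct(self, c=1.4):
            if self.visits == 0:
                return float("inf")            # unvisited children are tried first
            return (self.score / self.visits +
                    c * math.sqrt(math.log(self.parent.visits) / self.visits))

    def mcts_iteration(root):
        node = root
        while node.children:                   # 1. selection: descend by highest UCT value
            node = max(node.children, key=lambda ch: ch.uct())
        if not node.position.is_game_over():   # 2. expansion: grow the tree at a new leaf
            node.children = [Node(make_move(node.position, m), node)
                             for m in legal_moves(node.position)]
            node = random.choice(node.children)
        result = random_playout(node.position) # 3. playout: 1.0 win, 0.5 draw, 0.0 loss
        while node:                            # 4. backpropagation: update the statistics
            node.visits += 1
            node.score += result
            result = 1.0 - result              # flip perspective at each ply up the tree
            node = node.parent

Calling mcts_iteration(root) in a loop until the time management function says to move, then picking the most-visited child of the root, is the usual move-selection recipe.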

Apr-15-19
  AylerKupp: Part 3 of 3 <Pros and Cons of each>: Whenever you have 2 approaches you typically have pros and cons for each. In anticipation of your possible next question, here are some pros and cons of the classic and MCTS-based approaches to tree searching. Most of these pros/cons are from the published literature, some are my opinions and observations. The latter may not interest you. :)

1. Some claim, as stated above, that MCTS is not that well suited for chess applications, contrary to the views of the DeepMind team and Matthew Sadler.

2. MCTS is supposed to provide a "more human" approach to best move selection because it immediately narrows down the best move candidates to consider and reduces them to a small number, just like a GM might do. Matthew Sadler says that "An engine such as Stockfish works on the basis that the evaluation of the best move determines the evaluation of the position", which is horrendously incorrect. Classic chess engines work on the basis that the evaluation of the best <branch> (in the minimax sense) determines the best of the move candidates considered. How to determine the best of the move candidates in a classic chess engine is determined by the search heuristics.

3. Minimax will provide the best answer in terms of what moves to play in a zero-sum game; MCTS will only approximate the best answer. But that is apparently good enough, at least under the conditions under which engine vs. engine chess games are played.

4. (my opinion) A classic chess engine using minimax will always be faced with the horizon effect no matter how deep it searches (see a tongue-in-cheek description of "AylerKupp's corollary to Murphy's Law" (AKC2ML) in my header above). Based on the example in "Game Changer", MCTS <seems> to be able to reach deeper search depths in the same amount of time as a classic engine because it is apparently more efficient at identifying the best candidate moves, and so it can better control the width of its search tree, narrowing the number of branches it needs to consider in order to dedicate its resources to analyzing the most promising moves. It's unfortunate that I haven't been able to find any sources that list the calculation time and search depths achieved by AlphaZero for each move. But, like Stockfish's aggressive search tree pruning, this is double-edged: yes, the engine can search deeper, but it is also more likely to miss the best moves by both sides. Sadler says: "I think this also explains how AlphaZero might occasionally miss an unusual, 'unfair' tactic in a position. Since AlphaZero is pruning possibilities to consider so early and rigorously, it might discount a nonstandard move before it could examine it at the depth required to see its hidden strengths."

5. (my opinion) But perhaps the best indication of the relative merits of minimax with heuristics and alpha-beta pruning vs. MCTS is provided by Komodo 12.x. In addition to its standard classic version, it provides an option to use MCTS instead of minimax, and both engine versions have competed in the TCEC, CCRL, and CEGT engine vs. engine tournaments. In all cases Komodo 12.x standard performed better than the corresponding Komodo 12.x MCTS and achieved higher ratings. But, in fairness, the Komodo 12.x MCTS option is relatively new, likely does not perform as efficiently as the standard version (which has been available for some time in Komodo multicore), and even crashed 3 times in the most recent TCEC tournament. So we might not yet be comparing apples to apples in terms of performance, and the CCRL ratings for the Komodo 12.x MCTS version have been increasing faster than those of the Komodo 12.x standard version. But, when all engine components are basically the same, Komodo 12.x with minimax consistently outperforms Komodo 12.x with MCTS.

Hopefully the above answers at least some of your questions. As usual, there is a lot of information out there in case you want to dig deeper.

Apr-15-19
  AylerKupp: <<MrMelad> Regarding our discussion in the AlphaZero page I think it has reached a point where both of us made our best case and we should either wait for others to chip in or just let it be.> I agree, but there are still some things I want to say to clear up some misconceptions based on your last series of posts, and to offer yet another example of what I've been trying to say. <<MrMelad> I'm trying to offset some of your claims as they seem to focus around diminishing and discrediting the accomplishments of AlphaZero and Leela.> I am in <NO WAY> trying to discredit the accomplishments of AlphaZero and Leela Chess Zero. I think (and have said) that the AlphaZero team deserves a huge amount of credit for its ability to generalize the implementation of neural network-based game playing engines by reducing the amount of domain-specific knowledge (i.e. game rules) needed to implement the engine. And Leela Chess Zero showed how neural network-based training could be implemented by distributing the task across a network and tapping the resources of individuals connected to that network, similar, in a way, to how the available computer resources of many individuals were tapped to support SETI@home. But I am trying to put things in perspective and correct some of the claims made by the more enthusiastic and apparently ignorant posters out there. First, there is <nothing> original in the algorithms used in AlphaZero; not the use of neural networks for chess playing, not the use of reinforcement training of the neural network, nor the use of MCTS. All of these algorithms have been applied to chess playing before. I'm sure that the AlphaZero developers implemented many enhancements and improvements in these algorithms. But original, no. What I do think AlphaZero accomplished is the best <integration> of these algorithms into a chess playing <system> (hardware + software), and as someone who has been responsible for system integration on many projects I know and appreciate how hard and unpredictable this can be. That, and the pioneering work of efficiently using TPUs and their massive computational performance advantage to implement the best chess playing <system>.

Apr-15-19
  AylerKupp: <<MrMelad> You use those arguments to "warn" people from giving too much credit to AlphaZero as if the competition between stockfish and AlphaZero was between two similar algorithms that one simply had a huge computational advantage.> Well, I am trying to make people aware that the results of the AlphaZero vs. Stockfish matches are inconclusive at best <IF> what you are trying to find out is the best approach and algorithms for implementing the best chess playing engines. Whether the competition was between similar or dissimilar algorithms is beside the point, as long as the two engines are running on hardware with similar computational performance capability, regardless of hardware architectures. If that constitutes a "warning", then so be it. Let me try a different approach to convince you of that. Suppose a 100-game match were held between AlphaZero and Leela Chess Zero, both of which have similar (though not identical) algorithms and architectures and, of course, likely different implementations of those algorithms. If AlphaZero were restricted to using only one 1st-generation TPU (performance estimated at ~30 TFlops) and Leela Chess Zero used a GPU server configuration with two NVIDIA RTX 2080 Ti GPUs (performance estimated at ~13.7 TFlops each, with an aggregate performance estimated at ~27.4 TFlops), then I would say that the performance capability of their hardware was similar. If the results of the match were a near tie (like the previous TCEC Leela Chess Zero vs. Stockfish Superfinal, even though Leela Chess Zero had a substantial computing capability advantage over Stockfish), then I would conclude that the performance of the algorithms in AlphaZero and Leela Chess Zero was also approximately equal. I think that you would probably agree. Now conduct another 100-game match, except allow AlphaZero to use four 3rd-generation TPUs, each with a performance estimated at ~360 TFlops, for an aggregate performance capability of ~1,440 TFlops and a computational performance advantage of ~52.5X over Leela Chess Zero. If the first match with hardware of comparable performance capability ended in a near tie, do you have any doubts as to which engine would win the second match? And if the winner was AlphaZero by a substantial margin (of which I have no doubt), would you then conclude that the AlphaZero <algorithms> were substantially better than Leela Chess Zero's <algorithms>? I hope not. And, if you were to agree that the results of this second match were inconclusive because of AlphaZero's substantial computational capability, would you also have agreed if the first match, with two approximately equal sets of hardware in terms of computational performance capability, had not taken place? So yes, I "warn" people against giving too much credit to AlphaZero as a result of its matches against Stockfish when AlphaZero enjoyed a substantial computational performance advantage. Particularly since both DeepMind's data and Leela Chess Zero's experience with a shorter time/move (AlphaZero) and without GPU support (Leela Chess Zero) show that Stockfish, as well as many other classic engines, would defeat them both convincingly if their computational capabilities were similar.
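The hardware comparison above is straight arithmetic; here it is spelled out in Python (the TFlops figures are the estimates quoted in this post, not measured values):

    tpu_v1     = 30.0              # one 1st-generation TPU, ~30 TFlops
    leela_rig  = 2 * 13.7          # two RTX 2080 Ti GPUs, ~27.4 TFlops aggregate
    tpu_v3_rig = 4 * 360.0         # four 3rd-generation TPUs, ~1,440 TFlops aggregate

    print(tpu_v1 / leela_rig)      # ~1.09x  -> roughly comparable hardware (first match)
    print(tpu_v3_rig / leela_rig)  # ~52.55x -> the ~52.5X advantage cited above (second match)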

Apr-15-19
  AylerKupp: <<MrMelad> I don't think your intentions are malicious though, I just don't think you understand how AlphaZero works, i.e., how reinforcement learning or deep learning works.> My intentions are certainly not malicious, at least not intentionally. I present my opinions and provide data or links to data to support them. What else can I do? If others disagree with my opinions even though they agree with the data presented, or if they choose to ignore the data, there's nothing I can do about that. As far as not understanding how AlphaZero, reinforcement learning, or deep learning works, I might surprise you. I have read every paper on AlphaZero, Leela Chess Zero, neural networks, reinforcement learning, and deep learning that I have been able to find and read, and I have bought two books on neural network design and deep learning that I am now studying. An expert on those subjects? Clearly not, and I have a long way to go to even be considered "knowledgeable". But understanding? I think that I have a high-level understanding, enough I think to filter bs and wishful thinking from facts. <<MrMelad> I know sarcasm doesn't always translate well on the internet but I sometimes forget. My apologies for using it, it's not that funny anyways.> Well, I have the same problem and I try to avoid it but, as you know, it's easy to backslide. And quips that somehow seem funny at the time I write and post them typically don't seem so funny after a while and after others read them. My post was not intended as a criticism of using sarcasm but just a statement of fact that, unless the use of sarcasm is very, very obvious, it's not an effective way to try to get a point across. <<MrMelad> Here is the part where he says "You can't really compare CPU cores to GPU cores apples to apples"> What he actually said, starting at ~00:07:00, was that CPUs and GPUs can't be compared because "they are <qualitatively different>" and the much larger number of available cores in the GPUs means that they can effectively perform a larger number of more limited operations in parallel (no surprise here!). So <CPU cores> cannot be compared to <GPU cores> <because of the kind of things that they can each do most effectively>. That's what the "apples to apples" comment was addressing, not the notion that their computational capabilities can't be compared.

Apr-17-19   MrMelad: <AylerKupp> thanks for your comments. I responded on the AlphaZero page.

May-16-19
  diceman: <AylerKupp: 6. Repeat the process by increasing the depth of the subtree of the move being examined by one ply for those moves in the search tree (again, probably a small number, but I don't know the range of that number) that have the highest expected scoring percentage. Again, the engine's time management function determines when it's time to make a move.> If given enough time, will all moves be looked at? (at whatever the listed ply depth is)

May-20-19   LoveThatJoker: <AylerKupp> Thanks for the engine analysis on 37. Qh8+. I too had gone with this continuation. LTJ

May-20-19   LoveThatJoker: PS. In regards to yesterday's Gazza puzzle.

May-20-19
  Sally Simpson: ***
Hi AylerKupp (finally spelling your name correctly instead of AlyerKupp). You will be interested in this:
http://www.ifaamas.org/Proceedings/... It's a paper on iCat and chess computers.
*** 

May-21-19
  AylerKupp: <Sally Simpson> Thanks for the article on iCat. At first glance I thought that the result should be obvious, that chess players would enjoy a game against a physical opponent more than against a virtual opponent, but even so it's good to have some confirmation. As you know I am a wine aficionado, and I read that some enologists subjected Cabernet Sauvignon grapes to DNA testing to try to find out the grape's origin. After some analysis, at presumably a nontrivial expense, they determined that the Cabernet Sauvignon grape was the result of a cross between the Cabernet Franc grape and the Sauvignon Blanc grape. Then someone pointed out that, given the name Cabernet Sauvignon, shouldn't the answer have been obvious? Particularly since Cabernet Sauvignon, Cabernet Franc, and Sauvignon Blanc are all grown in the Bordeaux region of France. But I do wonder if the test was too simplistic. I don't know whether the effect of the novelty factor of playing against a physical opponent was properly considered. The article didn't say (I don't think) how many games each participant played against the iCat. Could it be that, particularly with the 8 to 12 year old participants, the players enjoyed the game more when faced with the novelty of playing against a physical iCat rather than a virtual one? Would the results have been the same if the same group of players had played a greater number of games against each type of opponent, so that the novelty factor would have worn off? And I don't see why the participants were restricted to one kind of scenario (physical or virtual). It would seem to me that if they had a relatively large number of positions and if these positions were chosen at random, then there would have been minimal chance of duplication. Or, if by chance the same position was selected, a different position could have been substituted for it. And if the participants had played both types of opponent (physical and virtual) and the order in which they played each opponent was randomly selected, then a more direct preference comparison would have been achieved. Oh well, just some thoughts.

Sep-13-19
  AylerKupp: <My Experience with Dress Codes> On my last job before my retirement I was the manager of the software department for a company. The company did have a dress code but it was never enforced or even mentioned and, software developers being what they are, my subordinates pretty much dressed as they pleased, and I never talked to them about it because, frankly, I didn't know what the company's dress code was. I did mention to them, in case they didn't think of it on their own, that if they were giving a presentation in front of our customers, they should dress appropriately, business casual as a minimum. I personally wore casual attire except when giving presentations to our customers, when I always wore a business suit. I never wore jeans except on weekends, and no shorts or sandals or flip-flops.

Then one day everyone in the company received an email that the company's dress code would henceforth be enforced. It prohibited everyone from wearing shorts, jeans (even nice ones), and sandals, and for the women specifically it prohibited wearing Capri pants, spaghetti straps, and outfits with bare midriffs. Nobody knew why this email came out of the blue, and one of my employees objected particularly strenuously. She said that she was a single mother, her wardrobe consisted only of jeans, and she could not afford to get a completely new wardrobe. She finished by saying "I'm not giving up my jeans!" My response was along the lines of "I agree with you. I'm not giving up my jeans, my Capri pants, or my spaghetti straps, but by popular request I have agreed to give up my bare midriff outfits."

I of course talked to the powers-that-be to find out what triggered the email. I was told that there were some potential customers visiting the company from a foreign country where they could be offended by our employees' casual dress (and in the case of software developers <very> casual dress) and they didn't want that to happen. I asked why this wasn't indicated in the email, since it did not seem like an unreasonable request for these customers' visits, and for any other visits from customers from countries that might be offended by our overly casual attire; if notified of such visits, no one would object to relatively more "formal" attire while those customers were touring our facility, reverting to their usual attire when there were no customers on site. I was told that this would be very hard to do. Which was absolute nonsense, since <every> visitor had to indicate when they would be visiting our facility and submit their security clearances ahead of time. I always received a notice of such visits and asked my subordinates to straighten up their offices and the walk spaces somewhat (software developers tend to be rather messy) prior to the customers' arrival. Which told me that the powers-that-be were not that serious about the "problem" and that the issue would quickly die out. And, sure enough, it did.

Sep-13-19
  AylerKupp: <My Experience with Company Rules and Regulations> Since I'm on a roll and this is my forum, I can address any subject at all (within reason), so I will. This was the incident that taught me that all rules have a way around them and that, rather than try to fight an unreasonable (at least to me) rule head-on, it's easier to try to find the loopholes in the rules (there are always some) and go around them. Many years ago, before the existence of personal computers, we used minicomputers which could be shared by multiple users. At another company, where I was the head of software development for my department, we had a centralized minicomputer on the second floor which served the needs of the software developers. Three of my subordinates were sharing an office on the first floor and they had to go upstairs to use the computer. They asked me if I could arrange to have a dumb terminal (the only kind available at the time) installed in their office, since that would save them the time going up and down the stairs, increase productivity, blah, blah, blah. That seemed like a reasonable request to me, so I went to talk to the facilities manager. He told me that it was against company policy to do that, because in those days they were required to install a hard conduit for the cables leading from the minicomputer to individual offices and, if my subordinates were to move, the money would have been wasted. I was about to object when I saw the smile on the face of the facilities manager, who said "However, if you install two terminals instead of just one, that would make it a terminal room, which is OK as far as company policy goes, and I will approve the request." So my subordinates got two terminals in their office instead of one and were even happier. Perhaps FIDE could learn something from my experience.

Oct-25-19
  AylerKupp: <Effect of "Fischer Rules" for TCEC engine tournaments (part 1 of 3)> First some clarification re: Fischer rules, please.

1. I assume that the 9-9 clause or equivalent only applies if one of the engines participating in the Semifinals is the engine champion from the previous season.

2. Since <alexmagnus> posted data for first to win 4 games and first to win 6 games, I'm assuming that the equivalent of the Fischer rules would have a 3-3 clause for first to win 4 games and a 5-5 clause for first to win 6 games; e.g. in the case of first to win 4 games, the Superfinal would be terminated if the match score reached 3-3 with draws not counting and one of the engines participating in the Semifinals were the engine champion from the previous season. I'll call this the (M-M) clause, where N = number of games needed to win the match, draws not counting, and M = N-1. In these cases the defending engine champion keeps its title but there is no match winner. (See the sketch below.)

3. My assumption is that, unlike the regular TCEC Superfinal matches, when the match continued even if one of the engines was mathematically eliminated from winning or drawing, if the Fischer Rules were in effect the match would be terminated once one engine reached the score required to win. And the match would also terminate if the (M-M) clause applied. And since there was a "gap" between being first to 6 wins and being first to 10 wins, I added a first to 8 wins category, so that N = (4, 6, 8, 10). Since in 4 of the 15 seasons neither engine reached 10 wins, I hoped that this would lead to more matches having a conclusive result in the first to 8 wins category. However, since in 2 of the 15 seasons neither engine reached 8 wins either, this was only a minor improvement in trying to ensure a conclusive result.

Now for some general information about the TCEC matches to date:

1. So far there have been 15 seasons played, Seasons 1 and 2 and Seasons 4 through 16. As <alexmagnus> indicated, Season 3 was not completed so there was no Superfinal.

2. In Seasons 1 and 2 the final match was referred to as the "Elite" match, but I will refer to it as the "Superfinal" match for those two seasons for consistency. 48 games were played in each season.

3. In Seasons 4 and 5 there were 48 games played.

4. In Seasons 6 and 7 there were 64 games played.

5. In Seasons 8 and up there were 100 games played.

6. With one exception (Houdini 1.5a was used in Seasons 1 and 2), different engine versions were used in different seasons. But I will consider all Houdinis to be Houdini, all Stockfishs (Stockfishes?) to be Stockfish, etc. for the purpose of determining if the 3-3, 5-5, or 9-9 clauses applied to the defending engine champion.
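Here is a small Python sketch of the rule as I've defined it above, just to make the termination conditions unambiguous; the function name and the 'A'/'B'/'D' result encoding are mine, not TCEC's:

    def fischer_match(results, n, champion_playing):
        # results: game outcomes in order, 'A' or 'B' for a win, 'D' for a draw.
        # The match ends when one engine reaches n wins, or (if the defending
        # champion is playing) when the score reaches M-M with M = N-1,
        # draws not counting.
        m = n - 1
        wins = {"A": 0, "B": 0}
        for game_no, result in enumerate(results, start=1):
            if result in wins:
                wins[result] += 1
            if champion_playing and wins["A"] == m and wins["B"] == m:
                return ("champion retains title", game_no)   # the (M-M) clause
            for side in ("A", "B"):
                if wins[side] == n:
                    return (side + " wins", game_no)
        return ("no result", len(results))   # neither engine reached n wins

For example, fischer_match("ADBAAA", n=4, champion_playing=True) returns ("A wins", 6), while fischer_match("ABABAB", n=4, champion_playing=True) returns ("champion retains title", 6) because the score reaches 3-3.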

Oct-25-19
  AylerKupp: <Effect of "Fischer Rules" for TCEC engine tournaments (part 2 of 3)> Some information and statistics about the chess engines involved. Stockfish is the engine that has won the most Superfinals. This is the rundown; the second column is the number of times that the engine has won the Superfinal and the third column is the percentage of the 15 seasons that the engine won:

Stockfish 7 46.7%
Houdini 4 26.7%
Komodo 3 20.0%
LeelaC0 1 6.7%
AllieStein 0 0.0%
Rybka 0 0.0%

But if you look at their success rate, i.e. how many times the engine <won> the Superfinal once it reached it, the answer is:

Houdini 4 6 66.7%
Stockfish 7 12 58.3%
LeelaC0 1 2 50.0%
Komodo 3 7 42.9%
AllieStein 0 1 0.0%
Rybka 0 2 0.0%

The first 2 columns are as before, the third column is the number of times that the engine qualified for the Superfinal, and the fourth column is the percentage of times that it won the Superfinal once it had qualified. So looking at it this way, it seems that Houdini was the most "efficient" in terms of winning the Superfinal once it had qualified to play in it. Rybka's performance might be a surprise to some given its dominance many years ago. But Rybka stopped being updated shortly after the TCEC was started, with its last update, Rybka 4.1, released in Mar-2011. With chess engines, like most things in life, if you don't move with the times you will be left behind. And, of course, we are starting to see the increased playing strength of neural network-based engines like LeelaC0 and AllieStein when supported by GPUs. It makes me wonder (no, it doesn't, but I won't go into that one here) how AlphaZero would have done if it had competed. I think that DeepMind missed an opportunity for additional good publicity by failing to enter the TCEC. And I also won't go into the computational performance advantage that these engines have by using highly parallel support processors like GPUs and TPUs.
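The success-rate table above is simple to recompute from (Superfinal wins, Superfinal appearances); a quick Python check, using the numbers given in this post:

    engines = {"Houdini": (4, 6), "Stockfish": (7, 12), "LeelaC0": (1, 2),
               "Komodo": (3, 7), "AllieStein": (0, 1), "Rybka": (0, 2)}
    for name, (won, played) in sorted(engines.items(),
                                      key=lambda e: e[1][0] / e[1][1], reverse=True):
        print(f"{name:<10} {won} {played:>2} {100 * won / played:.1f}%")

This reproduces the ordering above, with Houdini first at 66.7%.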

Oct-25-19
  AylerKupp: <Effect of "Fischer Rules" for TCEC engine tournaments (part 3 of 3)> Now for a summary of the TCEC results. If this is good enough for you, you can stop after this part. If you want to see the season-by-season impact of the Fischer Rules and some TCEC statistics (Draw %, White Scoring %, etc.), keep reading until you fall asleep from boredom.

<Overall Summary> The number of matches in which an (M-M) clause was applicable was 10 of the 15, or 66.7% of the time. There were potentially N = (4, 6, 8, 10) x 15 seasons = 60 total possible winning conditions. The number of Superfinal matches in which the eventual Superfinal winner won the match under the Fischer Clause = 50 / 60 (83.3%). Note: in 2 seasons neither engine won 8 games and in 4 seasons neither engine won 10 games, so under those conditions no engine would have won those Superfinal matches if the Fischer Rules had been in effect. The number of Superfinal matches drawn under the Fischer Clause (i.e. the match score was (M-M), so the match would have been terminated and therefore the defending champion would have retained its title) = 2 / 60 (3.3%). The number of Superfinal matches lost under the Fischer Clause (the winner under the Fischer clause was not the eventual winner of the match) = 4 / 60 (6.7%).

<Win (N = 4, 6, 8, or 10) Category Summary> In the first to 4 wins category a definitive conclusion was reached in all 15 seasons. In 14 of these (93.3%) the first engine to reach 4 wins was the same as the Superfinal winner in the season. In one of the seasons the match score reached 3-3, so the match would have been terminated and the defending champion would have retained its title. However, in that match the eventual winner was the first engine to achieve a 4-3 score.

In the first to 6 wins category a definitive conclusion was reached in all 15 seasons. In 13 of these (86.7%) the first engine to reach 6 wins was the same as the Superfinal winner in the season. In one of the seasons the match score reached 5-5, so the match would have been terminated and the defending champion would have retained its title. However, in that match the eventual winner was the first engine to achieve a 6-5 score.

In the first to 8 wins category a definitive conclusion was reached in 13 of the 15 seasons (86.7%); in the other 2 seasons neither engine managed to win 8 games. Of the 13 matches where a definitive result was reached, in one of them (7.7%) the first engine to reach 8 wins was not the engine that eventually won the match. In no match did the score reach 7-7, so the Fischer clause for a tied match was never applied.

In the first to 10 wins category a definitive conclusion was reached in 11 of the 15 seasons (73.3%); in the other 4 seasons neither engine managed to win 10 games. Of the 11 matches where a definitive result was reached, in all 11 the first engine to reach 10 wins was the eventual winner of the actual season.

Oct-25-19
  AylerKupp: <Season-by-Season Effect of "Fischer Rules" for TCEC engine tournaments (part 1 of 5)> Below is a season-by-season breakdown of the effect of the Fischer Rules when N = 4, 6, 8, and 10.

<Season 1> Houdini vs. Rybka. Houdini won with a score of 23.5 – 16.5. (M-M) clause not applicable since there was no defending champion. First to win 4 games, draws not counting: Houdini wins in 7 games with a score of 4-0. First to win 6 games, draws not counting: Houdini wins in 23 games with a score of 6-0. First to win 8 games, draws not counting: Houdini wins in 31 games with a score of 8-4. First to win 10 games, draws not counting: Houdini wins in 37 games with a score of 10-4. After 40 games Houdini was ahead with a score of 12-5.

<Season 2> Houdini vs. Rybka. Houdini won with a score of 22 – 18. (M-M) clause applicable since Houdini was the defending champion. First to win 4 games, draws not counting: <Rybka> wins in 9 games with a score of 4-1. First to win 6 games, draws not counting: after 31 games the score reaches 5-5, so the match would have ended and Houdini would retain the title; Houdini won the 32nd game, so it would have won the match with a score of 6-5 if the (M-M) clause had not been in effect. First to win 8 games, draws not counting: Houdini wins in 37 games with a score of 8-5. Neither engine scores 10 wins. After 40 games Houdini was ahead 9-5, so presumably with a 4-game lead and only one more win needed it would have reached 10 wins first. And even if Rybka could have pulled a 1984 Kasparov, Houdini would have kept its title once the score reached 9-9.

<Season 4> (remember, Season 3 was not completed) Houdini vs. Stockfish. Houdini won with a score of 25 – 23. (M-M) clause applicable since Houdini was the defending champion. First to win 4 games, draws not counting: Houdini wins in 31 games with a score of 4-2. First to win 6 games, draws not counting: Houdini wins in 40 games with a score of 6-4. First to win 8 or 10 games, draws not counting: no result since neither engine was able to win 8 or 10 games, but with only a 2-game lead and 4 more wins needed by Houdini it's not clear to me which engine would have scored 8 or 10 wins first.

Oct-25-19
  AylerKupp: <Season-by-Season Effect of "Fischer Rules" for TCEC engine tournaments (part 2 of 5)>

<Season 5> Komodo vs. Stockfish. Komodo won with a score of 25 – 23. (M-M) clause not applicable since the defending champion, Houdini, was not participating in the Semifinal. First to win 4 games, draws not counting: <Stockfish> wins in 11 games with a score of 4-3. First to win 6 games, draws not counting: Komodo wins in 25 games with a score of 6-4. First to win 8 games, draws not counting: Komodo wins in 27 games with a score of 8-4. First to win 10 games, draws not counting: Komodo wins in 48 games with a score of 10-8, the last game of the match.

<Season 6> Komodo vs. Stockfish. Stockfish won with a score of 35.5 – 28.5. (M-M) clause applicable since Komodo was the defending champion. First to win 4 games, draws not counting: Stockfish wins in 18 games with a score of 4-0. First to win 6 games, draws not counting: Stockfish wins in 31 games with a score of 6-2. First to win 8 games, draws not counting: Stockfish wins in 37 games with a score of 8-3. First to win 10 games, draws not counting: Stockfish wins in 44 games with a score of 10-3. After 64 games Stockfish was ahead 13-6.

<Season 7> Stockfish vs. Komodo. Komodo won with a score of 33.5 – 30.5. (M-M) clause applicable since Stockfish was the defending champion. First to win 4 games, draws not counting: Komodo wins in 26 games with a score of 4-2. First to win 6 games, draws not counting: Komodo wins in 61 games with a score of 6-4. First to win 8 or 10 games, draws not counting: no result since neither engine was able to win either 8 or 10 games. After 64 games Komodo was ahead 7-4, and so with a 3-game lead presumably it would have reached 8 or 10 wins first, but the fact that it needed to win 3 more games puts the projected final result somewhat in doubt.

Oct-25-19
  AylerKupp: <Season-by-Season Effect of "Fischer Rules" for TCEC engine tournaments (part 3 of 5)>

<Season 8> Komodo vs. Stockfish. Komodo won with a score of 53.5 – 46.5. (M-M) clause applicable since Komodo was the defending champion. First to win 4 games, draws not counting: Komodo wins in 46 games with a score of 4-1. First to win 6 games, draws not counting: Komodo wins in 78 games with a score of 6-1. First to win 8 games, draws not counting: Komodo wins in 86 games with a score of 8-1. First to win 10 games, draws not counting: no result since neither engine was able to win 10 games. After 100 games Komodo was ahead 9-2, and so with a 7-game lead and only needing to win 1 more game presumably it would have reached 10 wins first.

<Season 9> Stockfish vs. Houdini. Stockfish won with a score of 54.5 – 45.5. (M-M) clause not applicable since the defending champion, Komodo, was not in the Superfinal. First to win 4 games, draws not counting: Stockfish wins in just 15 games with a score of 4-0. First to win 6 games, draws not counting: Stockfish wins in just 21 games with a score of 6-1. First to win 8 games, draws not counting: Stockfish wins in 43 games with a score of 8-3. First to win 10 games, draws not counting: Stockfish wins in 55 games with a score of 10-3. After 100 games Stockfish was ahead 17-8.

<Season 10> Houdini vs. Komodo. Houdini won with a score of 53 – 47. (M-M) clause not applicable since the defending champion, Stockfish, was not in the Superfinal. First to win 4 games, draws not counting: Houdini wins in just 14 games with a score of 4-0. First to win 6 games, draws not counting: Houdini wins in 40 games with a score of 6-1. First to win 8 games, draws not counting: Houdini wins in 58 games with a score of 8-4. First to win 10 games, draws not counting: Houdini wins in 62 games with a score of 10-4. After 100 games Houdini was ahead 15-9.

Oct-25-19
  AylerKupp: <Season-by-Season Effect of "Fischer Rules" for TCEC engine tournaments (part 4 of 5)>

<Season 11> Stockfish vs. Houdini. Stockfish won with a score of 59 – 41. (M-M) clause applicable since Houdini was the defending champion. First to win 4 games, draws not counting: Stockfish wins in just 23 games with a score of 4-0. First to win 6 games, draws not counting: Stockfish wins in just 26 games with a score of 6-0. First to win 8 games, draws not counting: Stockfish wins in just 29 games with a score of 8-0. First to win 10 games, draws not counting: Stockfish wins in just 31 games with a score of 10-0. After 100 games Stockfish was ahead 20-2. A dominating performance by Stockfish.

<Season 12> Stockfish vs. Komodo. Stockfish won with a score of 60 – 40. (M-M) clause applicable since Stockfish was the defending champion. First to win 4 games, draws not counting: Stockfish wins in just 17 games with a score of 4-1. First to win 6 games, draws not counting: Stockfish wins in just 28 games with a score of 6-1. First to win 8 games, draws not counting: Stockfish wins in just 35 games with a score of 8-3. First to win 10 games, draws not counting: Stockfish wins in just 46 games with a score of 10-5. After 100 games Stockfish was ahead 29-9. Another dominating performance by Stockfish.

<Season 13> Stockfish vs. Komodo. Stockfish won with a score of 55 – 45. (M-M) clause applicable since Stockfish was the defending champion. First to win 4 games, draws not counting: Stockfish wins in 29 games with a score of 4-0. First to win 6 games, draws not counting: Stockfish wins in 53 games with a score of 6-2. First to win 8 games, draws not counting: Stockfish wins in 59 games with a score of 8-2. First to win 10 games, draws not counting: Stockfish wins in 77 games with a score of 10-4. After 100 games Stockfish was ahead 16-6. Not as dominating as the previous two performances by Stockfish but still impressive.

Oct-25-19
  AylerKupp: <Season-by-Season Effect of "Fischer Rules" for TCEC engine tournaments (part 5 of 5)>

<Season 14> Stockfish vs. LeelaC0. Stockfish won with a score of 50.5 – 49.5, a real squeaker. (M-M) clause applicable since Stockfish was the defending champion. First to win 4 games, draws not counting: the match score reached 3-3 after 17 games, so the match would have been stopped and Stockfish would have retained its title due to the (M-M) (3-3) clause. Stockfish reaches a 4-3 score after 20 games, so it would have won the title without the (M-M) clause. First to win 6 games, draws not counting: after two more straight wins Stockfish wins in only 22 games with a score of 6-3. First to win 8 games, draws not counting: <LeelaC0> wins in 53 games with a score of 8-6. First to win 10 games, draws not counting: the score reaches 9-9 after 80 games, so the match would have been stopped and Stockfish would have retained the title due to the (M-M) (9-9) clause. Stockfish reaches a 10-9 score after 85 games, so it would have won the title without the (M-M) clause. After 100 games Stockfish was still ahead 10-9. So a squeaker of a match regardless of whether draws were counted or not.

<Season 15> Stockfish vs. LeelaC0. LeelaC0 won with a score of 53.5 – 46.5. (M-M) clause applicable since Stockfish was the defending champion. First to win 4 games, draws not counting: LeelaC0 wins in 24 games with a score of 4-1. First to win 6 games, draws not counting: LeelaC0 wins in 36 games with a score of 6-2. First to win 8 games, draws not counting: LeelaC0 wins in 40 games with a score of 8-3. First to win 10 games, draws not counting: LeelaC0 wins in 62 games with a score of 10-5. After 100 games LeelaC0 was ahead 14-7. An impressive performance by LeelaC0 when supported by a GPU against the engine with the most appearances and most wins in the Superfinal.

<Season 16> Stockfish vs. AllieStein. Stockfish won with a score of 54.5 – 45.5. (M-M) clause not applicable since LeelaC0 was the defending champion. First to win 4 games, draws not counting: Stockfish wins in 26 games with a score of 4-0. First to win 6 games, draws not counting: Stockfish wins in 42 games with a score of 6-4. First to win 8 games, draws not counting: Stockfish wins in 66 games with a score of 8-4. First to win 10 games, draws not counting: Stockfish wins with a score of 10-4. After 100 games Stockfish was ahead 14-5. Still, an impressive performance by AllieStein when supported by a GPU in its first participation in the TCEC against the engine with the most appearances and most wins in the Superfinal.

Oct-25-19
  AylerKupp: <Other TCEC engine tournament statistics> Some perhaps interesting statistics from the 15 seasons. Total number of games played = 1,204.

<Wins, Draws, and Losses>
Number of White wins = 201 (16.7%)
Number of Black wins = 89 (7.4%)
Number of Draws = 914 (75.9%)

<Scoring Percentages>
White Scoring % = 56.9%
Black Scoring % = 43.1%

It's interesting to see that these scoring percentages are not too far off the scoring percentages for players rated 2700+ in the ChessTempo database, 54.8% for White and 45.2% for Black.

<Trends> The average White winning % has stayed roughly constant over the 15 seasons. But the Draw % has been increasing, and so the Black winning % has been decreasing. So the White scoring % has been increasing somewhat and the Black scoring % has correspondingly been decreasing by the same amount. So it seems that the engines in the Superfinal have been getting relatively harder to beat as Black. If you've gotten this far and you're still interested, you can download a spreadsheet with the season-by-season results, statistical calculations, and trend charts from http://www.mediafire.com/file/8zoht.... You will need Excel 2003 or later or a spreadsheet or viewer capable of reading Excel 2003 files.
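As a quick sanity check, the win/draw/loss percentages above follow directly from the raw counts given in this post:

    games, white_wins, black_wins, draws = 1204, 201, 89, 914
    print(f"White wins: {100 * white_wins / games:.1f}%")   # 16.7%
    print(f"Black wins: {100 * black_wins / games:.1f}%")   # 7.4%
    print(f"Draws:      {100 * draws / games:.1f}%")        # 75.9%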


