Chessgames.com User Profile Chessforum

AylerKupp
Member since Dec-31-08 · Last seen Nov-20-19
About Me (in case you care):

Old timer from the Fischer, Reshevsky, Spassky, Petrosian, etc. era. Active while in high school and early college, but not much since. Never rated above the low 1800s and highly erratic; I would occasionally beat much higher rated players and equally often lose to much lower rated players. Highly entertaining combinatorial style; everybody liked to play me since they were never sure what I was going to do (neither did I!). When facing a stronger player, many try to even their chances by steering towards simple positions so that they can see what is going on. My philosophy in those situations was to try to even the chances by complicating the game to the extent that neither I nor the stronger player would be able to see what was going on! Alas, this approach no longer works in the computer age. And, needless to say, my favorite all-time player is Tal.

I also have a computer background and have been following with interest the developments in computer chess since the days when computers couldn't always recognize illegal moves and a patzer like me could beat them with ease. Now it's me that can't always recognize illegal moves and any chess program can beat me with ease.

But after about 8 years (a lifetime in computer-related activities) of playing computer-assisted chess, I think I have learned a thing or two about the subject. I have conceitedly defined "AylerKupp's corollary to Murphy's Law" (AKC2ML) as follows:

"If you use your engine to analyze a position to a search depth=N, your opponent's killer move (the move that will refute your entire analysis) will be found at search depth=N+1, regardless of the value you choose for N."

I'm also a food and wine enthusiast. Some of my favorites are German wines (along with French, Italian, US, New Zealand, Australian, Argentinian, Spanish, ... well, you probably get the idea). One of my early favorites was wine from the Ayler Kupp vineyard in the Saar region, hence my user name. Here is a link to a picture of the village of Ayl with a portion of the Kupp vineyard on the left: http://en.wikipedia.org/wiki/File:A...

You can send me an e-mail whenever you'd like to aylerkupp gmail.com.

And check out a picture of me with my "partner", Rybka (Aylerkupp / Rybka) from the CG.com Masters - Machines Invitational (2011). No, I won't tell you which one is me.

-------------------

Ratings Inflation

I have become interested in the increase in top player ratings since the mid-1980s and whether this represents a true increase in player strength (and if so, why) or if it is simply a consequence of a larger chess population from which ratings are derived. So I've opened up my forum for discussions on this subject.

I have updated the list that I initially completed in Mar-2013 with the FIDE rating list through 2018 (published in Jan-2019), and you can download the complete data from https://www.mediafire.com/file/g89w.... It is quite large (~ 213 MB) and to open it you will need Excel 2007 or a later version, or a compatible spreadsheet program, since several of the later tabs contain more than 65,536 rows.

The spreadsheet also contains several charts and summary information. If you are only interested in that and not the actual rating lists, you can download a much smaller (~ 1 MB) spreadsheet containing the charts and summary information from https://www.mediafire.com/file/m5nk.... You can open this file with a pre-Excel 2007 version or a compatible spreadsheet.

FWIW, after looking at the data I think that ratings inflation, which I define to be the unwarranted increase in ratings not necessarily accompanied by a corresponding increase in playing strength, is real, but it is a slow process. I refer to this as my "Bottom Feeder" hypothesis and it goes something like this:

1. Initially (late 1960s and 1970s) the ratings for the strongest players were fairly constant.

2. In the 1980s the number of rated players began to increase exponentially, and they entered the FIDE-rated chess playing population mostly at the lower rating levels. Also, starting in 1992, FIDE began to periodically lower the rating floor (the lowest rating for which players would be rated by FIDE) from 2200 to the current 1000 in 2012. This resulted in an even greater increase in the number of rated players. And the ratings of those newly-rated players may have been higher than they should have been, given that they were calculated using a high K-factor.

3. The ratings of the stronger of these players increased as a result of playing these weaker players, but their ratings were not sufficiently high to play in tournaments, other than open tournaments, where they would meet middle and high rated players.

4. Eventually they did. The ratings of the middle rated players then increased as a result of beating the lower rated players, and the ratings of the lower rated players then leveled out and even started to decline. You can see this effect in the 'Inflation Charts' tab, "Rating Inflation: Nth Player" chart, for the 1500th to 5000th rated player.

5. Once the middle rated players increased their ratings sufficiently, they began to meet the strongest players. And the cycle repeated itself. The ratings of the middle players began to level out and might now be ready to start a decrease. You can see this effect in the same chart for the 100th to 1000th rated player.

6. The ratings of the strongest players, long stable, began to increase as a result of beating the middle rated players. And, because they are at the top of the food chain, their ratings, at least initially, continued to climb. I think that they will eventually level out, and may have already done so except for possibly the very highest rated players (those among the top 50). But if this hypothesis is true there is no force to drive them down, so they will now stay relatively constant, like the pre-1986 10th rated player and the pre-1981 50th rated player. When this leveling out will take place, if it does, and at what level, I have no idea. But a look at the 2017 ratings data indicates that, indeed, it has already started, maybe even among the top 10 rated players. (A toy simulation of this rating-point flow follows below.)
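
To make the mechanism in steps 2 through 6 concrete, here is a toy simulation in Python (a minimal sketch with made-up numbers; the ratings, true strengths, and K-factors are hypothetical, not FIDE's actual pool). It shows how an established player can gain rating points indefinitely by playing newcomers who enter the pool over-rated relative to their true strength, which is exactly the kind of "created" points that then flow up the food chain:

import random

def expected(ra, rb):
    # Standard Elo expected score of player A against player B.
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def update(ra, rb, score_a, k_a=10, k_b=40):
    # Update both ratings after one game; the newcomer (B) uses a higher K-factor.
    ea = expected(ra, rb)
    return ra + k_a * (score_a - ea), rb + k_b * ((1.0 - score_a) - (1.0 - ea))

random.seed(1)
established = 1800.0                                 # rated and "truly" an 1800 player
for _ in range(500):
    entrant_rating, entrant_true = 1400.0, 1200.0    # newcomer enters over-rated by 200 points
    p_win = expected(1800.0, entrant_true)           # results driven by true strengths
    score = 1.0 if random.random() < p_win else 0.0
    established, _ = update(established, entrant_rating, score)

print(round(established))   # drifts well above 1800: points are created, i.e. inflation

The drift stops once the established player's rating has absorbed the newcomers' over-rating, which is the leveling-out effect described above.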

You can see in the chart that the rating increase, leveling off, and decline first starts with the lowest ranked players, then moves through the middle ranked players, and finally affects the top ranked players. As of today the average ratings of ALL the players, including the average of the Top-10 rated players, have been fairly constant since 2014.

It's not precise, it's not 100% consistent, but it certainly seems evident. And the process takes decades so it's not easy to see unless you look at all the years and many ranked levels.

Of course, this is just a hypothesis and the chart may look very different 20 years from now. But, at least on the surface, it doesn't sound unreasonable to me.

But looking at the data through 2018 it is even more evident that the era of ratings inflation appears to be over, unless FIDE once more lowers the rating floor and a flood of new and unrated players enters the rating pool. The previous years' trends have either continued or accelerated; the ratings for every ranking category have either flattened out or started to decline, as evidenced by the trendlines.

-------------------

Chess Engine Non-Determinism

I've discussed chess engine non-determinism many times. If you run an analysis of a position multiple times, with the same engine, the same computer, and to the same search depth, you will get different results. Not MAY, WILL. Guaranteed. Similar results were reported by others.

I had a chance to run a slightly more rigorous test and described the results starting here: US Championship (2017) (kibitz #633). I had 3 different engines (Houdini 4, Komodo 10, and Stockfish 8) analyze the position in W So vs Onischuk, 2017 after 13...Bxd4, a highly complex tactical position. I made 12 runs with each engine; 3 each with threads=1, 2, 3, and 4 on my 32-bit 4-core computer with 4 GB RAM and MPV=3. The results were consistent across the 3 engines:

(a) With threads=1 (using a single core) the results of all 3 engines were deterministic. In every run each engine selected the same top 3 moves, with the same evaluations, and obviously the same move rankings.

(b) With threads=2, 3, and 4 (using 2, 3, and 4 cores) none of the engines showed deterministic behavior. From run to run each engine occasionally selected different top 3 moves, with different evaluations and different move rankings.

I've read that the technical reason for the non-deterministic behavior is the high sensitivity of the alpha-beta algorithms that all the top engines use to the move ordering in their search tree, and the variation of this move ordering under multi-threaded operation when each of the threads gets interrupted by higher-priority system processes. I have not had the chance to verify this, but there is no disputing the results.

What's the big deal? Well, if the same engine gives different results each time it runs, how can you determine what's the real "best" move? Never mind that different engines of relatively equal strength (as determined by their ratings) give different evaluations and move rankings for their top 3 moves, and that the evaluations may differ as a function of the search depth.

Since I believe in the need to run analyses of a given position using more than one engine and then aggregating the results to try to reach a more accurate assessment of a position, I typically ran sequential analyses of the same position using 4 threads and a hash table = 1,024 MB. But since I typically run 3 engines, I found it more efficient to run analyses using all 3 engines concurrently, each with a single thread and a hash table = 256 MB (to prevent swapping to disk). Yes, running with a single thread runs at 1/2 the speed of running with 4 threads, but running the 3 engines sequentially requires 3X the time while running them concurrently requires only 2X the time, about a 33% reduction in the total time to run all 3 analyses to the same depth, and it resolves the non-determinism issue as well.

So, if you typically run analyses of the same position with 3 engines, consider running them concurrently with threads=1 rather than sequentially with threads=4. You'll get deterministic results in less total time.
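
For anyone who wants to try this, here is a minimal sketch of the idea using the python-chess library (the engine paths and the depth are placeholders; substitute whatever engines you actually have installed). Each engine runs single-threaded as its own process, so all three analyses run concurrently and each one is reproducible:

import concurrent.futures
import chess
import chess.engine

ENGINE_PATHS = ["./stockfish", "./komodo", "./houdini"]   # hypothetical paths

def analyse(path, fen=None, depth=24):
    # Analyze one position with one engine, single-threaded, MultiPV=3.
    board = chess.Board(fen) if fen else chess.Board()
    with chess.engine.SimpleEngine.popen_uci(path) as engine:
        engine.configure({"Threads": 1, "Hash": 256})   # 1 thread -> reproducible results
        lines = engine.analyse(board, chess.engine.Limit(depth=depth), multipv=3)
        # One entry per PV: report the first move and its evaluation.
        return path, [(pv["pv"][0].uci(), str(pv["score"].white())) for pv in lines]

if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(ENGINE_PATHS)) as pool:
        for path, top3 in pool.map(analyse, ENGINE_PATHS):
            print(path, top3)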

-------------------

A Note on Chess Engine Evaluations

All engines provide different evaluations of the next "best" move, sometimes significantly different. For example, Stockfish's evaluations tend to be higher than those of other top engines and Houdini's tend to be lower. This could be because Stockfish typically reaches greater search depths than the other top engines in the same amount of time, while Houdini typically reaches lower search depths. Or it could be for other reasons.

If we are analyzing a position we typically want to use the "best" engine as "measured" by its rating, and that's currently (Mar-2018) Stockfish 10 for "classic" chess engines (I'm deliberately excluding AlphaZero and Leela Chess Zero because they use a different move/search tree branch evaluation approach, and the best versions of them use either TPU or GPU support to enhance their calculation capability and are therefore not directly comparable); its higher rating has been achieved in engine vs. engine tournaments such as CCRL and CEGT. But the "best" engine as determined by playing head-to-head games is not necessarily the best engine for <analysis>, since in analysis we not only want to know the best moves from a given position but also want an accurate <evaluation> of the position. Specifically, we want an accurate evaluation of the position in <absolute> terms in order to determine whether one side has a likely winning advantage (generally an absolute evaluation > [±2.00], or 2 pawns), a significant advantage (generally an absolute evaluation in the range [±1.00] to [±1.99]), a slight advantage (generally an absolute evaluation in the range [±0.50] to [±0.99]), or if the position is approximately equal (generally an absolute evaluation in the range [-0.49] to [+0.49]).

But when playing a game an accurate <absolute> evaluation is irrelevant; what counts is an accurate <relative> evaluation. This is because all chess engines using the minimax algorithm determine the best move (assuming best play by both sides) by a series of pairwise comparisons between two moves. So if an engine is trying to determine which of 2 moves, A and B, is better, it doesn't matter if their evaluations are [+12.00] or [+11.00], [+1.20] or [+1.10], or [+0.12] or [+0.11]; it will always select move A as the better move and consider that branch in the search tree to be the better line. So multiplying 2 evaluations by a fixed constant or adding a fixed constant to 2 evaluations has no effect on which of the 2 moves the engine determines to be better. But clearly, evaluations of [+12.00], [+1.20], or [+0.12] will give the analyst much different impressions of the position.
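
Here is a minimal sketch of that point (a toy two-move, two-reply tree with made-up numbers, not any real engine's code): minimax picks the same move whether the leaf evaluations are left alone or all scaled by a constant, even though the absolute numbers look completely different to a human analyst:

def minimax(node, maximize=True):
    # 'node' is either a numeric leaf evaluation or a list of child nodes.
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximize) for child in node]
    return max(values) if maximize else min(values)

# Two candidate moves (A and B), each with the opponent's best replies as leaves.
tree = [[+1.20, +0.80], [+1.10, +0.90]]                   # move A, move B
scaled = [[leaf * 10 for leaf in move] for move in tree]  # same tree, scaled 10x

pick = lambda t: max(range(len(t)), key=lambda i: minimax(t[i], maximize=False))
print(pick(tree), pick(scaled))   # same index both times: move B in this example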

In practice the discrepancies in evaluations between several engines are not that drastic, but I suggest that you don't assume that Stockfish's <absolute> evaluations are the most accurate just because it is (currently) the best "classical" game-playing engine (i.e. not using GPU or TPU support) or because it reaches the greatest search depth in a given amount of time.

-------------------

TCEC Observations

In response to a question by <john barleycorn> I looked into the TCEC Superfinal matches played in the 16 seasons to date, summarized the results, provided season-by-season summaries, and compiled some statistics. The objective was to see what effect the "Fischer Rules" proposed for the Karpov - Fischer World Championship Match (1975) (the winner being the first to win 10 games, draws not counting; the match would be terminated if the score reached 9-9, with no match winner and the champion retaining his title) would have had on the Superfinal results. User <alexmagnus> provided some statistics for the situations of the first to win 4 games and the first to win 10 games, and I added some statistics for the situation of the first to win 8 games.

You can see the information starting at AylerKupp chessforum (kibitz #1537) below. You can download a spreadsheet with the season-by-season results, statistical calculations, and trend charts from http://www.mediafire.com/file/8zoht.... You will need Excel 2003 or later or a spreadsheet or viewer capable of reading Excel 2003 files.

-------------------

Any comments, suggestions, criticisms, etc. are both welcomed and encouraged.

-------------------

Chessgames.com Full Member

   AylerKupp has kibitzed 12635 times to chessgames
   Nov-20-19 Grand Prix Hamburg (2019) (replies)
 
AylerKupp: <<beatgiant> Here comes the <endless discussion on the appropriateness of this criterion>> I hope not. The discussion to date, although maybe not endless, has said pretty much everything that needs to be said, not that this makes much of a difference. At least I ...
 
   Nov-19-19 Karpov vs Kasparov, 1990
 
AylerKupp: <WorstPlayerEver> I had never heard of Salov, being inactive in chess during the period that he rose to prominence. But after reading his short biography in Wikipedia I think that you are right, he and Fischer are kindred spirits.
 
   Nov-17-19 Geller vs Karpov, 1976 (replies)
 
AylerKupp: <Sally Simpson> But do you have a limit on how many used chess books you're allowed in one year?
 
   Nov-17-19 J K Duda vs Grischuk, 2019
 
AylerKupp: <beatgiant> Ooops!, I forgot to provide the story I promised about "obvious". This was told to me by the college professor in one of my physics classes during a lecture covering, I think, Schrödinger's wave equation: A professor was giving a lecture and during the lecture ...
 
   Nov-12-19 Karpov - Fischer World Championship Match (1975)
 
AylerKupp: <saffuna> Yes, good actors are like top-level GMs, they both make it look easy when it's not.
 
   Nov-10-19 Hikaru Nakamura
 
AylerKupp: <Some engine evaluations for AlphaZero vs. Stockfish, 2018> (part 7 of 7) And here is a summary of how the 3 engines ranked their top 5 moves, without regard for the numerical value of the evaluation. Black's Houdini 6 Komodo 12.3 MCTS ShashChess 8.0 Move d=30 d=30 d=46 ...
 
   Nov-03-19 Stockfish vs AlphaZero, 2018 (replies)
 
AylerKupp: <<keypusher> If chessplayers rated 3500 exist, then chessplayers rated 3000 must miss quite a lot compared to them.> Chess ratings are relative to the pool of players that play against each other. Chess engines only play other chess engines, they don't play against ...
 
   Oct-31-19 AlphaZero (Computer) (replies)
 
AylerKupp: <<keypusher> I don't even know what game this is. But humans are better at it than DeepMind AI, at least for now.> Me neither, but the game is certainly not chess, and I doubt (at least I hope so!) that it is not as popular. So if DeepMind's AlphaStar is not (yet?) ...
 
   Oct-31-19 Isle of Man Grand Swiss (2019) (replies)
 
AylerKupp: <<Olavi> Well, no, impossible is impossible. That 41 points includes his gaims in the World Cup.> Good catch, I missed that. It also includes the results from the other 27 players that played in both the 2019 FIDE World Cup and the 2019 FIDE Grand Swiss. Among the top ...
 
   Oct-30-19 AlphaZero vs Stockfish, 2018 (replies)
 
AylerKupp: <AlphaZero (Computer) vs. Stockfish (Computer) (part 2 of 2)> No, restarting the analysis from the position above I clearly missed 44...Qa7+ with the possibility of 45.Kh2 Nd6 46.Bxd6 cxd6 47.Qxd6 Qxa4 preventing the loss of a second pawn. Indeed, restarting the analysis ...
 
(replies) indicates a reply to the comment.

De Gustibus Non Disputandum Est ("In matters of taste, there can be no dispute")

Kibitzer's Corner
Apr-06-19
Premium Chessgames Member
  AylerKupp: <Personal stuff – continued>

<There's a psychological approach which I very much agree with, it's called - I'm OK, you are OK. >

I remember that book. But sometimes someone <is> right and someone <is> wrong, and it's important to determine which one is what. Hopefully it can be done constructively and with civility, but it must be done regardless. But there should not be any reason to get personal about it.

The 4 interactions you mentioned reminded me of a chart that I used to present at kick-off meetings for new projects. It was titled "If you do what I say" and it went something like this:

1. If you do what I say and it turns out well, we'll both be praised for our good judgment.

2. If you do what I say and it doesn't turn out well, I'll be blamed for my poor judgment.

3. If you don't do what I say and it turns out well, you'll be praised for your good judgment.

4. If you don't do what I say and it doesn't turn out well, you're screwed.

<I've never considered ignoring you as I've learned from your posts. I think you add a lot to many discussions and topics. I apologize if I gave the wrong impression, I'd like to restate that we have agreed on most of points we've discussed and that I respect your point of view on the things we don't. In any case, thanks for the debate.>

Again, suggesting that people put me on their ignore list is another way of trying to terminate what I consider a less than productive discussion and move on to other things. Besides, since the number of people you can put in your ignore list is limited, I look forward to the quandary those people will eventually face if they try to add someone else to their ignore list and find out that in order to do so they must delete someone from it. Decisions, decisions. :-)

And no, I don't think that you gave me the wrong impression. The main point that I was trying to make was that there are times when you want to know the best implementation (hardware + software) of a solution to a problem and there are times when all you want to know is the best software implementation. That's all. And after all, our motivations and interest for preferring one over the other may change in the near future; "Indecision is the key to flexibility". (same story as the dead cat's story).

Apr-08-19  MrMelad: <AylerKupp> I appreciate the time you invest in your comments and also your positive thinking and optimism. I value your many contributions in this site on many topics.

Regarding our discussion in the AlphaZero page I think it has reached a point where both of us made our best case and we should either wait for others to chip in or just let it be.

Apr-15-19
Premium Chessgames Member
  AylerKupp: <<BOSTER> What is the difference between usual tree Search for engine (Kotov) and Monte-Carlo Tree Search, based for the AZ?>

First let's make sure we're talking about the same thing. Here is what I think is a summary of the differences between search tree expansion and pruning in classic engines such as Stockfish and Komodo (per Shannon's original 1949 paper "Programming a Computer for Playing Chess", https://vision.unipv.it/IA1/aa2009-... ) and the MCTS-like search tree expansion and pruning used by AlphaZero. My sources for the latter are "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" (https://arxiv.org/pdf/1712.01815.pdf, not much on MCTS), "A general reinforcement learning algorithm that masters chess, shogi and Go through self-play" (https://deepmind.com/documents/260/..., better), the "Game Changer" book by Matthew Sadler and Natasha Regan (the easiest to follow, but ...), and "Technical Explanation of Leela Chess Zero" by Andy Olsen (https://github.com/LeelaChessZero/l... ). Both AlphaZero articles were written by members of the DeepMind team, who also assisted the authors of "Game Changer". Be aware that all 3 papers and the book contradict each other on how MCTS is implemented in AlphaZero (if at all!), so I'm not really sure how AlphaZero implements MCTS or its apparently close derivative, PUCT.

I'm aiming to highlight the similarities between the two approaches (of which, at a high level, there are many) and the differences, although of course there are many details I've left out, mainly because, sadly, I don't know them. :-(

Part 1 of 3: <Classic>

1. Start with the current game position (the search tree root) and identify all legal candidate next moves.

2. Select the "most promising" candidate next moves based on heuristics (which differ from engine to engine). The heuristics represent "educated guesses" as to which moves will be the roots of the branches of the search tree that are most likely to contain the best moves.

3. Evaluate the resulting positions for the most promising next moves by means of a hand-crafted evaluation function which returns an evaluation expressed in centipawns.

4. Expand the tree by repeating the process for all the most promising candidate moves at the next level, and use the alpha-beta pruning algorithm to cut branches from the search tree when at least one move is found to be worse than a previously examined move; such moves need not be evaluated further. I call alpha-beta pruning an algorithm because, when applied to a standard minimax tree, it returns the same move as minimax would but prunes away branches that cannot possibly influence the final decision (a small sketch follows after this list).

5. Ignoring quiescence search and extensions (which continue expansion of the subtree for a particular move if a response is either literally forced, such as a response to a check, or practically forced, such as a recapture needed to avoid an overwhelming material disadvantage), continue to expand the tree until all the most promising moves have been evaluated.

6. Propagate the results upwards to the original root of the tree by selecting the move that, at each level, results in the alternating best (maximum) and worst (minimum) evaluation along each branch of the root tree (hence the name minimax). The intent is to select the branch of the search tree (the Principal Variation) that contains the best moves for each player at each level of the root tree. This assumes that each player will, in turn, select the move that results in the most advantageous position for them.

7. Repeat the process by increasing the depth of the search tree by one ply for the best moves (typically a small number, on the order of 5 or so) selected at each ply, until the time management function of the chess engine indicates that it's time to make a move.
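
To make steps 4 and 6 a little more concrete, here is a minimal alpha-beta sketch over a toy tree with made-up evaluations (my own illustration, not any engine's actual code). It returns the same value as plain minimax but skips branches that cannot affect the final choice, which is also why move ordering matters so much:

import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximize=True):
    # 'node' is either a numeric leaf evaluation or a list of child nodes.
    if isinstance(node, (int, float)):
        return node
    if maximize:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:     # opponent already has a better option elsewhere,
                break             # so prune the remaining children of this node
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# Two candidate root moves, each with the opponent's replies as leaf evaluations.
# While examining move B, its first reply (-0.50) is already worse for us than
# move A's value (+0.10), so move B's second reply (+0.20) is never evaluated.
tree = [[+0.30, +0.10], [-0.50, +0.20]]
print(alphabeta(tree))   # +0.10, the same value plain minimax would return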

Apr-15-19
Premium Chessgames Member
  AylerKupp: Part 2 of 3 <MCTS (and its close relative, PUCT)>:

1. Start with the current game position and identify all legal candidate next moves.

2. Select the "most promising" candidate next moves based on the training of its neural network.

3. "Evaluate" the resulting position by conducting a series of game simulations (playouts) until a result (win, draw, loss) and determine the scoring percentage (no. of wins + number of losses/2) of that move as well as other statistics such as number of wins, probability that the move being examined will be selected (based on the training of its neural network).

At least that's the traditional definition of MCTS. The "A general reinforcement learning algorithm ..." and "Mastering Chess and Shogi by Self-Play ..." articles indicate that the game result probabilities are obtained by doing playouts, the "Technical Explanation of Leela Chess Zero" article indicates that AlphaZero (as well as Leela Chess Zero) uses the PUCT algorithm, and "Game Changer" explains the process reasonably well but never actually says that it uses MCTS and is not clear on how the game result probabilities are obtained, other than "the percentage represents the prior move probability", without saying how that's calculated. Then again, the focus of the book is to provide examples of AlphaZero's play and not to fully describe how AlphaZero works. But all the articles and the book basically describe an MCTS-type tree generation and branch evaluation and selection.

To make matters worse, MCTS-minimax hybrids have been developed (see "MCTS-Minimax Hybrids with State Evaluations" and "MCTS-Minimax Hybrids with State Evaluations (Extended Abstract)"; you must Google them and download them directly) which, of course, claim that combining the two methods yields the best results, at least in some domains. And, also of course, there seems to be some controversy about the suitability of "pure" MCTS for chess applications (see http://talkchess.com/forum3/viewtop...).

4. Expand the tree by repeating the process until a position is found that has not been found and evaluated (examined) before, or a position that ends the game.

6. Propagate the results upwards to the original root of the tree, recalculating the scoring percentage and the other statistics for each branch of the tree.

7. Repeat the process by increasing the depth of the subtree of the move being examined by one ply for those moves in the search tree (again, probably a small number, but I don't know the range of that number) that have the highest expected scoring percentage. Again, the engine's time management function determines when it's time to make a move.
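
Since the selection rule is what actually decides which subtree gets expanded next in step 7, here is a minimal sketch of the two selection formulas mentioned above (my own simplification with made-up numbers, not AlphaZero's or Leela Chess Zero's actual code): classic UCT as used in "plain" MCTS, and a PUCT-style rule where the neural network's prior move probability steers the exploration term:

import math

def uct(child_wins, child_visits, parent_visits, c=1.41):
    # Classic UCT: average score so far plus an exploration bonus.
    if child_visits == 0:
        return math.inf                   # always try unvisited moves first
    exploit = child_wins / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

def puct(child_value_sum, child_visits, parent_visits, prior, c=1.5):
    # PUCT-style: the policy network's prior probability scales exploration.
    exploit = child_value_sum / child_visits if child_visits else 0.0
    explore = c * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return exploit + explore

# A move visited 10 times out of 50 parent visits, scoring 60%, with a 0.30 prior.
print(uct(6.0, 10, 50), puct(6.0, 10, 50, prior=0.30))

On each iteration the move with the highest score is the one whose subtree is expanded, so a strong prior keeps the search narrow while the visit counts keep it honest.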

Apr-15-19
Premium Chessgames Member
  AylerKupp: Part 3 of 3 <Pros and Cons of each>:

Whenever you have 2 approaches you typically have pros and cons for each. In anticipation of your possible next question, here are some pros and cons to either the classic or MCTS-based approaches to tree searching. Most of these pros/cons are from published literature, some are my opinions and observations. The latter may not interest you. :-)

1. Some claim, as stated above, that MCTS is not that well suited for chess applications, contrary to the views of the DeepMind team and Matthew Sadler.

2. MCTS is supposed to provide a "more human" approach as far as best move selection because it immediately narrows down the best move candidates to consider and reduces these to a small number, just like a GM might do. Matthew Sadler says that "An engine such as Stockfish works on the basis that the evaluation of the best move determines the evaluation of the position", which is horrendously incorrect. Classic chess engines work on the basis that the evaluation of the best <branch> (in the minimax sense) determines the best of the move candidates considered. How a classic chess engine selects which move candidates to consider is determined by its search heuristics.

3. Minimax will provide the best answer in terms of what moves to play in a zero-sum game, MCTS will only approximate the best answer. But that apparently is good enough, at least under the conditions under which engine vs. engine chess games are played.

4. (my opinion) A classic chess engine using minimax will always be faced with the horizon effect no matter how deep it searches (see the tongue-in-cheek description of "AylerKupp's corollary to Murphy's Law" (AKC2ML) in my header above). Based on the example in "Game Changer", MCTS <seems> to be able to reach deeper search depths in the same amount of time as a classic engine because it is apparently more efficient at identifying the best candidate moves, and so it can better control the width of its search tree, narrowing the number of branches it needs to consider in order to dedicate its resources to analyzing the most promising moves and get good results. It's unfortunate that I haven't been able to find any sources that list the calculation time and search depths achieved by AlphaZero for each move.

But, like Stockfish's aggressive search tree pruning, this is double edged. Yes, the engine can search deeper but it is also more likely to miss the best moves by both sides. Sadler says: "I think this also explains how AlphaZero might occasionally miss an unusual, 'unfair' tactic in a position. Since AlphaZero is pruning possibilities to consider so early and rigorously, it might discount a non-standard move before it could examine it at the depth required to see its hidden strengths."

5.(my opinion) But perhaps the best indication of the relative merits of minimax with heuristics and alpha-beta pruning vs. MCTS is provided by Komodo 12.x. In addition to its standard classic version, it provides an option to use MCTS instead of minimax, etc. and both engine versions have competed in the TCEC, CCRL, and CEGT engine vs. engine tournaments. In all cases Komodo 12.x standard performed better than the corresponding Komodo 12.x MCTS and achieved higher ratings. But, in fairness, the Komodo 12.x MCTS option is relatively new, likely does not perform as efficiently as the standard version (which has been available for some time in Komodo multicore) and even crashed 3 times in the most recent TCEC tournament. So we might not yet be comparing apples to apples in terms of performance, and the CCRL ratings for Komodo 12.x MCTS version have been increasing faster than the Komodo 12.x standard version. But, when all engine components are basically the same, Komodo 12.x with minimax consistently outperforms Komodo 12.x with MCTS.

Hopefully the above answers at least some of your questions. As usual, there is a lot of information out there in case you want to dig deeper.

Apr-15-19
Premium Chessgames Member
  AylerKupp: <<MrMelad> Regarding our discussion in the AlphaZero page I think it has reached a point where both of us made our best case and we should either wait for others to chip in or just let it be.>

I agree, but there are still some things I want to say to clear up some misconceptions based on your last series of posts and offer yet another example of what I've been trying to say.

<<MrMelad> I'm trying to offset some of your claims as they seem to focus around diminishing and discrediting the accomplishments of AlphaZero and Leela.>

I am in <NO WAY> trying to discredit the accomplishments of AlphaZero and Leela Chess Zero. I think (and have said) that the AlphaZero team deserves a huge amount of credit for their ability to generalize the implementation of neural network-based game playing engines by reducing the amount of domain-specific knowledge (i.e. game rules) needed to implement the engine. And Leela Chess Zero showed how neural network-based training could be implemented by distributing the task across a network and tapping the resources of individuals connected to that network. Similar, in a way, to how the available computer resources of many individuals were tapped to support SETI.

But I am trying to put things in perspective and correct some of the claims made by the more enthusiastic and apparently ignorant posters out there. First, there is <nothing> original in the algorithms used in AlphaZero; not the use of neural networks for chess playing, the use of reinforcement training of the neural network, or use of MCTS. All of these algorithms have been applied to chess playing before. I'm sure that the AlphaZero developers implemented many enhancements and improvements in these algorithms. But original, no.

What I do think that AlphaZero accomplished is the best <integration> of these algorithms into a chess playing <system> (hardware + software), and as someone who has been responsible for system integration on many projects I know and appreciate how hard and unpredictable this can be. That and the pioneering work of efficiently using TPUs and their massive computational performance advantage to implement the best chess playing <system>.

Apr-15-19
Premium Chessgames Member
  AylerKupp: <<MrMelad> You use those arguments to "warn" people from giving too much credit to AlphaZero as if the competition between stockfish and AlphaZero was between two similar algorithms that one simply had a huge computational advantage.>

Well, I am trying to make people aware that the results of the AlphaZero and Stockfish matches are inconclusive at best <IF> what you are trying to find out is the best approach and algorithms for implementing the best chess playing engines. Whether the competition was between similar or dissimilar algorithms is beside the point, as long as the two engines are running on hardware with similar computational performance capability, regardless of hardware architectures. If that constitutes a "warning", then so be it.

Let me try a different approach to try to convince you of that. Suppose a 100-game match were held between AlphaZero and Leela Chess Zero, both of which have similar (though not identical) algorithms and architectures and, of course, likely different implementations of those algorithms. If AlphaZero were restricted to using only one 1st generation TPU (performance estimated at ~30 TFlops) and Leela Chess Zero used a GPU server configuration with two NVIDIA RTX 2080 Ti GPUs (performance estimated at ~13.7 TFlops each, for an aggregate performance estimated at ~27.4 TFlops), then I would say that the performance capability of their hardware was similar. If the results of the match were a near tie (like the previous TCEC's Leela Chess Zero vs. Stockfish Superfinal, even though Leela Chess Zero had a substantial computing capability advantage over Stockfish), then I would conclude that the performance of the algorithms in AlphaZero and Leela Chess Zero was also approximately equal. I think that you would probably agree.

Now conduct another 100-game match, except allow AlphaZero to use four 3rd generation TPUs, each with a performance estimated at ~360 TFlops, for an aggregate performance capability of ~1,440 TFlops and a computational performance advantage of ~52.5X over Leela Chess Zero. If the first match with hardware of comparable performance capability ended in a near-tie, do you have any doubts as to which engine would win the second match? And if the winner was AlphaZero by a substantial margin (of which I have no doubt), would you then conclude that the AlphaZero <algorithms> were substantially better than Leela Chess Zero's <algorithms>? I hope not. And, if you were to agree that the results of this second match were inconclusive because of AlphaZero's substantial computational capability advantage, would you also have agreed if the first match, with hardware of approximately equal computational performance capability, had not taken place?
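
Just to make the arithmetic behind those figures explicit (using only the TFlops estimates quoted above, which are themselves rough estimates):

lc0_gpus  = 2 * 13.7      # two RTX 2080 Ti GPUs        -> ~27.4 TFlops
a0_tpu_v1 = 30.0          # one 1st generation TPU      -> roughly comparable to the above
a0_tpu_v3 = 4 * 360.0     # four 3rd generation TPUs    -> ~1,440 TFlops
print(lc0_gpus, a0_tpu_v3, a0_tpu_v3 / lc0_gpus)   # roughly a 52.5X advantage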

So yes, I "warn" people against giving too much credit to AlphaZero as a result of its matches against Stockfish when AlphaZero enjoyed a substantial computational performance advantage. Particularly since both DeepMind's data and Leela Chess Zero's experience with a shorter time/move (AlphaZero) and without GPU support (Leela Chess Zero) show that Stockfish, as well as many other classic engines, would defeat them both convincingly if their computational capabilities were similar.

Apr-15-19
Premium Chessgames Member
  AylerKupp: <<MrMelad> I don't think your intentions are malicious though, I just don't think you understand how AlphaZero works, i.e., how reinforcement learning or deep learning works.>

My intentions are certainly not malicious, at least not intentionally. I present my opinions and provide data or links to data to support them. What else can I do? If others disagree with my opinions even though they agree on the data presented, or if they choose to ignore the data, there's nothing I can do about that.

As far as not understanding how Alpha Zero, reinforcement learning, or deep learning works, I might surprise you. I have read every paper on Alpha Zero, Leela Chess Zero, neural networks, reinforcement learning, and deep learning that I have been able to find and read, and I have bought two books on neural network design and deep learning that I am now studying. An expert on those subjects? Clearly not, and I have a long way to go to even be considered "knowledgeable". But understanding? I think that I have a high-level understanding, enough I think to filter bs and wishful thinking from facts.

<<MrMelad> I know sarcasm doesn't always translate well on the internet but I sometimes forget. My apologies for using it, it's not that funny anyways.>

Well, I have the same problem and I try to avoid it but, as you know, it's easy to backslide. And quips that somehow seem funny at the time that I write and post them typically don't seem so funny after a time and after others read them. My post was not intended as a criticism of using sarcasm but just a statement of fact that, unless the use of sarcasm is very, very obvious, it's not an effective way to try to get a point across.

<<MrMelad> Here is the part where he says "You can't really compare CPU cores to GPU cores apples to apples">

What he actually said starting at ~00:07:00 was that CPUs and GPUs can't be compared because "they are <qualitatively different>", and the much larger number of available cores in the GPUs means that they can effectively perform a larger number of more limited operations in parallel (no surprise here!). So <CPU cores> cannot be compared to <GPU cores> <because of the kind of things that they can each do most effectively>. That's what the "apples to apples" comment was addressing, not the idea that their computational capabilities can't be compared.

Apr-17-19  MrMelad: <AylerKupp> thanks for your comments. I responded in the AlphaZero page.
May-16-19
Premium Chessgames Member
  diceman: <AylerKupp:

7. Repeat the process by increasing the depth of the subtree of the move being examined by one ply for those moves in the search tree (again, probably a small number, but I don't know the range of that number) that have the highest expected scoring percentage. Again, the engine's time management function of the chess engine determines when it's time to make a move.>

If given enough time, will all moves be looked at? (at whatever the listed ply depth is)

May-20-19  LoveThatJoker: <AylerKupp> Thanks for the engine analysis on 37. Qh8+. I too had gone with this continuation. LTJ
May-20-19  LoveThatJoker: PS. In regards to yesterday's Gazza puzzle.
May-20-19
Premium Chessgames Member
  Sally Simpson: ***

Hi AylerKupp (finally spelling your name correct instead of AlyerKupp)

You will be interested in this:

http://www.ifaamas.org/Proceedings/...

It's a paper on iCat and chess computers.

***

May-21-19
Premium Chessgames Member
  AylerKupp: <Sally Simpson> Thanks for the article on iCAT. At first glance I thought that the result should be obvious that chess players would enjoy a game against a physical opponent more than against a virtual opponent but even so it's good to have some confirmation. As you know I am a wine aficionado and I read that some enologists subjected Cabernet Sauvignon grapes to DNA testing to try to find out the grape's origin. After some analysis, at presumably a non-trivial expense, they determined that the Cabernet Sauvignon grape was the result of a cross between the Cabernet Franc grape and the Sauvignon Blanc grape. Then someone pointed out that, given the name Cabernet Sauvignon, shouldn't the answer have been obvious? Particularly since Cabernet Sauvignon, Cabernet Franc, and Sauvignon Blanc are all grown in the Bordeaux region of France.

But I do wonder if the test was too simplistic. I don't know whether the effect of the novelty factor of playing against a physical opponent was properly considered. The article didn't say (I don't think) the number of games that each participant played against the iCat. Could it be that, particularly with the 8 to 12 year old participants, the players enjoyed the game more when faced with the novelty of playing against a physical iCat rather than a virtual one? Would the results have been the same if the same group of players had played a greater number of games against each type of opponent so that the novelty factor would have worn off?

And I don't see why the participants were restricted to one kind of scenario (physical or virtual). It would seem to me that if they had a relatively large number of positions and if these positions were chosen at random, then there would have been minimum chances of duplication. Or, if by chance the same position was selected, a different position would have been substituted for it. And if the participants played both types of opponent (physical and virtual) and the order in which they played each opponent was randomly selected, then a more direct preference comparison would have been achieved. Oh well, just some thoughts.

Sep-13-19
Premium Chessgames Member
  AylerKupp: <My Experience with Dress Codes>

On my last job before my retirement I was the manager of the software department for a company. The company did have a dress code but it was never enforced or even mentioned and, software developers being what they are, my subordinates pretty much dressed as they pleased, and I never talked to them about it because, frankly, I didn't know what the company's dress code was. I did mention to them, in case they didn't think of it on their own, that if they were giving a presentation in front of our customers, they should dress appropriately, business casual as a minimum.

I personally wore casual attire except when giving presentations to our customers when I always wore a business suits. I never wore jeans except on weekends, and no shorts or sandals or flip flops.

Then one day everyone in the company received an email saying that the company's dress code would henceforth be enforced. It prohibited everyone from wearing shorts, jeans (even nice ones), and sandals, and for the women specifically it prohibited Capri pants, spaghetti straps, and outfits with bare midriffs.

Nobody knew why this email came out of the blue and one of my employees objected particularly strenuously. She said that she was a single mother, her wardrobe consisted only of jeans, and that she could not afford to get a completely new wardrobe. She finished by saying "I'm not giving up my jeans!"

My response was along the lines of "I agree with you. I'm not giving up my jeans, my Capri pants, or my spaghetti straps but by popular request I have agreed to give up my bare midriff outfits."

I of course talked to the powers-that-be to find out what triggered the email. I was told that some potential foreign customers were visiting the company from a country where they could be offended by our employees' casual dress (and in the case of software developers <very> casual dress) and they didn't want that to happen.

I asked why this wasn't indicated in the email, since it did not seem like an unreasonable request for these customers' visits, or for any other visits by customers from countries that might be offended by our overly casual attire. If notified of such visits, no one would object to relatively more "formal" attire while these customers were touring our facility, reverting to their usual attire when there were no customers on site. I was told that this would be very hard to do.

Which was absolute nonsense since <every> visitor had to indicate when they would be visiting our facility and submit their security clearances ahead of time. I always received a notice of such visits and asked my subordinates that they should straighten up their offices and the walk spaces somewhat (software developers tend to be rather messy) prior to the customers' arrival. Which told me that the powers-that-be were not that serious about the "problem" and that the issue would quickly die out. And, sure enough, it did.

Sep-13-19
Premium Chessgames Member
  AylerKupp: <My Experience with Company Rules and Regulations>

Since I'm on a roll and this is my forum I can address any subject at all (within reason) so I will. This was the incident that taught me that all rules have a way around them and, rather than try to fight an unreasonable (at least to me) rule head-on, it's easier to try to find the loopholes in the rules (there are always some) and go around them.

Many years ago, before the existence of personal computers, we used minicomputers which could be shared by multiple users. At another company, where I was the head of software development for my department, we had a centralized minicomputer on the second floor which served the needs of the software developers. Three of my subordinates were sharing an office on the first floor and they had to go upstairs to use the computer. They asked me if I could arrange to have a dumb terminal (the only kind available at the time) installed in their office, since that would save them the time of going up and down the stairs, increase productivity, blah, blah, blah.

That seemed like a reasonable request to me so I went to talk to the facilities manager. He told me that it was against company policy to do that because in those days they were required to install a hard conduit for the cables leading from the minicomputer to individual offices and, if my subordinates were to move, they would have wasted the money.

I was about to object when I saw the smile on the face of the facilities manager, who said "However, if you install two terminals instead of just one, that would make it a terminal room, which is OK as far as company policy goes, and I will approve the request."

So my subordinates got two terminals in their office instead of one and were even happier. Perhaps FIDE could learn something from my experience.

Oct-25-19
Premium Chessgames Member
  AylerKupp: <Effect of "Fischer Rules" for TCEC engine tournaments (part 1 of 3)>

First some clarification re: Fischer rules, please.

1. I assume that the 9-9 clause or equivalent only applies if one of the engines participating in the Semifinals is the engine champion from the previous season.

2. Since <alexmagnus> posted data for first to win 4 games and first to win 6 games, I'm assuming that the equivalent of the Fischer rules would have a 3-3 clause for first to win 4 games and a 5-5 clause for first to win 6 games; e.g. in the case of first to win 4 games the Superfinal would be terminated if the match score reached 3-3, with draws not counting, and one of the engines participating in the Semifinals was the engine champion from the previous season. I'll call this the (M-M) clause, where N = the number of games needed to win the match (draws not counting) and M = N-1. In these cases the defending engine champion keeps its title but there is no match winner.

3. My assumption is that, unlike the regular TCEC Superfinal matches when the match continued even if one of the engines was mathematically eliminated from winning or drawing, if the Fischer Rules were in effect the match will be terminated once one engine reaches the score required to win. And the match would also terminate if the (M-M) clause applies.

And since there was a "gap" between being first to 6 wins and being first to 10 wins I added a being first to 8 wins category so that N = (4, 6, 8, 10). Since in 4 of the 15 seasons neither engine reached 10 wins, I hoped that this would lead to more matches having a conclusive result in the first to 8 wins category. However, in 2 of the 15 first to 8 wins neither engine reached 8 wins, this was only a minor improvement in trying to ensure a conclusive result.
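
To make the counting rule explicit, here is a minimal sketch (my own illustration with a made-up sequence of results, not the spreadsheet's actual logic) of how a first to win N games rule with the (M-M) clause would be applied to the decisive games of a match; draws are simply left out of the input since they don't count:

def fischer_result(decisive_winners, n, defending_champion=None):
    # decisive_winners: the winner of each decisive game, in order (draws omitted).
    # Returns the outcome under a first-to-N-wins rule with an (N-1)-(N-1)
    # termination clause that applies only when a defending champion is playing.
    wins, m = {}, n - 1
    for game, winner in enumerate(decisive_winners, start=1):
        wins[winner] = wins.get(winner, 0) + 1
        if defending_champion and len(wins) == 2 and all(w == m for w in wins.values()):
            return f"{m}-{m}: match terminated, {defending_champion} retains the title"
        if wins[winner] == n:
            loser_wins = sum(w for p, w in wins.items() if p != winner)
            return f"{winner} wins {n}-{loser_wins} after decisive game {game}"
    return f"no result: neither engine reached {n} wins"

# Hypothetical sequence of decisive games (draws omitted):
games = ["Houdini", "Rybka", "Houdini", "Houdini", "Rybka", "Houdini"]
print(fischer_result(games, n=4, defending_champion="Houdini"))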

Now for some general information about the TCEC matches to date:

1. So far there have been 15 seasons played, Seasons 1 and 2 and Seasons 4 through 16. As <alexmagnus> indicated Season 3 was not completed so there was no Superfinal.

2. In Seasons 1 and 2 the final match was referred to as the "Elite" match but I will refer to it as the "Superfinal" match for those two seasons for consistency. 48 games were played in each season.

3. In Seasons 4 and 5 there were 48 games played.

4. In Seasons 6 and 7 there were 64 games played.

5. In Seasons 8 and up there were 100 games played.

6. With one exception (Houdini 1.5a was used in Seasons 1 and 2) different engine versions were used in different seasons. But I will consider all Houdinis to be Houdini, all Stockfishs (Stockfishes?) to be Stockfish, etc. for the purpose in determining if the 3-3, 5-5, or 9-9 clauses applied to the defending engine champion.

Oct-25-19
Premium Chessgames Member
  AylerKupp: <Effect of "Fischer Rules" for TCEC engine tournaments (part 2 of 3)>

Some information and statistics about the chess engines involved.

Stockfish is the engine that has won the most Superfinals. This is the rundown; the second column is the number of times that the engine has won the Superfinal and the third column is the percentage of Superfinals the engine won (out of 15 seasons):

Stockfish 7 46.7%
Houdini 4 26.7%
Komodo 3 20.0%
LeelaC0 1 6.7%
AllieStein 0 0.0%
Rybka 0 0.0%

But if you look at their success rate; i.e. how many times did the engine <win> the Superfinal once they reached it, the answer is:

Houdini 4 6 66.7%
Stockfish 7 12 58.3%
LeelaC0 1 2 50.0%
Komodo 3 7 42.9%
AllieStein 0 1 0.0%
Rybka 0 2 0.0%

The first 2 columns are as before, the third column is the number of times that the engine qualified for the Superfinal, and the fourth column is the percentage of times that it won the Superfinal once it had qualified. So looking at it this way, it seems that Houdini was the most "efficient" in terms of winning the Superfinal once it had qualified to play in it.

Rybka's performance might be a surprise to some given its dominance many years ago. But Rybka stopped being updated shortly after the TCEC was started, with its last update, Rybka 4.1, released in Mar-2011. With chess engines, like most things in life, if you don't move with the times you will be left behind.

And, of course, we are starting to see the increased playing strength of neural network-based engines like LeelaC0 and AllieStein when supported by GPUs. It makes me wonder (no, it doesn't, but I won't go into that one here) how AlphaZero would have done if it had competed. I think that DeepMind missed an opportunity for additional good publicity by failing to enter the TCEC. And I also won't go into the computational performance advantage that these engines have by using highly parallel support processors like GPUs and TPUs.

Oct-25-19
Premium Chessgames Member
  AylerKupp: <Effect of "Fischer Rules" for TCEC engine tournaments (part 3 of 3)>

Now for a summary of the TCEC results. If this is good enough for you, you can stop after this part. If you want to see the season-by-season impact of the Fischer Rules and some TCEC statistics (Draw %, White Scoring %, etc.), keep reading until you fall asleep from boredom.

<Overall Summary>

The number of matches in which an (M-M) clause was applicable was 10 of the 15, or 66.7% of the time.

There were potentially N = (4, 6, 8, 10) x 15 seasons = 60 total possible winning conditions.

The number of Superfinal matches when the eventual Superfinal winner won the match under the Fischer Clause = 50 / 60 (83.3%). Note: In 2 seasons neither engine won 8 games and in 4 seasons neither engine won 10 games so under those conditions no engine would have won those Superfinal matches if the Fischer Rules had been in effect.

The number of Superfinal matches drawn under the Fischer Clause (i.e. the match score was (M-M) so the match would have been terminated and therefore the defending champion would have retained its title) = 2 / 60 (3.3%)

The number of Superfinal matches lost under the Fischer Clause (the winner under the Fischer clause was not the eventual winner of the match) = 4 / 60 (6.7%)

<Win (N = 4, 6, 8, or 10) Category Summary>

In the first to 4 wins category a definitive conclusion was reached in all 15 seasons. In 14 of these (93.3%) the first engine to reach 4 wins was the same as the Superfinal winner in the season. In one of the seasons the match score reached 3-3 so the match would have been terminated and the defending champion would have retained its title. However, in that match the eventual winner was the first engine to achieve a 4-3 score.

In the first to 6 wins category a definitive conclusion was reached in all 15 seasons. In 13 of these (86.7%) the first engine to reach 6 wins was the same as the Superfinal winner in the season. In one of the seasons the match score reached 5-5, so the match would have been terminated and the defending champion would have retained its title. However, in that match the eventual winner was the first engine to achieve a 6-5 score.

In the first to 8 wins category a definitive conclusion was reached in 13 of the 15 seasons (86.7%); in the other 2 seasons neither engine managed to win 8 games. Of the 13 matches where a definitive result was reached, in one (7.7%) the first engine to reach 8 wins was not the engine that eventually won the match. In no match did the score reach 7-7, so the Fischer clause for a tied match was never applied.

In the first to 10 wins category a definitive conclusion was reached in 11 of the 15 seasons (73.3%); in the other 4 seasons neither engine managed to win 10 games. In all 11 matches where a definitive result was reached, the first engine to reach 10 wins was also the eventual winner of the actual Superfinal.

Oct-25-19
Premium Chessgames Member
  AylerKupp: <Season-by-Season Effect of "Fischer Rules" for TCEC engine tournaments (part 1 of 5)>

Below is a season-by-season effect of the Fischer Rules when N = 4, 6, 8, and 10.

<Season 1> Houdini vs. Rybka. Houdini won with a score of 23.5 – 16.5. (M-M) clause not applicable since there is no defending champion.

First to win 4 games, draws not counting: Houdini wins in 7 games with a score of 4-0.

First to win 6 games, draws not counting: Houdini wins in 23 games with a score of 6-0.

First to win 8 games, draws not counting: Houdini wins in 31 games with a score of 8-4.

First to win 10 games, draws not counting: Houdini wins in 37 games with a score of 10-4.

After 40 games Houdini was ahead with a score of 12-5.

<Season 2> Houdini vs. Rybka. Houdini won with a score of 22 – 18. (M-M) clause applicable since Houdini was the defending champion.

First to win 4 games, draws not counting: <Rybka> wins in 9 games with a score of 4-1.

First to win 6 games, draws not counting: After 31 games the score reaches 5-5 so the match would have ended and Houdini would retain the title, Houdini won the 32nd game so it would have won the match with a score of 6-5 if the (M-M) clause had not been in effect.

First to win 8 games, draws not counting: Houdini wins in 37 games with a score of 8-5.

Neither engine won 10 games. After 40 games Houdini was ahead 9-5, so with a 4 game lead and only one more win needed it presumably would have reached 10 wins first. And even if Rybka could have pulled a 1984 Kasparov, Houdini would have kept its title once the score reached 9-9.

<Season 4> (remember, Season 3 was not completed) Houdini vs. Stockfish. Houdini won with a score of 25 – 23. (M-M) clause applicable since Houdini was the defending champion.

First to win 4 games, draws not counting: Houdini wins in 31 games with a score of 4-2.

First to win 6 games, draws not counting: Houdini wins in 40 games with a score of 6-4.

First to win 8 or 10 games, draws not counting: No result, since neither engine was able to win 8 or 10 games. With only a 2 game lead and 4 more wins needed by Houdini, it's not clear to me which engine would have reached 8 or 10 wins first.

Oct-25-19
Premium Chessgames Member
  AylerKupp: <Season-by-Season Effect of "Fischer Rules" for TCEC engine tournaments (part 2 of 5)>

<Season 5> Komodo vs. Stockfish. Komodo won with a score of 25 – 23. (M-M) clause is not applicable since the defending champion, Houdini, was not participating in the Semifinal.

First to win 4 games, draws not counting: <Stockfish> wins in 11 games with a score of 4-3.

First to win 6 games, draws not counting: Komodo wins in 25 games with a score of 6-4.

First to win 8 games, draws not counting: Komodo wins in 27 games with a score of 8-4.

First to win 10 games, draws not counting: Komodo wins in 48 games with a score of 10-8. Last game of the match.

<Season 6> Komodo vs. Stockfish. Stockfish won with a score of 35.5 – 28.5. (M-M) clause applicable since Komodo was the defending champion.

First to win 4 games, draws not counting: Stockfish wins in 18 games with a score of 4-0.

First to win 6 games, draws not counting: Stockfish wins in 31 games with a score of 6-2.

First to win 8 games, draws not counting: Stockfish wins in 37 games with a score of 8-3.

First to win 10 games, draws not counting: Stockfish wins in 44 games with a score of 10-3.

After 64 games Stockfish was ahead 13-6.

<Season 7> Stockfish vs. Komodo. Komodo won with a score of 33.5 – 30.5. (M-M) clause applicable since Stockfish was the defending champion.

First to win 4 games, draws not counting: Komodo wins in 26 games with a score of 4-2.

First to win 6 games, draws not counting: Komodo wins in 61 games with a score of 6-4.

First to win 8 or 10 games, draws not counting: No result, since neither engine was able to win either 8 or 10 games. After 64 games Komodo was ahead 7-4, so with a 3 game lead it presumably would have reached 8 or 10 wins first, but the fact that it still needed to win 3 more games (for 10) puts the projected final result somewhat in doubt.

Oct-25-19
Premium Chessgames Member
  AylerKupp: <Season-by-Season Effect of "Fischer Rules" for TCEC engine tournaments (part 3 of 5)>

<Season 8> Komodo vs. Stockfish. Komodo won with a score of 53.5 – 46.5. (M-M) clause applicable since Komodo was the defending champion.

First to win 4 games, draws not counting: Komodo wins in 46 games with a score of 4-1.

First to win 6 games, draws not counting: Komodo wins in 78 games with a score of 6-1.

First to win 8 games, draws not counting: Komodo wins in 86 games with a score of 8-1.

First to win 10 games, draws not counting: No result since neither engine was able to win 10 games. After 100 games Komodo was ahead 9-2 and so with a 7 game lead and only needing to win 1 more game presumably it would have reached 10 wins first.

<Season 9> Stockfish vs. Houdini. Stockfish won with a score of 54.5 – 45.5. (M-M) clause not applicable since the defending champion, Komodo, was not in the Superfinal.

First to win 4 games, draws not counting: Stockfish wins in just 15 games with a score of 4-0.

First to win 6 games, draws not counting: Stockfish wins in just 21 games with a score of 6-1.

First to win 8 games, draws not counting: Stockfish wins in 43 games with a score of 8-3.

First to win 10 games, draws not counting: Stockfish wins in 55 games with a score of 10-3.

After 100 games Stockfish was ahead 17-8.

<Season 10> Houdini vs. Komodo. Houdini won with a score of 53 – 47. (M-M) clause not applicable since the defending champion, Stockfish, was not in the Superfinal.

First to win 4 games, draws not counting: Houdini wins in just 14 games with a score of 4-0.

First to win 6 games, draws not counting: Houdini wins in 40 games with a score of 6-1.

First to win 8 games, draws not counting: Houdini wins in 58 games with a score of 8-4.

First to win 10 games, draws not counting: Houdini wins in 62 games with a score of 10-4.

After 100 games Houdini was ahead 15-9.

Oct-25-19
Premium Chessgames Member
  AylerKupp: <Season-by-Season Effect of "Fischer Rules" for TCEC engine tournaments (part 4 of 5)>

<Season 11> Stockfish vs. Houdini. Stockfish won with a score of 59 – 41. (M-M) clause applicable since Houdini was the defending champion.

First to win 4 games, draws not counting: Stockfish wins in just 23 games with a score of 4-0.

First to win 6 games, draws not counting: Stockfish wins in just 26 games with a score of 6-0.

First to win 8 games, draws not counting: Stockfish wins in just 29 games with a score of 8-0.

First to win 10 games, draws not counting: Stockfish wins in just 31 games with a score of 10-0.

After 100 games Stockfish was ahead 20-2. A dominating performance by Stockfish.

<Season 12> Stockfish vs. Komodo. Stockfish won with a score of 60 – 40. (M-M) clause applicable since Stockfish was the defending champion.

First to win 4 games, draws not counting: Stockfish wins in just 17 games with a score of 4-1.

First to win 6 games, draws not counting: Stockfish wins in just 28 games with a score of 6-1.

First to win 8 games, draws not counting: Stockfish wins in just 35 games with a score of 8-3.

First to win 10 games, draws not counting: Stockfish wins in just 46 games with a score of 10-5.

After 100 games Stockfish was ahead 29-9. Another dominating performance by Stockfish.

<Season 13> Stockfish vs. Komodo. Stockfish won with a score of 55 – 45. (M-M) clause applicable since Stockfish was the defending champion.

First to win 4 games, draws not counting: Stockfish wins in 29 games with a score of 4-0.

First to win 6 games, draws not counting: Stockfish wins in 53 games with a score of 6-2.

First to win 8 games, draws not counting: Stockfish wins in 59 games with a score of 8-2.

First to win 10 games, draws not counting: Stockfish wins in 77 games with a score of 10-4.

After 100 games Stockfish was ahead 16-6. Not as dominating as the previous two performances by Stockfish but still impressive.

Oct-25-19
Premium Chessgames Member
  AylerKupp: <Season-by-Season Effect of "Fischer Rules" for TCEC engine tournaments (part 5 of 5)>

<Season 14> Stockfish vs. LeelaC0. Stockfish won with a score of 50.5 – 49.5, a real squeaker. (M-M) clause applicable since Stockfish was the defending champion.

First to win 4 games, draws not counting: The match score reached 3-3 after 17 games so the match would have been stopped and Stockfish would have retained its title due to the (M-M) (3-3) clause. Stockfish reaches a 4-3 score after 20 games so it would have won the title without the (M-M) clause.

First to win 6 games, draws not counting: After two more straight wins Stockfish wins in only 22 games with a score of 6-3.

First to win 8 games, draws not counting: <LeelaC0> wins in 53 games with a score of 8-6.

First to win 10 games, draws not counting: The score reached 9-9 after 80 games, so the match would have been stopped and Stockfish would have retained the title due to the (M-M) (9-9) clause. Stockfish reached a 10-9 score after 85 games, so it would have won the title without the (M-M) clause.

After 100 games Stockfish was still ahead 10-9. So a squeaker of a match regardless of whether draws were counted or not counted.

<Season 15> Stockfish vs. LeelaC0. LeelaC0 won with a score of 53.5 – 46.5. (M-M) clause applicable since Stockfish was the defending champion.

First to win 4 games, draws not counting: LeelaC0 wins in 24 games with a score of 4-1.

First to win 6 games, draws not counting: LeelaC0 wins in 36 games with a score of 6-2.

First to win 8 games, draws not counting: LeelaC0 wins in 40 games with a score of 8-3.

First to win 10 games, draws not counting: LeelaC0 wins in 62 games with a score of 10-5.

After 100 games LeelaC0 was ahead 14-7. An impressive performance by LeelaC0 when supported by a GPU against the engine with the most appearances and most wins in the Superfinal.

<Season 16> Stockfish vs. AllieStein. Stockfish won with a score of 54.5 – 45.5. (M-M) clause not applicable since LeelaC0 was the defending champion.

First to win 4 games, draws not counting: Stockfish wins in 26 games with a score of 4-0.

First to win 6 games, draws not counting: Stockfish wins in 42 games with a score of 6-4.

First to win 8 games, draws not counting: Stockfish wins in 66 games with a score of 8-4.

First to win 10 games, draws not counting: Stockfish wins with a score of 10-4.

After 100 games Stockfish was ahead 14-5. Still, an impressive performance by AllieStein when supported by a GPU in its first participation in the TCEC against the engine with the most appearances and most wins in the Superfinal.

Oct-25-19
Premium Chessgames Member
  AylerKupp: <Other TCEC engine tournaments statistics>

Some perhaps interesting statistics from the Superfinals of the 16 seasons (Season 3 was not completed, so 15 Superfinal matches in all).

Total number of games played = 1,204.

<Wins, Draws, and Losses>

Number of White wins = 201 (16.7%)

Number of Black wins = 89 (7.4%)

Number of Draws = 914 (75.9%)

<Scoring Percentages>

White Scoring % = 54.7% (201 wins plus half of the 914 draws, i.e. 658 points out of 1,204 games)

Black Scoring % = 45.3%

It's interesting to see that these scoring percentages are very close to the scoring percentages for players rated 2700+ in the ChessTempo database, 54.8% for White and 45.2% for Black.
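
As a sanity check, a few lines of Python (my own, using only the W/D/L counts above) reproduce these percentages:

white_wins, black_wins, draws = 201, 89, 914
total = white_wins + black_wins + draws                  # 1,204 games
print(round(100 * white_wins / total, 1))                # 16.7 (White win %)
print(round(100 * black_wins / total, 1))                # 7.4  (Black win %)
print(round(100 * draws / total, 1))                     # 75.9 (Draw %)
white_score = white_wins + draws / 2                     # 658 points for White
print(round(100 * white_score / total, 1))               # 54.7 (White scoring %)
print(round(100 * (total - white_score) / total, 1))     # 45.3 (Black scoring %)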

<Trends>

The average White winning % has stayed roughly constant over the 15 Superfinals, but the Draw % has been increasing, and so the Black winning % has been decreasing. As a result the White scoring % has been increasing somewhat and the Black scoring % has been decreasing by the same amount. So it seems that the engines in the Superfinal have been getting relatively harder to beat as Black.
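
Here is a sketch of the arithmetic behind that last observation (my own illustration; the numbers in the example are hypothetical round figures, not the actual per-season data, which is in the spreadsheet linked below): if the White win % holds steady while the draw % rises, the Black win % falls by the same amount, and the White scoring % rises by half of the draw increase.

# Hypothetical round numbers for illustration only (not the real figures).
def white_scoring_pct(white_win_pct, draw_pct):
    # Scoring % = win % plus half of the draw %.
    return white_win_pct + draw_pct / 2.0

early = white_scoring_pct(17.0, 70.0)   # 52.0% with a 70% draw rate
late = white_scoring_pct(17.0, 78.0)    # 56.0% with a 78% draw rate
print(early, late)                      # an 8-point rise in draws adds 4 points for White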

If you've gotten this far and you're still interested, you can download a spreadsheet with the season-by-season results, statistical calculations, and trend charts from http://www.mediafire.com/file/8zoht.... You will need Excel 2003 or later or a spreadsheet or viewer capable of reading Excel 2003 files.
