About Me (in case you care):
Old timer from the Fischer, Reshevsky, Spassky, Petrosian, etc. era. Active while in high school and early college, but not much since. Never rated above the low 1800s and highly erratic; I would occasionally beat much higher rated players and equally often lose to much lower rated players. Highly entertaining combinatorial style; everybody liked to play me since they were never sure what I was going to do (neither did I!). When facing a stronger player, many players try to even their chances by steering toward simple positions where they can see what is going on. My philosophy in those situations was to try to even the chances by complicating the game to the extent that neither I nor the stronger player could see what was going on! Alas, this approach no longer works in the computer age. Needless to say, my favorite all-time player is Tal.
And Tal summarized my philosophy when faced with a stronger player far better than I ever could, and much more eloquently: "You must take your opponent into a deep dark forest where 2 + 2 = 5, and the path leading out is only wide enough for one."
I also have a computer background and have been following with interest the developments in computer chess since the days when computers couldn't always recognize illegal moves and a patzer like me could beat them with ease. Now it's me who can't always recognize illegal moves, and any chess program can beat me with ease.
But after about 9 years (a lifetime in computer-related activities) of playing computer-assisted chess, I think I have learned a thing or two about the subject. I have conceitedly defined "AylerKupp's Corollary to Murphy's Law" (AKC2ML) as follows:
"If you use your engine to analyze a position to a search depth=N, your opponent's killer move (the move that will refute your entire analysis) will be found at search depth=N+1, regardless of the value you choose for N."
I’m also a food and wine enthusiast. Some of my favorites are German wines (along with French, Italian, US, New Zealand, Australian, Argentine, Spanish, ... well, you probably get the idea). One of my early favorites was wine from the Ayler Kupp vineyard in the Saar region, hence my user name. Here is a link to a picture of the village of Ayl with a portion of the Kupp vineyard on the left: http://en.wikipedia.org/wiki/File:A...
You can send me an e-mail whenever you'd like at aylerkupp gmail.com.
And check out a picture of me with my "partner", Rybka (Aylerkupp / Rybka) from the CG.com Masters - Machines Invitational (2011). No, I won't tell you which one is me.
I have become interested in the increase in top player ratings since the mid-1980s and whether it represents a true increase in playing strength (and if so, why) or is simply a consequence of the larger chess population from which ratings are derived. So I've opened up my forum for discussions on this subject.
I have updated the list that I initially completed in Mar-2013 with the FIDE year-end rating lists through 2019 (published in Jan-2020), and you can download the complete data from http://www.mediafire.com/file/kf54b.... It is quite large (~248 MB), and to open it you will need Excel 2007 or later, or a compatible spreadsheet program, since several of the later tabs contain more than 65,536 rows.
The spreadsheet also contains several charts and summary information. If you are only interested in that and not the actual rating lists, you can download a much smaller (~ 1 MB) spreadsheet containing the charts and summary information from http://www.mediafire.com/file/8kss1.... You can open this file with a pre-Excel 2007 version or a compatible spreadsheet.
FWIW, after looking at the data I think that ratings inflation, which I define as an increase in ratings not accompanied by a corresponding increase in actual playing strength, was real, but it was a slow process. I refer to this as my "Bottom Feeder" hypothesis, and it goes something like this:
1. Initially (late 1960s and 1970s) the ratings for the strongest players were fairly constant.
2. In the 1980s the number of rated players began to increase exponentially, and they entered the FIDE-rated chess playing population mostly at the lower rating levels. Also, starting in 1992, FIDE began to periodically lower the rating floor (the lowest rating for which players would be rated by FIDE) from 2200 to the current 1000 in 2012. This resulted in an even greater increase in the number of rated players. And the ratings of those newly-rated players may have been higher than they should have been, given that they were calculated using a high K-factor (the Elo sketch after this list illustrates the effect).
3. The ratings of the stronger of these players increased as a result of playing these weaker players, but their ratings were not yet high enough to enter tournaments, other than open tournaments, where they would meet middle and high rated players.
4. Eventually they did. The ratings of the middle rated players then increased as a result of beating the lower rated players, and the ratings of the lower rated players then leveled out and even started to decline. You can see this effect in the 'Inflation Charts' tab, "Rating Inflation: Nth Player" chart, for the 1500th to 5000th rated player.
5. Once the middle rated players increased their ratings sufficiently, they began to meet the strongest players. And the cycle repeated itself. The ratings of the middle players began to level out and might now be ready to start a decrease. You can see this effect in the same chart for the 100th to 1000th rated player.
6. The ratings of the strongest players, long stable, began to increase as a result of beating the middle rated players. And, because they are at the top of the food chain, their ratings, at least initially, continued to climb. I think that they have finally leveled out at ALL rating levels, including the top level, based on their trends for the last several years.
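To make the K-factor point in item 2 concrete, here is a minimal sketch of the standard Elo update formula. The K values and ratings below are illustrative assumptions on my part, not FIDE's actual K rules (which depend on a player's rating and number of rated games); the point is only that the same results move a high-K (newly rated) player's rating several times faster than an established player's.

# A minimal sketch of the standard Elo update, R' = R + K * (S - E),
# where the expected score E = 1 / (1 + 10 ** ((R_opp - R) / 400)).
# K values are illustrative; FIDE's actual K depends on a player's
# rating and number of rated games.

def expected_score(rating, opponent):
    """Expected score of a player against an opponent."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def elo_update(rating, opponent, score, k):
    """New rating after one game (score is 1, 0.5, or 0)."""
    return rating + k * (score - expected_score(rating, opponent))

# A hypothetical newly rated 1400 player beats 1600-rated opposition
# five times in a row, once under a high K and once under a low K.
for k in (40, 10):
    rating = 1400.0
    for _ in range(5):
        rating = elo_update(rating, 1600.0, 1.0, k)
    print("K=%d: rating after 5 straight wins = %.0f" % (k, rating))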
You can see in the chart that the rating increase, leveling off, and decline starts first with the lowest ranked players, then moves through the middle ranked players, and finally affects the top ranked players. As of today the average ratings of ALL the players, including the average of the Top-10 rated players, have been fairly constant since 2015.
It's not precise and it's not 100% consistent, but it certainly seems evident. And the process took decades, so it's not easy to see unless you look at all the years and many ranking levels.
Of course, this is just a hypothesis and the chart may look very different 20 years from now. But, at least on the surface, it doesn't sound unreasonable to me.
But looking at the data through 2019, it is evident that the era of ratings inflation IS over, unless FIDE once more lowers the rating floor and a flood of new and previously unrated players enters the rating pool. The previous years' trends have either continued or accelerated; the rating for every ranking category has either flattened out or started to decline, as evidenced by the trendlines.
Chess Engine Non-Determinism
I've discussed chess engine non-determinism many times. If you run an analysis of a position multiple times with the same engine, on the same computer, and to the same search depth (using more than one thread), you will get different results. Not MAY, WILL. Guaranteed. Others have reported similar results.
I had a chance to run a slightly more rigorous test and described the results starting here: US Championship (2017) (kibitz #633). I had 3 different engines (Houdini 4, Komodo 10, and Stockfish 8) analyze the position in W So vs Onischuk, 2017 after 13...Bxd4, a highly complex tactical position. I made 12 runs with each engine: 3 each with threads=1, 2, 3, and 4 on my 32-bit 4-core computer with 4 GB RAM and MultiPV=3 (a sketch of such a test appears after the list below). The results were consistent for each engine:
(a) With threads=1 (using a single core) the results of all 3 engines were deterministic. Each engine selected the same top 3 moves in every run, with the same evaluations and, obviously, the same move rankings.
(b) With threads=2, 3, and 4 (using 2, 3, and 4 cores) none of the engines showed deterministic behavior. Each engine occasionally produced different top 3 moves across runs, with different evaluations and different move rankings.
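For anyone who wants to repeat the experiment, here is a rough sketch of how such a test could be scripted with the python-chess library (an assumption on my part; any UCI-capable front end would do). The engine path is a placeholder, and the starting position stands in for the position under test.

# Sketch of the repeatability test using the python-chess library.
# ENGINE_PATH is a placeholder for any UCI engine binary.
import chess
import chess.engine

ENGINE_PATH = "./stockfish"   # placeholder
DEPTH, MULTIPV, RUNS = 20, 3, 3

board = chess.Board()         # substitute the FEN of the position under test

for threads in (1, 2, 3, 4):
    results = []
    for _ in range(RUNS):
        with chess.engine.SimpleEngine.popen_uci(ENGINE_PATH) as engine:
            engine.configure({"Threads": threads, "Hash": 256})
            info = engine.analyse(board, chess.engine.Limit(depth=DEPTH),
                                  multipv=MULTIPV)
            # Record (move, score) for each of the top MULTIPV lines.
            results.append([(line["pv"][0].uci(), str(line["score"]))
                            for line in info])
    print("threads=%d: all runs identical: %s"
          % (threads, all(r == results[0] for r in results)))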
I've read that the technical reason for the non-deterministic behavior is the high sensitivity to move ordering of the alpha-beta search that all the top engines use, combined with the variation in this move ordering during multi-threaded operation, when individual threads get interrupted by higher-priority system processes. I have not had the chance to verify this, but there is no disputing the results. The sketch below shows the move-ordering sensitivity on a toy tree.
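This is not engine code, just the textbook alpha-beta algorithm on a tiny hand-built tree: the value found is identical regardless of move order, but the number of nodes visited, and hence where the cutoffs fall, changes.

# Toy alpha-beta search illustrating sensitivity to move ordering.
# Leaves are static evaluations; interior nodes are lists of children.
def alphabeta(node, alpha, beta, maximizing, counter):
    counter[0] += 1                      # count every node visited
    if not isinstance(node, list):       # leaf: a static evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, counter))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                    # beta cutoff
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True, counter))
            beta = min(beta, value)
            if alpha >= beta:
                break                    # alpha cutoff
    return value

tree = [[9, 10, 11], [6, 7, 8], [3, 4, 5]]   # best branch searched first
for label, t in (("good ordering", tree), ("bad ordering", tree[::-1])):
    counter = [0]
    value = alphabeta(t, float("-inf"), float("inf"), True, counter)
    print("%s: value=%s, nodes visited=%d" % (label, value, counter[0]))

Same value, different node counts. In a multi-threaded engine the effective ordering varies from run to run, so the search that reaches a given depth is not literally the same search each time.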
What's the big deal? Well, if the same engine gives different results each time it runs, how can you determine what the real "best" move is? Never mind that different engines of relatively equal strength (as determined by their ratings) give different evaluations and move rankings for their top 3 moves, and that the evaluations may differ as a function of the search depth.
Since I believe in the need to run analyses of a given position using more than one engine and then aggregating the results to try to reach a more accurate assessment of the position, I typically ran sequential analyses of the same position using 4 threads and a hash table = 1,024 MB. But since I typically run 3 engines, I found it more efficient to run all 3 engines concurrently, each with a single thread and a hash table = 256 MB (to prevent swapping to disk). Yes, running with a single thread is about 1/2 the speed of running with 4 threads, but running the 3 engines sequentially requires 3X the time of one 4-thread analysis, while running them concurrently requires only 2X, a one-third reduction in the time to run all 3 analyses to the same depth, while also resolving the non-determinism issue.
So, if you typically run analyses of the same position with 3 engines, consider running them concurrently with threads=1 rather than sequentially with threads=4. You'll get deterministic results in less total time.
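Here is a rough sketch of that concurrent single-threaded setup, again using the python-chess library; the engine paths are placeholders. Since each engine is a separate OS process, a simple Python thread pool is enough to keep all 3 busy.

# Sketch of running 3 engines concurrently, each with threads=1 and a
# 256 MB hash table. Engine paths are placeholders.
from concurrent.futures import ThreadPoolExecutor
import chess
import chess.engine

ENGINE_PATHS = ["./houdini", "./komodo", "./stockfish"]   # placeholders
DEPTH = 24
FEN = chess.STARTING_FEN      # substitute the position under test

def analyse(path):
    """Run one engine single-threaded and return its top 3 lines."""
    with chess.engine.SimpleEngine.popen_uci(path) as engine:
        engine.configure({"Threads": 1, "Hash": 256})
        info = engine.analyse(chess.Board(FEN),
                              chess.engine.Limit(depth=DEPTH), multipv=3)
        return path, [(line["pv"][0].uci(), str(line["score"]))
                      for line in info]

with ThreadPoolExecutor(max_workers=len(ENGINE_PATHS)) as pool:
    for path, lines in pool.map(analyse, ENGINE_PATHS):
        print(path, lines)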
A Note on Chess Engine Evaluations
Different engines provide different evaluations of the next "best" move, sometimes significantly different. For example, Stockfish's evaluations tend to be higher than the other top engines' and Houdini's tend to be lower. This could be because Stockfish typically reaches greater search depths than the other top engines in the same amount of time, while Houdini typically reaches lower search depths. Or it could be for other reasons.
If we are analyzing a position we typically want to use the "best" engine as "measured" by its rating, and that's currently (Mar-2018) Stockfish 10 for "classic" chess engines (I'm deliberately excluding AlphaZero and Leela Chess Zero because they use a different move/search tree branch evaluation approach, and the best versions of them use either TPU or GPU support to enhance their calculation capability and are therefore not directly comparable); its higher rating has been achieved in engine vs. engine tournaments such as CCRL and CEGT. But the "best" engine as determined by playing head-to-head games is not necessarily the best engine for <analysis>, since in analysis we not only want to know the best moves from a given position, we also want an accurate <evaluation> of the position. Specifically, we want an accurate evaluation of the position in <absolute> terms in order to determine whether one side has a likely winning advantage (generally an absolute evaluation greater than [±2.00], or 2 pawns), a significant advantage (generally in the range [±1.00] to [±1.99]), a slight advantage (generally in the range [±0.50] to [±0.99]), or whether the position is approximately equal (generally in the range [-0.49] to [+0.49]).
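Those bands are easy to mechanize; here is a small helper using the thresholds exactly as given above (evaluations in pawns, positive favoring White).

# Map an absolute evaluation (in pawns, + favors White) to the verbal
# categories defined above. Thresholds are taken straight from the text.
def advantage_category(evaluation):
    magnitude = abs(evaluation)
    side = "White" if evaluation > 0 else "Black"
    if magnitude >= 2.00:
        return "likely winning advantage for " + side
    if magnitude >= 1.00:
        return "significant advantage for " + side
    if magnitude >= 0.50:
        return "slight advantage for " + side
    return "approximately equal"

for e in (+0.25, -0.75, +1.50, -2.10):
    print("%+.2f: %s" % (e, advantage_category(e)))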
But when playing a game an accurate <absolute> evaluation is irrelevant; what counts is an accurate <relative> evaluation. This is because all chess engines using the minimax algorithm to determine the best move (assuming best play by both sides) do so by a series of pairwise comparisons between two moves. So if an engine is trying to determine which of 2 moves, A or B, is better, it doesn't matter whether their evaluations are [+12.00] and [+11.00], [+1.20] and [+1.10], or [+0.12] and [+0.11]; it will always select move A as the better move and consider that branch of the search tree to be the better line. Multiplying both evaluations by a fixed positive constant, or adding a fixed constant to both, has no effect on which of the 2 moves the engine determines to be better. But clearly, evaluations of [+12.00], [+1.20], or [+0.12] will give the analyst much different impressions of the position.
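A three-line illustration of that point: rescaling or shifting all the candidate evaluations the same way never changes which move the comparison picks.

# The move choice is invariant under any order-preserving rescaling
# of the evaluations: the argmax stays the same.
candidates = {"A": 1.20, "B": 1.10}

print(max(candidates, key=lambda m: candidates[m]))          # A
print(max(candidates, key=lambda m: 10 * candidates[m]))     # A (x10)
print(max(candidates, key=lambda m: candidates[m] + 10.0))   # A (+10)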
In practice the discrepancies in evaluations between several engines are not that drastic, but I suggest that you don't assume that Stockfish's <absolute> evaluations are the most accurate just because it is (currently) the best "classical" game-playing engine (i.e. not using GPU or TPU support) or because it reaches the greatest search depth in a given amount of time.
In response to a question by <john barleycorn> I looked into the 16 TCEC Superfinal matches to date, summarized the results, provided season-by-season summaries, and compiled some statistics. The objective was to see how matches played under the "Fischer Rules" proposed for the Karpov - Fischer World Championship Match (1975) (winner is the first to win 10 games, draws not counting; the match would be terminated if the score reached 9-9, with no match winner and the champion retaining his title) might have turned out. User <alexmagnus> provided some statistics for the first-to-win-4-games and first-to-win-10-games situations, and I added some statistics for the first-to-win-8-games situation.
You can see the information starting at AylerKupp chessforum (kibitz #1537) below. You can download a spreadsheet with the season-by-season results, statistical calculations, and trend charts from http://www.mediafire.com/file/8zoht.... You will need Excel 2003 or later, or a spreadsheet program or viewer capable of reading Excel 2003 files.
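As a companion to the spreadsheet, here is a minimal Monte Carlo sketch of a "first to N wins, draws not counting" match. The per-game win probabilities are illustrative assumptions on my part, not TCEC statistics, and the 9-9 termination clause is ignored for simplicity.

# Monte Carlo sketch of a "first to N wins, draws not counting" match.
# The per-game probabilities below are illustrative assumptions.
import random

def match_length(n_wins, p_a_wins, p_b_wins, rng):
    """Games played until one player scores n_wins decisive games."""
    wins_a = wins_b = games = 0
    while wins_a < n_wins and wins_b < n_wins:
        games += 1
        r = rng.random()
        if r < p_a_wins:
            wins_a += 1
        elif r < p_a_wins + p_b_wins:
            wins_b += 1
        # otherwise the game is drawn and does not count
    return games

rng = random.Random(42)
for n in (4, 8, 10):
    lengths = [match_length(n, 0.15, 0.10, rng) for _ in range(10000)]
    print("first to %2d wins: mean match length = %.1f games"
          % (n, sum(lengths) / len(lengths)))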
Any comments, suggestions, criticisms, etc. are both welcomed and encouraged.