About Me (in case you care):
An old-timer from the era of Fischer, Reshevsky, Spassky, Petrosian, etc. I was active in high school and early college, but not much since. Never rated above the low 1800s, and highly erratic: I would occasionally beat much higher-rated players and equally often lose to much lower-rated players. I had a highly entertaining combinational style, and everybody liked to play me since they were never sure what I was going to do (and neither was I!). When facing a stronger player, many try to even their chances by steering toward simple positions where they can see what is going on. My philosophy in those situations was to even the chances by complicating the game to the point that neither I nor the stronger player could see what was going on! Alas, this approach no longer works in the computer age. And, needless to say, my favorite all-time player is Tal.
I also have a computer background and have followed with interest the developments in computer chess since the days when computers couldn't always recognize illegal moves and a patzer like me could beat them with ease. Now it's me who can't always recognize illegal moves, and any chess program can beat me with ease.
But after about 8 years of playing computer-assisted chess (a lifetime in computer-related activities), I think I have learned a thing or two about the subject. I have conceitedly defined "AylerKupp's corollary to Murphy's Law" (AKC2ML) as follows:
"If you use your engine to analyze a position to a search depth=N, your opponent's killer move (the move that will refute your entire analysis) will be found at search depth=N+1, regardless of the value you choose for N."
I'm also a food and wine enthusiast. Some of my favorites are German wines (along with French, Italian, US, New Zealand, Australian, Argentine, Spanish, ... well, you probably get the idea). Among my early favorites were wines from the Ayler Kupp vineyard in the Saar region, hence my user name. Here is a link to a picture of the village of Ayl with a portion of the Kupp vineyard on the left: http://en.wikipedia.org/wiki/File:A...
You can e-mail me at aylerkupp gmail.com whenever you'd like.
And check out a picture of me with my "partner", Rybka (Aylerkupp / Rybka) from the CG.com Masters - Machines Invitational (2011). No, I won't tell you which one is me.
I have become interested in the increase in top player ratings since the mid-1980s and whether this represents a true increase in player strength (and if so, why) or if it is simply a consequence of a larger chess population from which ratings are derived. So I've opened up my forum for discussions on this subject.
I have updated the list that I initially completed in Mar-2013 with the FIDE rating lists through 2018 (published in Jan-2019), and you can download the complete data from https://www.mediafire.com/file/g89w.... It is quite large (~213 MB). To open it you will need Excel 2007 or later, or a compatible spreadsheet, because several of the later tabs contain more than 65,536 rows (the row limit of earlier Excel versions).
The spreadsheet also contains several charts and summary information. If you are interested only in those and not in the actual rating lists, you can download a much smaller (~1 MB) spreadsheet containing just the charts and summary information from https://www.mediafire.com/file/m5nk.... You can open this file with a version of Excel earlier than 2007 or with a compatible spreadsheet.
FWIW, after looking at the data I think that ratings inflation, which I define to be the unwarranted increase in ratings not necessarily accompanied by a corresponding increase in playing strength, is real, but it is a slow process. I refer to this as my "Bottom Feeder" hypothesis and it goes something like this:
1. Initially (late 1960s and 1970s) the ratings for the strongest players were fairly constant.
2. In the 1980s the number of rated players began to increase exponentially, and the new players entered the FIDE-rated population mostly at the lower rating levels. Also, starting in 1992, FIDE periodically lowered the rating floor (the lowest rating at which players are rated by FIDE) from 2200 to the current 1000, which was reached in 2012. This produced an even greater increase in the number of rated players. And the ratings of those newly rated players may have been higher than they should have been, because they were calculated using a high K-factor.
3. The ratings of the stronger of these players increased as a result of playing the weaker ones. However, their ratings were not yet high enough for them to play in tournaments (other than open tournaments) where they would meet middle- and high-rated players.
4. Eventually they did. The ratings of the middle-rated players then increased as a result of beating the lower-rated players, and the ratings of the lower-rated players leveled out and even started to decline. You can see this effect in the 'Inflation Charts' tab, "Rating Inflation: Nth Player" chart, for the 1500th to 5000th rated player.
5. Once the middle-rated players had increased their ratings sufficiently, they began to meet the strongest players, and the cycle repeated itself. The ratings of the middle-rated players began to level out and may now be ready to start declining. You can see this effect in the same chart for the 100th to 1000th rated player.
6. The ratings of the strongest players, long stable, began to increase as a result of beating the middle-rated players. And because these players are at the top of the food chain, their ratings, at least initially, continued to climb. I think their ratings will eventually level out. They may already have done so, except possibly for the very highest-rated players (those in the top 50). If this hypothesis is true, there is no force to drive them down, so they will stay relatively constant, like the pre-1986 10th-rated player and the pre-1981 50th-rated player. When this leveling out will take place, if it does, and at what level, I have no idea. But a look at the 2017 ratings data indicates that it has indeed already started, maybe even among the top 10 rated players.
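For readers who want to see the mechanism behind steps 2 through 6, here is a minimal sketch of the common logistic form of the Elo formulas. The expected score is E = 1 / (1 + 10^((R_opp - R)/400)), and the new rating is R' = R + K(S - E). This is an approximation under stated assumptions: FIDE's own regulations use their own expected-score table and additional rules, and the K values below are illustrative, not FIDE's actual K-factor schedule. The function names are hypothetical. The sketch only shows why a higher K-factor moves a newly rated player's rating further after each game.

```python
def expected_score(rating: float, opp_rating: float) -> float:
    """Expected score (0..1) against one opponent, logistic Elo form."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))


def new_rating(rating: float, opp_rating: float, score: float, k: float) -> float:
    """Rating after one game: R' = R + K * (S - E); score is 1, 0.5 or 0."""
    return rating + k * (score - expected_score(rating, opp_rating))


# A hypothetical newcomer rated 1600 beats a 1700-rated opponent.
# The K values are illustrative only, not FIDE's actual K-factor rules.
for k in (10, 20, 40):
    gain = new_rating(1600, 1700, 1.0, k) - 1600
    print(f"K={k}: +{gain:.1f}")
```

The gain is proportional to K. With four times the K-factor, the same upset win moves the rating four times as far.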
You can see in the chart that the rating increase, leveling off, and decline first appear among the lowest-ranked players, then move through the middle-ranked players, and finally affect the top-ranked players. As of today, the average ratings of ALL the players, including the average of the Top-10 rated players, have been fairly constant since 2014.
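The "Rating Inflation: Nth Player" chart presumably tracks the rating of the player at a fixed rank (the 10th, the 100th, the 1500th, etc.) from one rating list to the next. Below is a minimal sketch of that computation. It assumes each year's list has been reduced to a plain list of ratings. The function names are hypothetical, and the toy numbers are made up for illustration; they are NOT taken from the FIDE data or the spreadsheet.

```python
from typing import Dict, List


def nth_player_rating(ratings: List[int], n: int) -> int:
    """Rating of the n-th highest-rated player in one list (n is 1-based)."""
    if not 1 <= n <= len(ratings):
        raise ValueError(f"list has {len(ratings)} players, cannot take rank {n}")
    return sorted(ratings, reverse=True)[n - 1]


def nth_player_series(lists_by_year: Dict[int, List[int]], n: int) -> Dict[int, int]:
    """Rating of the n-th ranked player for each year, in year order."""
    return {year: nth_player_rating(lists_by_year[year], n) for year in sorted(lists_by_year)}


# Toy lists with made-up ratings, NOT real FIDE data.
toy_lists = {
    2010: [2760, 2820, 2790, 2810, 2780],
    2000: [2750, 2850, 2700, 2790, 2770],
}
print(nth_player_series(toy_lists, 3))  # {2000: 2770, 2010: 2790}
```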
It's not precise, it's not 100% consistent, but it certainly seems evident. And the process takes decades so it's not easy to see unless you look at all the years and many ranked levels.
Of course, this is just a hypothesis and the chart may look very different 20 years from now. But, at least on the surface, it doesn't sound unreasonable to me.
But looking at the data through 2018, it is even more evident that the era of ratings inflation appears to be over, unless FIDE once more lowers the rating floor and a flood of new and unrated players enters the rating pool. The previous year's trends have either continued or accelerated. As the trendlines show, the rating for every ranking category has either flattened out or started to decline.
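The trendlines are presumably ordinary least-squares linear fits, which is what a spreadsheet's linear trendline computes. Here is a minimal sketch of how the slope (rating points per year) for one ranking category could be calculated. The function name is hypothetical, and the input series is made up, not real FIDE data. A slope near zero corresponds to "flattened out", and a negative slope corresponds to "started to decline".

```python
from typing import Sequence, Tuple


def linear_trend(years: Sequence[float], ratings: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit: rating ~ slope * year + intercept."""
    n = len(years)
    if n < 2 or n != len(ratings):
        raise ValueError("need at least two (year, rating) pairs of equal length")
    mean_x = sum(years) / n
    mean_y = sum(ratings) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be the same")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, ratings))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical series for one ranking level (NOT real data).
slope, _ = linear_trend([2014, 2015, 2016, 2017, 2018], [2705, 2706, 2704, 2705, 2703])
print(f"{slope:+.2f} rating points per year")  # -0.50 rating points per year
```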
Chess Engine Non-Determinism
I've discussed chess engine non-determinism many times. If you run an analysis of a position multiple times with the same engine, on the same computer, to the same search depth, and using more than one thread, you will get different results. Not MAY, WILL. Guaranteed. Others have reported similar results.
I had a chance to run a slightly more rigorous test and described the results starting here: US Championship (2017) (kibitz #633). I had 3 different engines (Houdini 4, Komodo 10, and Stockfish 8) analyze the position in W So vs Onischuk, 2017 after 13...Bxd4, a highly complex tactical position. I made 12 runs with each engine, 3 each with threads=1, 2, 3, and 4. The runs were on my 32-bit 4-core computer with 4 GB RAM, using MPV=3 (MultiPV, i.e. the top 3 moves). The results were consistent for each engine:
(a) With threads=1 (using a single core), the results of all 3 engines were deterministic. On every run, each engine selected the same top 3 moves, with the same evaluations and, obviously, the same move rankings.
(b) With threads=2, 3, and 4 (using 2, 3, and 4 cores), none of the engines showed deterministic behavior. From run to run, each engine occasionally selected different moves, with different evaluations and different move rankings.
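The criterion used in (a) and (b) can be stated precisely: a set of runs is deterministic if every run returns the identical list of top moves, evaluations, and rankings. Below is a minimal sketch. It assumes each run's output has already been captured as a list of (move, evaluation) pairs. The function name, move labels, and numbers are placeholders, NOT the actual results of the test.

```python
from typing import List, Tuple

Line = Tuple[str, float]   # (move, evaluation in pawns)
Run = List[Line]           # top MultiPV lines of one run, best first


def is_deterministic(runs: List[Run]) -> bool:
    """True if every run produced the same moves, evaluations and ranking."""
    return all(run == runs[0] for run in runs[1:])


# Placeholder outputs for three runs (NOT real engine results).
single_thread = [[("A", 0.31), ("B", 0.25), ("C", 0.20)]] * 3
multi_thread = [
    [("A", 0.31), ("B", 0.25), ("C", 0.20)],
    [("B", 0.28), ("A", 0.27), ("C", 0.19)],
    [("A", 0.33), ("C", 0.22), ("B", 0.21)],
]
print(is_deterministic(single_thread), is_deterministic(multi_thread))  # True False
```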
I've read that the technical reason for the non-deterministic behavior is this: the alpha-beta search used by all the top engines is highly sensitive to the move ordering in the search tree. In multi-threaded operation, that ordering varies from run to run because the threads get interrupted at unpredictable times by higher-priority system processes. I have not had the chance to verify this, but there is no disputing the results.
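To illustrate what "sensitivity to move ordering" means, here is a toy alpha-beta search over a hand-built two-ply tree. It is a sketch, not any engine's actual code, and the names `alphabeta`, `search`, `good_order`, and `bad_order` are hypothetical. With plain alpha-beta to a fixed depth, the order in which moves are searched does not change the final value, but it does change how many positions must be examined. Real engines add shared hash tables, pruning and reduction heuristics, and timing-dependent interaction between threads. The usual explanation is that these turn run-to-run differences in search order into different evaluations and moves. This single-threaded toy does not model that part.

```python
import math
from typing import List, Tuple, Union

Tree = Union[float, List["Tree"]]  # a leaf evaluation, or a list of child subtrees


def alphabeta(node: Tree, alpha: float, beta: float, maximizing: bool, counter: List[int]) -> float:
    if not isinstance(node, list):
        counter[0] += 1            # count leaf evaluations
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, counter))
            alpha = max(alpha, value)
            if alpha >= beta:
                break              # cutoff: remaining children cannot matter
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, counter))
        beta = min(beta, value)
        if alpha >= beta:
            break                  # cutoff: remaining children cannot matter
    return value


def search(tree: Tree) -> Tuple[float, int]:
    """Return (minimax value, number of leaves evaluated)."""
    counter = [0]
    value = alphabeta(tree, -math.inf, math.inf, True, counter)
    return value, counter[0]


good_order = [[3, 5], [1, 2], [0, 1]]  # best move for the side to move searched first
bad_order = [[0, 1], [1, 2], [3, 5]]   # same moves, worst first
print(search(good_order), search(bad_order))  # (3, 4) (3, 6)
```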
What's the big deal? Well, if the same engine gives different results each time it runs, how can you determine the real "best" move? And that's not even counting two other problems. Different engines of relatively equal strength (as determined by their ratings) give different evaluations and move rankings for their top 3 moves. And the evaluations may differ as a function of the search depth.
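Here is one simple way to aggregate several runs or engines, offered only as a hypothetical illustration, since no specific aggregation method is described here. Average each move's evaluation across the runs that listed it, then rank the moves by that average. The sketch assumes evaluations are from the point of view of the side to move and on comparable scales. As the note on engine evaluations below points out, that is not guaranteed across different engines. The function name, move labels, and numbers are placeholders.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def aggregate(runs: List[Dict[str, float]]) -> List[Tuple[str, float, int]]:
    """Rank moves by their mean evaluation over all runs that listed them.

    Each run maps move -> evaluation (pawns, side-to-move point of view).
    Returns (move, mean evaluation, number of runs listing it), best first.
    """
    evals: Dict[str, List[float]] = defaultdict(list)
    for run in runs:
        for move, score in run.items():
            evals[move].append(score)
    ranked = [(move, mean(scores), len(scores)) for move, scores in evals.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical top-3 outputs from three runs/engines (placeholder moves A-D).
runs = [
    {"A": 0.40, "B": 0.35, "C": 0.10},
    {"B": 0.50, "A": 0.30, "C": 0.05},
    {"A": 0.45, "B": 0.20, "D": 0.15},
]
for move, avg, count in aggregate(runs):
    print(f"{move}: {avg:+.2f} (listed in {count} of {len(runs)} runs)")
```

This scheme has a weakness: D ranks above C on a single appearance. Reporting how many runs listed each move is one way to spot that.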
I believe in analyzing a given position with more than one engine and then aggregating the results to try to reach a more accurate assessment of the position. So I have typically run sequential analyses of the same position using 4 threads and a hash table = 1,024 MB. But since I typically run 3 engines, I found it more efficient to run all 3 engines concurrently, each with a single thread and a hash table = 256 MB (to prevent swapping to disk). Yes, a single thread runs at 1/2 the speed of 4 threads. But running the 3 engines sequentially requires 3X the time, while running them concurrently requires only 2X the time. That is about a 33% reduction in the time needed to run all 3 analyses to the same depth (equivalently, the sequential approach takes 50% longer), and it also resolves the non-determinism issue.
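The timing argument is simple arithmetic. Below is a minimal sketch that takes X as the time for one engine to reach the target depth with 4 threads and uses the 2:1 single-thread slowdown stated above. The function name is hypothetical. The sketch also assumes, without measuring it, that running 3 single-threaded engines on a 4-core machine doesn't slow each one down further.

```python
def total_times(x: float, n_engines: int = 3, single_thread_slowdown: float = 2.0):
    """Return (sequential, concurrent) wall-clock time in units of X."""
    # Sequential: each engine runs alone with 4 threads, one after another.
    sequential = n_engines * x
    # Concurrent: all engines run at once, each with 1 thread, so each takes
    # single_thread_slowdown * x and they finish at about the same time.
    concurrent = single_thread_slowdown * x
    return sequential, concurrent


seq, conc = total_times(1.0)
saving = 1 - conc / seq   # fraction of total time saved by running concurrently
extra = seq / conc - 1    # how much longer the sequential approach takes
print(seq, conc, f"{saving:.0%}", f"{extra:.0%}")  # 3.0 2.0 33% 50%
```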
So, if you typically run analyses of the same position with 3 engines, consider running them concurrently with threads=1 rather than sequentially with threads=4. You'll get deterministic results in less total time.
A Note on Chess Engine Evaluations
Different engines provide different evaluations of the next "best" move, sometimes significantly different ones. For example, Stockfish's evaluations tend to be higher than those of other top engines, and Houdini's evaluations tend to be lower. This could be because Stockfish typically reaches greater search depths than the other top engines in the same amount of time, while Houdini typically reaches lower search depths. Or it could be for other reasons.
If we are analyzing a position, we typically want to use the "best" engine as "measured" by its rating. That's currently (Mar-2019) Stockfish 10 among "classic" chess engines. (I'm deliberately excluding AlphaZero and Leela Chess Zero because they use a different approach to evaluating moves and search tree branches, and their best versions use TPU or GPU support to enhance their calculation capability, so they are not directly comparable.) Its higher rating has been achieved in engine vs. engine rating lists such as CCRL and CEGT. But the "best" engine as determined by head-to-head games is not necessarily the best engine for <analysis>. In analysis we want to know not only the best moves from a given position but also an accurate <evaluation> of the position. Specifically, we want an accurate evaluation of the position in <absolute> terms, in order to determine which of the following applies:
a likely winning advantage for one side (generally an absolute evaluation of [±2.00], i.e. 2 pawns, or more);
a significant advantage (generally an absolute evaluation in the range [±1.00] to [±1.99]);
a slight advantage (generally an absolute evaluation in the range [±0.50] to [±0.99]);
or an approximately equal position (generally an evaluation in the range [-0.49] to [+0.49]).
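The evaluation bands above can be written as a small classification function. This sketch assumes evaluations are in pawns from White's point of view (positive favors White) and places an evaluation of exactly 2.00 in the "likely winning" band. The thresholds are the rough guidelines from the text, not a standard, and the function name is hypothetical.

```python
def classify(evaluation: float) -> str:
    """Map an evaluation in pawns (+ favors White) to the rough bands above."""
    magnitude = abs(evaluation)
    if magnitude >= 2.00:
        band = "likely winning advantage"
    elif magnitude >= 1.00:
        band = "significant advantage"
    elif magnitude >= 0.50:
        band = "slight advantage"
    else:
        return "approximately equal"
    side = "White" if evaluation > 0 else "Black"
    return f"{side}: {band}"


print(classify(2.35))   # White: likely winning advantage
print(classify(-1.20))  # Black: significant advantage
print(classify(0.12))   # approximately equal
```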
But when playing a game, an accurate <absolute> evaluation is irrelevant; what counts is an accurate <relative> evaluation. This is because all chess engines that use the minimax algorithm to determine the best move (assuming best play by both sides) do so through a series of pairwise comparisons between moves. Suppose an engine is trying to determine which of 2 moves, A or B, is better, and A evaluates higher than B. Then it doesn't matter whether their evaluations are [+12.00] and [+11.00], [+1.20] and [+1.10], or [+0.12] and [+0.11]. The engine will always select move A as the better move and consider that branch of the search tree to be the better line. So multiplying both evaluations by the same positive constant, or adding the same constant to both, has no effect on which of the 2 moves the engine considers better. But clearly, evaluations of [+12.00], [+1.20], or [+0.12] will give the analyst very different impressions of the position.
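Here is a small sketch of the point above, using a hypothetical two-ply tree whose numbers echo the example: move A's worst case is [+12.00] and move B's is [+11.00]. Apply any strictly increasing transformation to all leaf evaluations, such as dividing by a positive constant or subtracting a constant. The minimax choice stays the same, because min and max preserve that ordering. The tree and the function name are illustrative only.

```python
from typing import Callable, Dict, List

# Hypothetical two-ply tree: each of our candidate moves maps to the
# evaluations (in pawns, our point of view) after each opponent reply.
TREE: Dict[str, List[float]] = {
    "A": [12.0, 13.5],
    "B": [11.0, 15.0],
    "C": [9.0, 20.0],
}


def best_move(tree: Dict[str, List[float]],
              f: Callable[[float], float] = lambda e: e) -> str:
    """Minimax at depth 2: the opponent minimizes, we maximize.

    f is applied to every leaf evaluation before searching.
    """
    return max(tree, key=lambda move: min(f(e) for e in tree[move]))


print(best_move(TREE))                     # A (worst case +12.00)
print(best_move(TREE, lambda e: e / 10))   # A (worst case +1.20)
print(best_move(TREE, lambda e: e / 100))  # A (worst case +0.12)
print(best_move(TREE, lambda e: e - 5))    # A
```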
In practice, the discrepancies in evaluations between engines are not that drastic. Still, I suggest that you don't assume Stockfish's <absolute> evaluations are the most accurate just because it is (currently) the best "classical" game-playing engine (i.e. one not using GPU or TPU support), or because it reaches the greatest search depth in a given amount of time.
Any comments, suggestions, criticisms, etc. are both welcomed and encouraged.