Chessgames.com User Profile Chessforum
AylerKupp
Member since Dec-31-08 · Last seen Dec-09-16
About Me (in case you care):

Old timer from the Fischer, Reshevsky, Spassky, Petrosian, etc. era. Active while in high school and early college, but not much since. Never rated above low 1800s and highly erratic; I would occasionally beat much higher rated players and equally often lose to much lower rated players. Highly entertaining combinatorial style, everybody liked to play me since they were never sure what I was going to do (neither was I!). When facing a stronger player many try to even their chances by steering towards simple positions to be able to see what was going on. My philosophy in those situations was to try to even the chances by complicating the game to the extent that neither I nor the stronger player would be able to see what was going on! Alas, this approach no longer works in the computer age. And, needless to say, my favorite all-time player is Tal.

I also have a computer background and have been following with interest the development in computer chess since the days when computers couldn't always recognize illegal moves and a patzer like me could beat them with ease. Now it’s me that can’t always recognize illegal moves and any chess program can beat me with ease.

But after about 4 years (a lifetime in computer-related activities) of playing computer-assisted chess, I think I have learned a thing or two about the subject. I have conceitedly defined "AylerKupp's corollary to Murphy's Law" (AKC2ML) as follows:

"If you use your engine to analyze a position to a search depth=N, your opponent's killer move (the move that will refute your entire analysis) will be found at search depth=N+1, regardless of the value you choose for N."

I’m also a food and wine enthusiast. Some of my favorites are German wines (along with French, Italian, US, New Zealand, Australia, Argentina, Spain, ... well, you probably get the idea). Among my early favorites were wines from the Ayler Kupp vineyard in the Saar region, hence my user name. Here is a link to a picture of the village of Ayl with a portion of the Kupp vineyard on the left: http://en.wikipedia.org/wiki/File:A...

You can send me an e-mail whenever you'd like to aylerkupp(at)gmail.com.

And check out a picture of me with my "partner", Rybka (Aylerkupp / Rybka) from the CG.com Masters - Machines Invitational (2011). No, I won't tell you which one is me.

-------------------

Analysis Tree Spreadsheet (ATSS).

The ATSS is a spreadsheet developed to track the analyses posted by team members in various on-line games (XXXX vs. The World, Team White vs. Team Black, etc.). It is a poor man's database which provides some tools to help organize and find analyses.

I'm in the process of developing a series of tutorials on how to use it and related information. The tutorials are spread all over this forum, so here's a list of the tutorials developed to date and links to them:

Overview: AylerKupp chessforum (kibitz #843)

Minimax algorithm: AylerKupp chessforum (kibitz #861)

Principal Variation: AylerKupp chessforum (kibitz #862)

Finding desired moves: AylerKupp chessforum (kibitz #863)

Average Move Evaluation Calculator (AMEC): AylerKupp chessforum (kibitz #876)

-------------------

ATSS Analysis Viewer

I added a capability to the Analysis Tree Spreadsheet (ATSS) to display each analysis in PGN-viewer style. You can read a brief summary of its capabilities here AylerKupp chessforum (kibitz #1044) and download a beta version for evaluation.

-------------------

Chess Engine Evaluation Project

The Chess Engine Evaluation Project was an attempt to evaluate different engines’ performance in solving the “insane” Sunday puzzles with the following goals:

(1) Determining whether various engines were capable of solving the Sunday puzzles within a reasonable amount of time, how long it took them to do so, and what search depth they required.

(2) Classifying the puzzles as Easy, Medium, or Hard from the perspective of how many engines successfully solved the puzzle, and to determine whether any one engine(s) excelled at the Hard problems.

(3) Classifying the puzzle positions as Open, Semi-Open, or Closed and determine whether any engine excelled at one type of positions that other engines did not.

(4) Classifying the puzzle position as characteristic of the opening, middle game, or end game and determine which engines excelled at one phase of the game vs. another.

(5) Comparing the evals of the various engines to see whether one engine tends to generate higher or lower evals than other engines for the same position. If anybody is interested in participating in the restarted project, either post

Unfortunately I had to stop work on the project. It simply took more time than I had available to run analyses on the many test positions for each of the engines. And it seems that each time that I had reasonably categorized an engine, a new version was released, making the results obtained with the previous version obsolete. Oh well.

-------------------

Ratings Inflation

I have recently become interested in the increase in top player ratings since the mid-1980s and whether this represents a true increase in player strength (and if so, why) or if it is simply a consequence of a larger chess population from which ratings are derived. So I've opened up my forum for discussions on this subject.

I have updated the list that I initially completed in Mar-2013 with the FIDE rating list through 2014 (published in Jan-2015), and you can download the complete data from http://www.mediafire.com/view/tsyci... It is quite large (135 MB) and to open it you will need Excel 2007 or later version or a compatible spreadsheet since several of the later tabs contain more than 65,536 rows.

The spreadsheet also contains several charts and summary information. If you are only interested in that and not the actual rating lists, you can download a much smaller (813 KB) spreadsheet containing the charts and summary information from http://www.mediafire.com/view/2b3id...(summary).xls. You can open this file with a pre-Excel 2007 version or a compatible spreadsheet.

FWIW, after looking at the data I think that ratings inflation, which I define to be the unwarranted increase in ratings not necessarily accompanied by a corresponding increase in playing strength, is real, but it is a slow process. I refer to this as my "Bottom Feeder" hypothesis and it goes something like this:

1. Initially (late 1960s and 1970s) the ratings for the strongest players were fairly constant.

2. In the 1980s the number of rated players began to increase exponentially, and they entered the FIDE-rated chess playing population mostly at the lower rating levels. The ratings of the stronger of these players increased as a result of playing weaker players, but their ratings were not sufficiently high to play in tournaments, other than open tournaments, where they would meet middle and high rated players.

3. Eventually they did. The ratings of the middle rated players then increased as a result of beating the lower rated players, and the ratings of the lower rated players then leveled out and even started to decline. You can see this effect in the 'Inflation Charts' tab, "Rating Inflation: Nth Player" chart, for the 1500th to 5000th rated player.

4. Once the middle rated players increased their ratings sufficiently, they began to meet the strongest players. And the cycle repeated itself. The ratings of the middle players began to level out and might now be ready to start a decrease. You can see this effect in the same chart for the 100th to 1000th rated player.

5. The ratings of the strongest players, long stable, began to increase as a result of beating the middle rated players. And, because they are at the top of the food chain, their ratings, at least so far, continue to climb. I think that they will eventually level out but if this hypothesis is true there is no force to drive them down so they will stay relatively constant like the pre-1986 10th rated player and the pre-1981 50th rated player. When this leveling out will take place, if it does, and at what level, I have no idea. But a look at the 2013 ratings data indicates that, indeed, it may have already started.

You can see in the chart that the rating increase, leveling off, and decline first starts with the lowest ranking players, then through the middle ranking players, and finally affects the top ranked players. It's not precise, it's not 100% consistent, but it certainly seems evident. And the process takes decades so it's not easy to see unless you look at all the years and many ranked levels.

Of course, this is just a hypothesis and the chart may look very different 20 years from now. But, at least on the surface, it doesn't sound unreasonable to me.

But looking at the data through 2015 it is even more evident that the era of ratings inflation appears to be over. The previous year's trends have either continued or accelerated; the rating for every ranking category, except for possibly the 10th ranked player (a possible trend is unclear), has either flattened out or has started to decline as evidenced by the trendlines.

Any comments, suggestions, criticisms, etc. are both welcomed and encouraged.

-------------------

Chessgames.com Full Member

   AylerKupp has kibitzed 9549 times to chessgames
   Dec-08-16 Robert James Fischer (replies)
 
AylerKupp: <<keypusher> The only way you don't get beat is you quit.> That reminds me of the ending to the movie "War Games". After asked to play "Global Thermonuclear War" the computer says: "Interesting game. The only way to win is not to play." And then, of course (just to ...
 
   Dec-06-16 I Bahgat vs Rodolfo Cardoso, 1957 (replies)
 
AylerKupp: Great puzzle/spoiler. But I'm not sure that I would label it "easy". That's just sour grapes because, sure enough, I fell for 24.Ng5, not seeing 24...Qxg2+ or 25...Nb4+.
 
   Dec-06-16 Stockfish (Computer) (replies)
 
AylerKupp: <zanzibar> Oh, I agree about restarting the engine at (n-m)-ply, that's the basic forward sliding approach. And, if you make m = n -1 then that's like having an engine (or different engines) play a game or doing forward sliding one move at a time. In a recent Team game I ...
 
   Dec-01-16 Carlsen vs Karjakin, 2016 (replies)
 
AylerKupp: <<chessalem> but so would have Wesley!> And so would have thousands watching on line. But what does it matter? None of them were participants in this game and those two are the only ones that count.
 
   Nov-30-16 Carlsen vs Karjakin, 2016 (replies)
 
AylerKupp: Carlsen will not be pleased with the outcome.
 
   Nov-30-16 Carlsen - Karjakin World Championship (2016) (replies)
 
AylerKupp: Speaking of Armageddon, does anyone know how the 5 min / 4 min time control ratio was determined? Ideally I think it should be the time control ratio that results in both players winning 50% of the time with Black having draw odds, but I have not found any databases of Armageddon ...
 
   Nov-30-16 Karjakin vs Carlsen, 2016 (replies)
 
AylerKupp: <<keypusher> It was a very good match overall, but game #12 left a bad taste in my mouth.> I think that the players needed a rest day in order to recover from the previous day's rest day.
 
   Nov-28-16 Carlsen vs Karjakin, 2016 (replies)
 
AylerKupp: I'm starting to like the idea of playing an odd-numbered game match with every game an Armageddon game played at classical time controls. Just think of it, a definite winner and NO draws – guaranteed! Of course, the time control needs to be adjusted (perhaps by playing a ...
 
   Nov-26-16 Karjakin vs Carlsen, 2016 (replies)
 
AylerKupp: <OhioChessFan> Yes, your line is better than mine. But I remembered that I was not engineless, just WiFiless, I loaded your suggested position into Stockfish 7, and it couldn't find anything better at d=30 than 31...b4, 31...Rxd3, and 31...bxc4, all evaluated at [+0.12] , ...
 
   Nov-26-16 AylerKupp chessforum (replies)
 
AylerKupp: <Tiggler> Sorry, but I've been busy with several personal obligations and I haven't had much time to devote to chess. And what little time I've had has been devoted to following the Carlsen - Karjakin match. One obvious comment, players don't necessarily perform according to
 
(replies) indicates a reply to the comment.

De Gustibus Non Disputandum Est

Kibitzer's Corner
Oct-25-16
Premium Chessgames Member
  AylerKupp: <diceman> Sorry to take so long to get back to you. The answer to your first question is no, I do not recall ever seeing a position like this one. Black's problem was to find ways to continually check White until it could either deliver mate or get the White king and queen on the c1-h6 diagonal where it could skewer White's king and queen.

I had Stockfish 7, Komodo 10, and Houdini 4 analyze it and Stockfish 7 found a mate in 18 at d=33 following 3:09 minutes of calculation in my oldish computer after 1...Qd4+ 2.Qe3 (Forced. If 2.Kf1 then 2...d2+ 3.Ne2 Qg1+ (cute!) 4.Kxg1 dxe1=Q#. And if 2.Kf3 then 2...Bd5+ 3.Ne4 Qxe4+ 4.Kf2 Qe2+ 5.Kg3 Qxe1+ 6.Kh3 Be6+ 7.g4 Qf1+ 8.Kg3 Qg1+ 9.Kf3 Bd5+ 10.Kf4 (forced) 10...Qc1+ 11.Kg3 Qxh6 12.e6 Qe3+ 13.Kh4 Qf2+ 14.Kg5 h6+ 15.Kxh6 Qh4#) 2...Qxb2+ 3.Kg1 d2 4.Nf1 d1=Q 5.Qh6 (hope springs eternal!) 5...Qxg2+ (one queen is enough to mate) 6.Kxg2 Qe2+ 7.Kg3 Qxe1+ 8.Kf4 Qxf1+ 9.Kg3 Qe1+ 10.Kg2 Bd5+ 11.Kh3 Qf1+ 12.Kg3 Qg2+ 13.Kf4 Qd2+ (and, of course, now it's over) 14.Kg3 Qxh6 15.Kf2 Qd2+ 16.Kg3 g5 17.Kg4 Qf4+ 18.Kh5 Qh4#

Komodo had more trouble. Although it found 1...Qd4+ immediately and evaluated the position as effectively a win for Black at [-14.98], it did not find a mate until d=28 [-250.00] and 13:09 minutes of calculation. It then "lost" the mate at d=29 after 13:14 minutes of calculation with an eval of "only" [-40.96] but then found almost the same mate as Stockfish at d=30 but only after 1:01:00 hours of calculation: 1...Qd4+ 2.Qe3 Qxb2+ 3.Kg1 d2 4.Nf1 d1=Q 5.Qh6 Qxg2+ 6.Kxg2 Qe2+ 7.Kg3 Qxe1+ 8.Kf4 Qxf1+ 9.Kg3 Qe1+ 10.Kg2 Bd5+ 11.Kh3 Qf1+ 12.Kg3 Qg2+ 13.Kf4 Qf2+ 14.Kg5 Qe3+ 15.Kh4 Qxh6+ 16.Kg3 Qe3+ 17.Kg4 Be6+ 18.Kh4 Qf4#. No, I have no idea why it took Komodo almost 47 minutes to go from d=29 to d=30, but at least it found the apparently shortest mate.

Houdini's performance was in between Stockfish and Komodo. It first found a mate for Black in 26 moves at d=26 after 13:58 minutes of calculation and found yet another slightly different mate in 18 at d=27 after 19:35 minutes of calculation: 1...Qd4+ 2.Qe3 Qxb2+ 3.Kg1 d2 4.Nf1 d1=Q 5.Qh6 Qxg2+ 6.Kxg2 Qe2+ 7.Kg3 Qxe1+ 8.Kf4 Qxf1+ 9.Kg3 Qe1+ 10.Kg2 Bd5+ 11.Kh3 Qf1+ 12.Kg3 Qf3+ 13.Kh4 Qf2+ 14.Kg5 Qe3+ 15.Kh4 Qxh6+ 16.Kg3 Rfc8 17.h3 Qe3+ 18.Kh4 Qf4#

I don't think that this would be a good test for modern engines because most moves are forced even though the time they required to find the apparently shortest mate varied considerably. All 3 engines found 1...Qd4+ immediately and after the only move to prolong the game, 2.Qe3, even 2...Qxe3+ eliminates the mate threat and leaves Black with a winning material advantage. But, Stockfish found the apparently shortest mate fairly quickly (3 minutes), Komodo took a long time (1 hour), and Houdini was in between (19.5 minutes). And on a similar but slightly different position the time required by the 3 engines to find the shortest mate would probably have a different pattern.

Oct-26-16
Premium Chessgames Member
  diceman: Thanks Ayler.

<I don't think that this would be a good test for modern engines because most moves are forced even though the time they required to find the apparently shortest mate varied considerably.>

Not sure why you say that?

<Stockfish found the apparently shortest mate fairly quickly (3 minutes), Komodo took a long time (1 hour), and Houdini was in between (19.5 minutes).>

I would have thought "modern" computers
would be much closer.

That's exactly the type of "discrepancy"
I would hope it would uncover.

Maybe Stockfish is better suited for tactical positions?

Dual pawn promotions, king-side attacks.

...or Tal, Nezhmetdinov, games. :)

For the record, I found a flaw in
Ken Thompson's "Belle" and probably won't be satisfied until I "break" a modern computer. :)

Oct-28-16
Premium Chessgames Member
  AylerKupp: <diceman> Yeah, you’re right, I don’t know why I said that either. Thinking back I think that I reached that “conclusion” after running Stockfish and seeing that it didn’t have much of a problem finding the winning line. Then, while the Komodo analysis was running, I wrote the “conclusion” and left a few blank lines to enter Komodo’s result. When Komodo had difficulty finding the right line (and I still don’t know why it took it almost 47 minutes to go from d=29 to d=30), I decided to run Houdini as sort of a tie-breaker. When its performance fell somewhat in the middle, writing down its results pushed the “conclusion” further down in the page and I didn’t modify it. Obviously a lesson about not making up one’s mind on the basis of early evidence only.

But I still don’t know if it would make a good test. Yes, it demonstrates the differences and difficulties that various engines have in finding a mate, particularly the shortest one, in this position. But is finding the shortest mate quickly really that good of a goodness criterion? After all, all 3 engines immediately found the only move, 1…Qd4+, and also evaluated the position as a forced win for Black. Now, if one or two of these engines had had difficulty in immediately evaluating a forced win for Black I would agree with you, but I’m not sure that having difficulty in finding the shortest mate in a reasonable amount of time is all that important. I suppose it all depends on what you’re trying to “test”.

But it certainly illustrates the differences in how some modern engines analyze a position. I don’t know if Stockfish is better in tactical situations (after all, how do you define “better”?) but it certainly prunes its search tree much more aggressively by default (you can change Komodo’s and Hiarcs’ search tree pruning aggressiveness) and as a result it reaches deeper depths more quickly than its competitors. And maybe the fact that this position had so many forced moves early on helped it, since the best moves were almost always in the search tree branches that it was investigating and it did not have to do much backtracking. Komodo, with its less aggressive search tree pruning by default, looked at more dead end search tree branches that would not lead to anything and yet needlessly consumed time for unnecessary node evaluations.

Oct-28-16
Premium Chessgames Member
  AylerKupp: <diceman> With regard to “breaking” a modern computer, what would you consider “breaking”? There are many issues with modern computer engines, mostly due to the horizon effect, which I think makes their evaluations unreliable without forward sliding or close human review. The most egregious example was one analysis (which unfortunately I did not save) when an engine indicated that White had a slightly superior game even though Black had a mate in one on its next move. It simply had reached the end of its search tree and was completely blind to any subsequent moves.

A similar though less drastic example occurred in another position (which I did save but subsequently lost after a disk crash) where again the engine evaluated that White had a somewhat better game but it had again reached the end of its search tree when I stopped the analysis and, from the position at the end of its principal variation, Black had a forced mate in five.

A more subtle impact of the horizon effect occurs in long computer lines. The first few moves of the analysis have their branches in the search tree examined to a great depth and therefore you can have good confidence in its evaluations. But as the engine evaluates nodes closer to the end of its search depth, those evaluations of moves don’t have the benefit of having been examined to a great depth and so the confidence that you can have in its evaluations drops dramatically. Again, forward sliding and human checking are essential.
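To make the horizon effect concrete, here is a minimal sketch in Python of a depth-limited minimax search on a made-up toy tree. It is not how any real engine works. The node names, the evals, and the MATE score are all hypothetical. Line "a" looks like +0.3 for White at depth 3, but at the end of that line Black has a mate in one, which only shows up at depth 4.

```python
from dataclasses import dataclass, field

MATE = 1000.0  # hypothetical score meaning "White has been mated"


@dataclass
class Node:
    name: str
    static_eval: float              # static evaluation from White's point of view
    children: list = field(default_factory=list)


def minimax(node, depth, white_to_move):
    """Depth-limited minimax: past the horizon only the static eval is visible."""
    if depth == 0 or not node.children:
        return node.static_eval
    scores = [minimax(c, depth - 1, not white_to_move) for c in node.children]
    return max(scores) if white_to_move else min(scores)


def best_move(root, depth):
    """Return (move name, score) for White to move at the root."""
    scored = [(minimax(c, depth - 1, False), c.name) for c in root.children]
    score, name = max(scored)
    return name, score


# Line "a": looks slightly better for White, but Black mates at the very end.
mate = Node("mate", -MATE)
a1a = Node("a1a", 0.3, [mate])      # Black to move here, with a mate in one
a1 = Node("a1", 0.3, [a1a])
a = Node("a", 0.2, [a1])

# Line "b": quiet and safe at every depth.
b1a1 = Node("b1a1", 0.1)
b1a = Node("b1a", 0.1, [b1a1])
b1 = Node("b1", 0.1, [b1a])
b = Node("b", 0.1, [b1])

root = Node("root", 0.0, [a, b])

print(best_move(root, 3))           # the search stops one ply short of the mate
print(best_move(root, 4))           # one ply deeper, line "a" is refuted
print(minimax(a1a, 1, False))       # "forward sliding": restart from the end of the line
```

Restarting the search from the position at the end of the principal variation (the last line) is the forward sliding idea: the mate that was beyond the horizon from the root is found at once from there.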

Oct-31-16
Premium Chessgames Member
  diceman: <AylerKupp: <diceman> With regard to “breaking” a modern computer, what would you consider “breaking”?>

In the case of Belle, it was an actual programming error in the code. (you just had to hit the position jackpot that brings it to the top)

I used to go to Bell Labs to play Belle,
and as a "chess player" tell Ken what I thought of its play. (While the Deep Blue team had Joel Benjamin, Ken had the lesser prodigy from Brooklyn!)

Back then they still didn't know if
evaluation weights would be more
important vs. raw computing power.

I figured by today, most algorithms
would be efficient. Apparently that isn't the case.

Oct-31-16
Premium Chessgames Member
  diceman: <AylerKupp:
(after all, how do you define “better”?)>

Giving you the accurate answer in the shortest time. (at least when tactics are involved)

Nov-03-16
Premium Chessgames Member
  AylerKupp: <diceman> The topic of "programming error" is an interesting one. Sometimes it's obvious but sometimes it may not be. For example, if two strong engines have different evaluations of a sequence of moves and different principal variations, does it mean that one of them (or perhaps both of them!) has at least one programming error in its algorithms and heuristics or is it just a difference of "opinion" between the engines?

I do agree with your definition of "better" (provided there was agreement as to what the "accurate answer" was), but I don't know what to do about the qualifier "at least when tactics are involved". Would that imply that in a non-tactical position there might not be an "accurate answer"? I don't know.

Nov-03-16
Premium Chessgames Member
  AylerKupp: <diceman> BTW, I just downloaded and installed the latest Stockfish 8. I had it analyze the same position that you gave earlier:


But whereas it took Stockfish 7 3:09 minutes of calculation on my machine to find a mate in 18 for Black at d=33, it took Stockfish 8 only 1:02 minutes to find a slightly different mate in 18 at d=30. So I would say that in this case, Stockfish 8 is definitely better than Stockfish 7.

I can hardly wait for Stockfish 9!

Nov-08-16
Premium Chessgames Member
  AylerKupp: <Tiggler> I looked at Table 8.1b per https://www.fide.com/fide/handbook.... and recorded both the lowest and midpoint values of the "RtgDif" column. I then plugged these values into the Excel 2003 NORMDIST function [ NORMDIST(x,Mean,StdDev,TRUE) ] with Mean = 0 and StdDev = 200, the TRUE giving the CDF. I also looked at the CDF definition in both Wikipedia and Wolfram and calculated the CDF using both the lowest and midpoint values of the "RtgDif" column and the formula CDF = ½*[1+ERF(z)] where z = (x - Mean)/(StdDev*SQRT(2)) and Excel 2003's ERF function.

And, in an attempt to ensure that Excel was not providing the wrong values, I looked up a published table of the Normal CDF (https://homes.cs.washington.edu/~jr...).

After rounding all values to two decimal places to correspond with FIDE's table, this is what I got with FIDE's table shortened to list only every 10th value (because I'm lazy):

RtgDif    Low   Mid   H      X(H(M))   E(H(M))   Table
0-3         0     2   0.50   0.50      0.50      0.50
69-76      69    73   0.60   0.64      0.64      0.64
146-153   146   150   0.70   0.77      0.77      0.77
236-245   236   241   0.80   0.89      0.89      0.89
358-374   358   366   0.90   0.97      0.97      0.97
> 735     736   768   1.00   1.00      1.00      1.00

where:

Low = Low value of RtgDif

Mid = Midpoint value (rounded) of RtgDif(Low) and RtgDif(High). For RtgDif > 735 I used the arbitrary upper bound = 800 to calculate the midpoint value, but this doesn't affect anything.

H = FIDE's CDF value for upper side of the CDF curve

X(H(M)) = CDF value calculated by Excel using NORMDIST and the midpoint of RtgDif

E(H(M)) = CDF value calculated by Excel using ERF and the midpoint of RtgDif

Table = Value of CDF from the published table after dividing each midpoint of the FIDE RtgDiff by 200 to reflect that the table used StdDev = 1.

As you can see, my CDF values calculated using Excel and the published table match, but they don't match the FIDE table.
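For anyone who wants to repeat the calculation without Excel, here is a minimal sketch in Python of the same ERF formula, using the midpoint values from the table and StdDev = 200. The function name normal_cdf is my own choice for illustration; math.erf is the standard library error function.

```python
import math


def normal_cdf(x, mean=0.0, std_dev=200.0):
    """Normal CDF via the error function: 0.5 * (1 + erf(z))."""
    z = (x - mean) / (std_dev * math.sqrt(2))
    return 0.5 * (1.0 + math.erf(z))


# Midpoints of the RtgDif ranges listed in the table above.
midpoints = [2, 73, 150, 241, 366, 768]
values = [round(normal_cdf(m), 2) for m in midpoints]

for m, v in zip(midpoints, values):
    print(f"{m:>4}  {v:.2f}")
```

Rounded to two decimal places, these reproduce the X(H(M)) and E(H(M)) columns, not FIDE's H column.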

I repeated the calculations using Excel 2010 and the supposedly improved NORM.DIST and ERF.PRECISE functions but got the same results as when using Excel 2003's NORMDIST and ERF functions. Probably not surprising given that I rounded everything off to 2 decimal places to match the accuracy of FIDE's Table 8.1b.

I checked and you are right, FIDE still uses the Normal Distribution curves rather than the Logistic Curves. Maybe I got confused because both the USCF and Glickman's system (Glicko) use the Logistic Distribution curves and in more than one place I've read that the Logistic Distribution Curves give a better fit to the actual data than the Normal Distribution curves. If that's indeed true, silly of me to think that FIDE would modify their system to make it more accurate. After all, what can you expect from an organization which institutes the Rule of 400 (originally the Rule of 350) to satisfy some of their GM members who complained of losing too many rating points if they lost to a much lower rated opponent. As though their losing to a much lower rated opponent would be the fault of the rating system!

Still, I sort of see FIDE's problem. If they changed their ratings calculation procedure then it would make it more difficult to compare a player's performance over time since some of their ratings would have been calculated using the Normal Distribution curves and some the Logistic Distribution curves. Still, if using the Logistic Distribution curves was indeed more accurate (although I doubt that the differences would be significant), then perhaps it could be gradually phased in by calculating the ratings of any players who already have a rating using the Normal Distribution curves and only using the more accurate curves for newly-rated players.

Nov-08-16
Premium Chessgames Member
  Tiggler: Try using StdDev = 2000/7 .

I did and then I checked every integer rating difference from 0 to 800. The difference from FIDE tables for these 801 values is 0.00 every time.
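Here is a minimal sketch in Python of that check, limited to the six rows quoted in the table earlier in this thread (the full FIDE table is not reproduced here, so this is not the complete 801-value comparison). It uses statistics.NormalDist from the standard library. The dictionary name fide_h is just for illustration.

```python
from statistics import NormalDist

# Midpoint of each RtgDif range -> FIDE's H value, as quoted in the table above.
fide_h = {2: 0.50, 73: 0.60, 150: 0.70, 241: 0.80, 366: 0.90, 768: 1.00}

dist = NormalDist(mu=0.0, sigma=2000 / 7)
computed = {diff: round(dist.cdf(diff), 2) for diff in fide_h}

for diff, h in fide_h.items():
    print(f"{diff:>4}  FIDE {h:.2f}  computed {computed[diff]:.2f}")
```

With sigma = 2000/7 the six computed values agree with FIDE's H column to two decimal places.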

Nov-09-16
Premium Chessgames Member
  AylerKupp: <Tiggler> You're right, that makes the FIDE tables match the calculated Normal CDF values. And I also found your original post of 3-23-16. I guess we shouldn't blame Dr. Elo too much for using the approximation 1/SQRT(2) ≈ 0.7 since he was doing the calculations using pencil and paper and needed to simplify them as much as possible without introducing significant errors.

But what does this do to the accuracy of the FIDE ratings calculations? Or, more to the point, how close do the FIDE rating calculations using the Normal CDF match actual game results and would FIDE's rating calculations match game actual results better using a different distribution?

I started to try to do that for other reasons. I found a *.pgn games database, KingBase, which matched my needs perfectly and was free to download. It contains about 1.8 million games played since 1990 with no games less than 6 moves and no games where either player was rated below 2000. Unfortunately, even though its authors claim that it is updated monthly, there appear to have been no updates since Mar-2016.

I had written a parser for *.pgn files that I adapted to create *.csv files from the *.pgn files and loaded the *.csv files into Excel. Once in Excel I could filter out games played prior to 1998 (when FIDE stopped rounding ratings to the nearest 5 points) and by filtering out games that had the words "Simul", "Blind", "Blitz", "Rapid", and others I was able to compile a database of about 1.6 million OTB games played at classical time controls according to the ratings of the players in increments of 100 rating points. I could then determine the results distribution for White Wins, White Draws, and White Loses for each of the rating levels. My goal was to try to find the best-fit distribution(s) for each game result and rating level.
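The lost parser can't be reproduced, but the general idea can be sketched. The following is a hypothetical, minimal Python version that reads only the PGN header tags, drops games before 1998 and games whose Event contains one of the excluded words, and writes the rest as CSV. All function names, the field list, and the sample games (with placeholder players) are assumptions for illustration only.

```python
import csv
import io
import re

TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')
EXCLUDE = ("simul", "blind", "blitz", "rapid")   # words to filter out of Event
FIELDS = ["Event", "Date", "White", "Black", "WhiteElo", "BlackElo", "Result"]


def read_games(pgn_text):
    """Yield one dict of header tags per game; movetext is skipped."""
    tags, in_movetext = {}, False
    for line in pgn_text.splitlines():
        match = TAG_RE.match(line.strip())
        if match:
            if in_movetext and tags:     # a tag after movetext starts a new game
                yield tags
                tags = {}
            in_movetext = False
            tags[match.group(1)] = match.group(2)
        elif line.strip():
            in_movetext = True
    if tags:
        yield tags


def keep(tags, min_year=1998):
    """Keep games from min_year onwards whose Event has no excluded word."""
    event = tags.get("Event", "").lower()
    if any(word in event for word in EXCLUDE):
        return False
    year = tags.get("Date", "")[:4]
    return year.isdigit() and int(year) >= min_year


def pgn_to_csv(pgn_text):
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore", restval="")
    writer.writeheader()
    for tags in read_games(pgn_text):
        if keep(tags):
            writer.writerow(tags)
    return out.getvalue()


# Made-up sample with placeholder names, only to exercise the code.
SAMPLE_PGN = """
[Event "Example Open"]
[Date "2005.03.14"]
[White "Player A"]
[Black "Player B"]
[WhiteElo "2310"]
[BlackElo "2205"]
[Result "1-0"]

1. e4 e5 2. Nf3 1-0

[Event "Example Blitz"]
[Date "2006.01.01"]
[White "Player C"]
[Black "Player D"]
[Result "0-1"]

1. d4 d5 0-1

[Event "Older Open"]
[Date "1995.06.01"]
[White "Player E"]
[Black "Player F"]
[Result "1/2-1/2"]

1. c4 1/2-1/2
"""

print(pgn_to_csv(SAMPLE_PGN))
```

Only the first sample game survives the filters. Grouping the surviving rows by rating level in increments of 100 points could then be done in the spreadsheet, as described above.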

Alas, before I could finish it I lost all my data, including my *.pgn parser, due to the aforementioned disk crash and I don't know if I will ever have the time and motivation to regenerate it. Too bad.

Nov-09-16
Premium Chessgames Member
  Tiggler: <I guess we shouldn't blame Dr. Elo too much for using the approximation 1/SQRT(2) ≈ 0.7 since he was doing the calculations using pencil and paper and needed to simplify them as much as possible without introducing significant errors.>

So far as the value chosen for the s.d., accuracy doesn't really enter into it because the choice is arbitrary. Dr. Elo could have chosen sd = 1, and mean rating = 0 . Then we could just use the univariate standard normal distribution.

The chosen distribution and the game results generate the rating scale, not the other way round.

Nov-10-16
Premium Chessgames Member
  AylerKupp: <Tiggler> I only partly agree. True, the value that we use for the SD is largely arbitrary, but in his book "The Rating of Chessplayers, Past and Present" Dr. Elo claims to have chosen SD = 200 as the class interval by convention.

What matters, at least for my purposes, is the <predictive> accuracy of the rating system. After all, in the Elo system it's the rating differential that defines the expected result of a sufficiently large series of games between two players. A rating system that more accurately predicts this expected result over a sufficiently large number of games is "better" (in the sense of being more accurate) than another system that is not as accurate. And, since the accuracy of a rating system is at least partly based on the probability distribution used, the selection of probability distribution and its CDF will have an influence on its predictive ability.
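To see how much the choice of curve changes the prediction, here is a minimal sketch in Python comparing the expected score from the Normal CDF (with StdDev = 2000/7, as found above) against the commonly quoted logistic form 1/(1 + 10^(-diff/400)). The function names are mine, and the 400 scale is an assumption about the usual logistic formula, not something taken from this thread.

```python
import math


def expected_normal(diff, sigma=2000 / 7):
    """Expected score of the higher rated player under the Normal CDF."""
    return 0.5 * (1.0 + math.erf(diff / (sigma * math.sqrt(2))))


def expected_logistic(diff, scale=400.0):
    """Expected score under the commonly quoted logistic curve."""
    return 1.0 / (1.0 + 10 ** (-diff / scale))


for diff in (0, 100, 200, 400, 600):
    n, l = expected_normal(diff), expected_logistic(diff)
    print(f"{diff:>4}  normal {n:.3f}  logistic {l:.3f}  difference {n - l:+.3f}")
```

The two curves are close for small rating differences and drift apart for large ones, where the logistic curve gives the lower rated player somewhat better chances. Which one fits actual results better is exactly the question that needs game data.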

Nov-10-16
Premium Chessgames Member
  Tiggler: <AylerKupp> I have to tell you of a curious result that I discovered, and which <Gypsy> helped me to understand a few years ago.

Here it is:

If all players perform on average according to their current rating with random variation in accordance with the distribution used to generate their new ratings, then the population rating distribution must necessarily diverge. The population distribution will assume a gaussian shape and will keep a constant mean, but the standard deviation of the population ratings diverges and will NECESSARILY increase without limit.

A stable distribution of the population rating is only possible if the higher rated player in each game consistently underperforms by a small margin. Otherwise the top ratings must inflate.

This is easy to prove with a simple simulation.

The reason for it is also known, and I'll find the links that help to put together the explanation if you are interested.
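A simple simulation of this kind might look like the following Python sketch. It is one possible setup under stated assumptions, not necessarily the one used above: every player starts at the same rating, pairings are random, every game is decisive (no draws), results are drawn with the probability given by the Normal CDF with StdDev = 2000/7 applied to the current ratings, and the same curve is used for the update with a fixed K. All names and parameter values are hypothetical.

```python
import math
import random
import statistics

SIGMA = 2000 / 7


def expected_score(diff):
    """Expected score for a rating difference, using the Normal CDF."""
    return 0.5 * (1.0 + math.erf(diff / (SIGMA * math.sqrt(2))))


def simulate(n_players=200, rounds=500, k=20.0, start=2000.0, seed=1, report_every=100):
    """Return [(round, mean rating, population std dev), ...] at checkpoints."""
    rng = random.Random(seed)
    ratings = [start] * n_players
    history = []
    for rnd in range(1, rounds + 1):
        order = list(range(n_players))
        rng.shuffle(order)                       # random pairings each round
        for i in range(0, n_players - 1, 2):
            a, b = order[i], order[i + 1]
            e_a = expected_score(ratings[a] - ratings[b])
            # The result is drawn exactly according to the current ratings.
            s_a = 1.0 if rng.random() < e_a else 0.0
            delta = k * (s_a - e_a)
            ratings[a] += delta                  # zero-sum update keeps the mean fixed
            ratings[b] -= delta
        if rnd % report_every == 0:
            history.append((rnd, statistics.mean(ratings), statistics.pstdev(ratings)))
    return history


if __name__ == "__main__":
    for rnd, mean, sd in simulate():
        print(f"round {rnd:>4}  mean {mean:8.2f}  std dev {sd:7.2f}")
```

In this setup the mean stays put while the spread of the ratings keeps growing over the simulated rounds. A finite run like this can only illustrate the trend; it does not by itself prove that the standard deviation increases without limit.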

Nov-11-16
Premium Chessgames Member
  AylerKupp: <Tiggler> Yes, I'm definitely interested. And, if you needed further convincing, that's what the data shows – sort of. I calculated and plotted the ratings difference between the 10th ranked player and the 5000th ranked player, between the 10th ranked player and the 4000th ranked player, between the 10th ranked player and the 3000th ranked player, etc. down to the difference between the 10th ranked player and the 50th ranked player from the end of 1966 to the end of 2015. From 1989 onwards, the rating differences between the ranked players are increasing, with the greatest increase between the 10th and 5000th ranked player and the smallest ratings difference increase between the 10th and 50th ranked player. I think that this is what you would expect if the standard deviation is increasing.

But prior to 1988 the reverse is true; the ratings differences are <decreasing>, which is what you would expect if the standard deviation is decreasing. And this doesn't make sense to me. It may be an artifact of the lowering of the ratings floor; there was no 5000th ranked player prior to 1986, no 4000th ranked player prior to 1982, no 3000th ranked player prior to 1980, and so on. And, since each player's initial rating is not as accurate as subsequent ratings, maybe these initial ratings were not accurate. But the same effect is noticeable, although to a lesser degree, even for a gap as small as that between the 10th and 200th ranked players: a decrease in the rating difference from 1968 (the first time that ratings were available for the 100th and 200th ranked players) to 1989, then an increase in the rating difference. I don't know what to make of it.

Unfortunately the new data exceeded the column limits of Excel 2003 so in order to see it you must have a copy of Excel 2007 or later. I'll upload a summary version of the large spreadsheet so you can take a look at the data if you're interested. I'll see if I can figure out how to post the summary information in a *.pdf file so that those that don't have access to Excel 2007 or later can see the data.

Nov-11-16
Premium Chessgames Member
  Tiggler: Got the message that you are interested, and I do remember my offer to put together the explanation.

Just now I'm pondering how well to fulfill that commitment. It deserves a well crafted dissertation. A few hints with invitations to further questions is the minimum default.

The longer you have to wait for my response, the more you are entitled to a big effort.

Nov-11-16
Premium Chessgames Member
  Tiggler: Here is the first essential link: https://en.wikipedia.org/wiki/Marko...

Elo approach assumes:

(1) that there is a Markov process at work: there is a current state and a set of transition probabilities to the next state that depend only on the current state.

(2) that the current state is fully described by a set of numbers that are the rating of each player.

(3) that the transition probabilities are dependent only on the differences between ratings of the two players in each game. The probabilities do not depend on the location parameter: https://en.wikipedia.org/wiki/Locat...

Let's pause while I decide what is the next step and you digest that.

Nov-12-16
Premium Chessgames Member
  AylerKupp: <Tiggler> Don’t trouble yourself, don’t feel bound by your offer, and, above all, don’t let it be a burden to yourself. I am interested but I am in no rush to find out the information. After all the current rating system will likely be with us for a long time.

So, yes, let’s definitely pause and you can get back to it whenever you have both the time and the inclination. In the meantime, I’ll look at the link you provided and refresh my knowledge of Markov chains.

Nov-12-16
Premium Chessgames Member
  Tiggler: I need to insert some discussion of items (1), (2) and (3) of the previous post. Are these assumptions, approximations, or what?

I chose to view them as axioms. The rating scale is generated from them. For example, concerning (3) - why should we believe that the transition probabilities associated with a game between a 2200 and a 2300 player are the same as those associated with a game between a 2700 and a 2800 player? We don't have to believe it, because we have asserted it as an axiom. Thus we can say that the rating interval between a pair of ratings is defined as the interval that corresponds to a given set of transition probabilities. The scale is generated from this assumption, just as the Celsius temperature scale was defined by the requirement that the change in resistance of a platinum thermometer between the ice point and boiling point of water corresponds to 100, and therefore 200 Celsius is defined as that temperature which results in this same change when compared to the boiling point of water.

One snag, however, is that we do not have two fixed points! So instead we choose some arbitrary value of the expected game score between players of a given rating difference, which in turn defines the state transition probabilities of our Markov process.
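A short sketch of both points, assuming the logistic expected-score formula with a 400-point scale (an assumption for illustration; the argument above does not depend on this particular choice). It checks that the 2200 vs 2300 and 2700 vs 2800 pairings get identical expectations, and shows the arbitrary convention that fixes the scale: under this formula a 400-point gap corresponds to 10-to-1 odds.

```python
def expected_score(r_a, r_b):
    """Logistic expected score of player A against player B (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


# Axiom (3): only the rating difference matters, not the location on the scale.
low = expected_score(2200, 2300)
high = expected_score(2700, 2800)
print(low == high)  # both pairings have a 100-point gap

# The scale itself is set by convention: here a 400-point gap means 10:1 odds.
odds = expected_score(1900, 1500) / expected_score(1500, 1900)
print(round(odds, 6))
```

Changing the constant 400 would stretch or shrink every rating interval without changing any prediction, which is the sense in which the scale is generated by the chosen expectation rather than by two fixed points.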

Pause for thought ...

Nov-12-16
Premium Chessgames Member
  Tiggler: At this point it is time to mention two crucial examples of the Markov process: Wiener process and Ornstein-Uhlenbeck process. They were explained to me in 2012 by <Gypsy> on this page: Hans Arild Runde

Wiki's explanations are here:
https://en.wikipedia.org/wiki/Wiene... and https://en.wikipedia.org/wiki/Ornst...

Nov-12-16
Premium Chessgames Member
  Tiggler: And next we need to know about Martingales: https://en.wikipedia.org/wiki/Marti...

The feature of Martingales that makes them relevant to our discussion is this:

Consider the universe of rated games of chess as contests for rating points. If I want to bet on the result, I might use the "expected scores" that are the ones "predicted" by the rating procedure. If the actual <expected value> https://en.wikipedia.org/wiki/Expec... is equal to the one used in the rating procedure, then the contest for rating points has the expectation that the gain/loss of points by each player is, on average, zero. The contest is a fair game, and therefore the process is a Martingale.
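The fair-game condition can be written out in one line. This is a sketch assuming the standard Elo update with a single K-factor; the symbols $R_t$, $S_t$, $E_t$, $K$ and $\mathcal{F}_t$ are introduced here for illustration and do not appear in the posts.

```latex
% R_t : a player's rating before game t
% S_t : the player's actual score in game t (1, 1/2 or 0)
% E_t : the expected score computed from the rating difference before game t
% K   : the update factor
% F_t : everything known before game t (all current ratings)
\begin{align*}
R_{t+1} &= R_t + K\,(S_t - E_t) \\
\mathbb{E}\left[R_{t+1} - R_t \mid \mathcal{F}_t\right]
        &= K\left(\mathbb{E}\left[S_t \mid \mathcal{F}_t\right] - E_t\right) = 0
        \quad \text{whenever } \mathbb{E}\left[S_t \mid \mathcal{F}_t\right] = E_t
\end{align*}
```

The second line is the martingale property: given the current state, the expected change in rating is zero exactly when the true expected score matches the one the rating procedure uses.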

Nov-14-16
Premium Chessgames Member
  Tiggler: Almost there:

If the stochastic process defined by chess games, chess ratings, and FIDE rating regulations is a Martingale, then we can invoke the Martingale Central Limit Theorem:

https://en.wikipedia.org/wiki/Marti...

This says that as the number of steps (games, tournaments, whatever) increases, the change in ratings from initial values tends to a Gaussian distribution with zero mean and variance that is proportional to the number of steps.
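The variance growth can be sketched directly, without the full theorem. The derivation below assumes the standard Elo update with a single K-factor and, for the last equality only, decisive games with no draws; the symbols $\Delta_t$, $S_t$, $E_t$ and $K$ are introduced here for illustration.

```latex
% Delta_t : rating change in game t, with E[Delta_t | past] = 0 (martingale increments)
\begin{align*}
R_n - R_0 &= \sum_{t=1}^{n} \Delta_t, \qquad \Delta_t = K\,(S_t - E_t) \\
\mathbb{E}\left[\Delta_s \Delta_t\right] &= 0 \quad (s \neq t) \\
\operatorname{Var}(R_n - R_0) &= \sum_{t=1}^{n} \mathbb{E}\left[\Delta_t^2\right]
  = K^2 \sum_{t=1}^{n} \mathbb{E}\left[E_t\,(1 - E_t)\right]
\end{align*}
```

Every term in the final sum is strictly positive, so the variance can only grow with the number of games. It is proportional to the number of steps to the extent that the terms $E_t(1 - E_t)$ stay roughly constant.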

This proves my statement made half a page above, AylerKupp chessforum

"If all players perform on average according to their current rating with random variation in accordance with the distribution used to generate their new ratings, then the population rating distribution must necessarily diverge. The population distribution will assume a gaussian shape and will keep a constant mean, but the standard deviation of the population ratings diverges and will NECESSARILY increase without limit."

Nov-14-16
Premium Chessgames Member
  Tiggler: There are many interesting corollaries, concerning, for example, rating floors; 400-point rule etc.

Also, I have not proved that ratings actually perform this way, because the proof depends on the assumption that players actually perform according to their ratings. If not, then the process is not a Martingale.

Before discussing an alternative, the Ornstein-Uhlenbeck process https://en.wikipedia.org/wiki/Ornst..., I'd like some feedback.

Nov-26-16
Premium Chessgames Member
  AylerKupp: <Tiggler> Sorry, but I've been busy with several personal obligations and I haven't had much time to devote to chess. And what little time I've had has been devoted to following the Carlsen - Karjakin match.

One obvious comment: players don't necessarily perform according to their ratings. In every tournament there are players who perform better than expected and players who perform worse than expected. If that wasn't the case then there wouldn't be a point of having tournaments or matches; the winners would be known beforehand.

So any "proof" must be probabilistically based on the spread of players' performance, and I'm not sure if that is possible or meaningful.

Nov-26-16
Premium Chessgames Member
  Tiggler: <AylerKupp>

It is obvious of course that players cannot perform exactly according to their rating "expected score", except in a statistical sense. What I said before assumes this:

<If the actual <expected value> is equal to the one used in the rating procedure, then the contest for rating points has the expectation that the gain/loss of points by each player is, on average, zero. The contest is a fair game, and therefore the process is a Martingale.>
