< Earlier Kibitzing · PAGE 13 OF 13 ·
Later Kibitzing> 
Mar-24-16
  offramp: <Gregor Samsa Mendel: <offramp> Apparently we Yanks have mootated the meaning of mootness:
https://en.wikipedia.org/wiki/Mootn... >
That is bizarre and disturbing. It means that British and American people reading the same text will have opposite views on what has been written. I will think it means "open to debate" and a Yank will think it means "pointless to debate". Perhaps it is clearer if one uses a word such as "irrelevant", although that is less pretentious.

Mar-24-16   zanzibar: I think we should table this dangerous discussion.

Mar-25-16
  AylerKupp: <Tiggler> I wouldn't be too hard on Dr. Elo. After all, he was working at a time when there wasn't easy access to computers and whatever access there was, was expensive. So it's natural for Elo to have made many assumptions and simplifications in order to make the calculations easier. Still, using 1/√2 = 0.7 instead of 0.707 seems excessive, as that would make the SD = 404 (exactly) instead of 400. And 404 should not be that much more difficult to use in the calculations than 400. For another view on the accuracy of the Elo tables, see http://recherche.enac.fr/~alliot/el....
Your comment about Elo's assumption that the standard deviation of the distribution is the same for all players gave me pause for some thought. Clearly that was a necessary simplification for Elo, but it would seem possible today to calculate a performance distribution for each rated player (or at least the top ones), and use each player's distribution in calculating their tailored t-distribution (perhaps another good use of the letter "t"!). I don't consider this concept absurd at all. For example, based on the current Candidates Tournament, I would assume that the SD in Giri's performance distribution (all draws) would be much different than Nakamura's or Anand's (5 decisive games each). Of course, I don't know if it would make a significant difference in the results.
The reason I've been considering all of this is that I'm trying to develop a predictor for game results in the Candidates Tournament for User: golden executive's contest. I had been doing reasonably well (my goal was a 75% correct prediction) until the last round (70%), when I used my "hunch" instead of some of the model's predictions and I was wrong while the model was right. My enthusiasm is greatly tempered by the realization that if I had simply predicted that each game would end in a draw I would have been correct 71.1% of the time, even with the Nakamura – Anand result included.

Mar-25-16
  offramp: It's all moot, isn't it? 

Mar-26-16   Tiggler: <AylerKupp> Sorry to be pedantic (though you would not be the one who would complain of this), but you cannot have tailored t-distributions for each player. The t-distribution is for the difference of two samples from the same normal distribution. <offramp> Yes indeed, quite moot: worthy of debate.

Mar-27-16   luftforlife: In American usage, the adjective "moot" enjoys three denotations: first, "open to question; subject to discussion; debatable; unsettled"; also, "subjected to discussion; controversial, disputed"; second, "deprived of practical significance; made abstract or purely academic"; third, "concerned with a hypothetical situation." Webster's Third New International Dictionary (Springfield, Mass.: Merriam-Webster Inc. 1993), 1468. The second denotation does not connote, and is neither equivalent nor tantamount to, irrelevance per se (for such a moot point retains its academic relevance, its fitness for abstract consideration, or both), but rather connotes a change in status that can, in the legal context at least, lead to a change in treatment (to unfitness for further consideration, thwarting and thereby pretermitting practical, concrete, specific, and final resolution, disposition, or decision of a case turning on, and fatally infected by, such a moot point) due to limitations of power.

Mar-27-16
  perfidious: <luftforlife> As regards the adjective, you are correct; however, that is not the full story. While the word is comparatively uncommon as a noun, that is not the case for the verb, though of course Over Here 'debate' is much more often employed. http://www.merriamwebster.com/dict...

Mar-27-16   luftforlife: <perfidious>: Thanks for your comment, and I take your point. I focussed on the American adjectival form and usage chiefly to point up (and to contrast with irrelevance per se) the American denotation "deprived of practical significance". This is a necessary and sensible accretion to meaning as it has arisen and as it has been applied as a term of art by our Supreme Court in its construction of our Constitution and its limitations on the federal judicial power, but one that has, on our shores, overspilled the narrow confines of that usage, and that has, in more general American usage, come to acquire connotations that dull, obscure, and even subvert not only the other American adjectival denotations, but also the essential and vital British origins, meanings, and past and present uses of the word in all its forms. I appreciate <offramp's> incisive comments and reminders in this regard. Your comment and the others above my own are illuminating and edifying. Kind regards.

Aug-25-16   TheFocus: Happy birthday, Arpad Elo.

Sep-30-16
  alexmagnus: The average rating of the women's top 100 is now the lowest since it has been published as a top 100 (and not as a top 50): 11 points below the all-time high from April 2015. (The even lower number from July 2013 on the FIDE site is wrong: in that month, FIDE accidentally published the top 120 instead of the top 100.) The open top 100, on the other hand, has been extremely stable in recent years.

May-07-17
  AylerKupp: Calculating P(Win), P(Draw), and P(Loss) – Articles found (part 1 of 2)
The FIDE scoring tables (https://www.fide.com/fide/handbook...., Table 8.1b) indicate the probability of a win [ P(W/D) ] OR a loss [ P(L/D) ] for a player based on the rating difference between that player and his opponent. In certain situations it is desired to determine the probability of a win, a draw, or a loss [ P(W), P(D), P(L) ] for a player, again based on the rating difference between that player and his opponent. This is surprisingly not easy to do, and additional information is needed.
In his book, "The Rating of Chessplayers – Past & Present", Dr. Elo says "All data entering the rating system consist of total points scored in actually played game ... Discrimination as to how any point score is composed between wins, draws, and losses is beside the point." I think that the real reason that he ignored the effect of draws is that he developed his system when there weren't any cheap computers easily available to do calculations, and no online game databases containing the necessary information. And so he remarked that "Any consideration of draws in rating theory requires information on the probabilities of draws, as well as wins and losses, between individual players, information which is not readily available. Its accumulation would be inordinately laborious, and there has been little demand for it." Well, maybe not then.
Over a period of time I've been able to come up with only 3 articles describing methods for attempting to extract P(D) from P(W/D) and P(L/D), and in one of them the author threw up his hands when he realized that he needed additional information. These articles are:
1. "Individual Chess Game Probabilities based on Match Results", written by Charles Roberson in 2012; the link no longer works. His method relates P(W), P(D), and P(L) to the probabilities of a player winning or losing a match. I don't find it very convincing because he makes statements like "E(D) = Expected game draw percentage = Match play probability of losing" without substantiation, and when you plot P(W), P(D), and P(L) on the same chart you get a very sharp slope change in P(W) and P(L) that just doesn't look right.
2. "Bayesian Elo Rating" by Remi Coulom, written in 2004 (https://www.remicoulom.fr/Bayesian...). His method calculates P(W), P(D), and P(L) using Bayes' Theorem by choosing a prior likelihood distribution over Elo ratings and computing a posterior distribution as a function of the observed results. Whatever that means. Seriously, as Dr. Elo said, calculating P(W), P(D), P(L) from P(W/D), P(L/D) requires additional information, and the author estimates a Draw Likelihood by simulation. I think that this number represents the Expected Drawing Percentage [ E(D%) ] x standard deviation (SD) but I'm not sure. With E(D%) known, E(W%) and E(L%) can be calculated and from them P(W) and P(L). With P(W) and P(L) known, P(D) = 1 – P(W) – P(L), since all results are mutually exclusive. When you plot P(W), P(D), and P(L) you get nice smooth curves, which is what you hope for. I think.
As a bonus, the article addresses and quantifies White's opening advantage, something that neither Dr. Elo nor FIDE addresses, although Dr. Elo mentioned it in his book, dismissing it with the comment "Any incorporation of colors into the rating system, however, would again inordinately expand the bookkeeping requirements with small prospect of any utility for it, in the final analysis." IMO, wrong again, Dr. Elo, even though it's understandable given the lack of accessible computers when he developed his system.

May-07-17
  AylerKupp: Calculating P(Win), P(Draw), and P(Loss) – Articles found (part 2 of 2)
3. "How to calculate probabilities of Win, draw and loss based on the ELO system", written in 2014 (https://math.stackexchange.com/ques...) with no user name given. The author attempts to calculate P(D) by looking at the expected score (EA, EB) in a game between 2 players (A and B). Since FIDE considers a draw to be 1/2 White win and 1/2 Black win, he uses the formulas:
EA = P(A wins) + 1/2*P(Draw) + 0*P(A loses) = P(A wins) + 1/2*P(Draw)
EB = P(B wins) + 1/2*P(Draw) + 0*P(B loses) = P(B wins) + 1/2*P(Draw)
But then he realized that he needed additional information (which he would have known had he read Dr. Elo's book) and gave up, asking for suggestions. Which he didn't get. Still, I used his method to calculate P(W) and P(L) using the P(D) calculated in articles 1 and 2 above. But this yielded no new information: the chart using P(D) from Article 1 looks just like the chart in Article 1, and the chart using P(D) from Article 2 looks just like the chart in Article 2.
I've created a spreadsheet describing the above in more detail as well as additional information such as:
1. The Percentage Expectancy Table (which is the same as FIDE's Table 8.1b, called the Scoring Probability) listed in Dr. Elo's book is wrong if a SD = 200 is used, as Dr. Elo indicates he used. However, as user <Tiggler> pointed out, if a SD = 2000/7 is used instead, then the numbers match perfectly. I suppose this is another simplification made by Dr. Elo.
2. The FIDE Scoring Probability table (as well as Dr. Elo's Percentage Expectancy Table) only has 2 significant digits. As a result, each P(W/D) covers a range of rating differentials (RDiffs). It's easier to deal with probabilities if each rating differential has a unique probability associated with it, and this is listed in one of the spreadsheet tabs. Five significant digits are needed in order to uniquely associate each RDiff with a probability.
3. The probabilities calculated using the Match Results method are listed and plotted.
4. The probabilities calculated using the Bayes method are listed and plotted. This one is particularly interesting because you can see the effect of incorporating White's first-move advantage into the probabilities. It also shows how to incorporate different White Advantage and Draw Likelihoods using data derived from the Opening Explorer database, the ChessTempo database, or any other database that provides a percentage of White wins, draws, and losses.
You can download this spreadsheet from http://www.mediafire.com/file/m2skk.... The file is about 2.4 GB. You will need Excel 2003 or later to view it.
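
A minimal Python sketch of the expected-score bookkeeping described in this post, for anyone who wants to experiment without the spreadsheet. It assumes the expected score is a normal CDF of the rating difference with SD = 2000/7 (the value mentioned above), and it takes P(D) as an input because, as the post says, it cannot be derived from the expected score alone. The function names are hypothetical and are not taken from any of the cited articles.

```python
from statistics import NormalDist

# Assumption: the expected score is a normal CDF of the rating difference,
# with SD = 2000/7 as suggested in the thread (helper names are hypothetical).
SD = 2000 / 7


def expected_score(rating_diff: float, sd: float = SD) -> float:
    """Expected score E = P(W) + 0.5 * P(D) for a player whose rating
    minus the opponent's rating equals rating_diff."""
    return NormalDist(mu=0.0, sigma=sd).cdf(rating_diff)


def split_result(expected: float, p_draw: float) -> tuple[float, float, float]:
    """Split E into (P(W), P(D), P(L)), given a draw probability from elsewhere."""
    p_win = expected - 0.5 * p_draw
    p_loss = 1.0 - p_win - p_draw
    if p_win < 0.0 or p_loss < 0.0:
        raise ValueError("p_draw is too large for this expected score")
    return p_win, p_draw, p_loss


e = expected_score(100)               # player rated 100 points higher
print(round(e, 2))                    # 0.64
print(split_result(e, p_draw=0.5))    # P(D) = 0.5 is an arbitrary illustration
```

The split simply rearranges EA = P(A wins) + 1/2*P(Draw); any source of P(D) (Article 1, Article 2, or a database percentage) can be plugged in.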

May-09-17
  AylerKupp: Calculating P(Win), P(Draw), and P(Loss) – The Area method
I was not satisfied with the results obtained by attempting to calculate P(W), P(D), and P(L) based on the articles I found. The Bayesian method seemed the most promising since it yielded the expected, or at least hoped-for, smooth curves. But, since neither the games database used nor the simulation was made available, the probabilities could not be modified to reflect the different P(W)s, P(D)s, and P(L)s at different player levels (both players rated 2200+, both players rated 2300+, etc.), since the EloDraw parameter was not known. And it was also not clear to me how the factor to incorporate White's opening advantage (EloAdvantage) was calculated. Besides, the resulting P(D) simply seemed too low, particularly at the higher player rating levels.
Then I had an epiphany. The overall draw rate corresponds to the area under the P(D) curve, and the Draw percentage is the ratio of this area to the total area, i.e.
A[ P(D) ]% = A[ P(D) ] / ( A[ P(W) ] + A[ P(D) ] + A[ P(L) ] )
So I could iterate and find the value of EloDraw that resulted in A[ P(D) ]% being equal to the draw percentage observed in the games database, filtered to include only the player rating levels desired. And A[ P(D) ] was easy to calculate: since we are effectively dealing with the discrete probability distribution of a random variable (i.e. the results of games), it was just the sum of all the P(D)s over the 1601 rating differences (the spread of the distribution, ±800 plus 1 in this case), since the width of each value of the sample is 1. And the spread is not actually needed since we are calculating ratios, so the spread cancels out. Then, once the value of EloDraw is known, P(W) and P(L) can be calculated.
I've updated the spreadsheet to add the description of the Area method and a tab to calculate and plot P(W), P(D), and P(L) for the set of ChessTempo win, draw, and loss percentages corresponding to both players rated 2200+ and 2600+. You can download this updated spreadsheet from here: http://www.mediafire.com/file/syrgd....
To make the distinction clearer, I changed the names of the parameters EloAdvantage and EloDraw to WhiteAdvantage and DrawLikelihood respectively, since they no longer have anything to do with Elo distributions, including FIDE's P(W/D) and P(L/D). Using this method you can calculate the P(W)s, P(D)s, and P(L)s using the White win, draw, and loss (Black win) percentages from any games database and using any probability distribution that you think is the most accurate.
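
The Area method can be sketched in a few lines of Python. This is only a sketch under stated assumptions: it uses the logistic win/loss curves with a white-advantage offset and a draw offset in the style commonly attributed to Bayesian Elo, it weights every rating difference from -800 to +800 equally, and it finds the draw parameter by bisection. The function names and the target of 61.4% (the Classic 2700+ draw share reported later in this thread) are illustrative choices, not the spreadsheet's actual contents.

```python
def f(x: float) -> float:
    """Logistic Elo curve: 1 / (1 + 10^(x / 400))."""
    return 1.0 / (1.0 + 10.0 ** (x / 400.0))


def probs(diff: float, white_adv: float, draw_like: float):
    """(P(W), P(D), P(L)) for White, where diff = White rating - Black rating.
    Assumed model: both offsets are expressed in rating points."""
    p_win = f(-diff - white_adv + draw_like)
    p_loss = f(diff + white_adv + draw_like)
    return p_win, 1.0 - p_win - p_loss, p_loss


def draw_area_share(white_adv: float, draw_like: float, spread: int = 800) -> float:
    """Area under P(D) divided by the total area over diffs -spread..+spread.
    P(W) + P(D) + P(L) = 1 at every diff, so the total area is the diff count."""
    diffs = range(-spread, spread + 1)
    return sum(probs(d, white_adv, draw_like)[1] for d in diffs) / len(diffs)


def fit_draw_likelihood(target_share: float, white_adv: float = 0.0,
                        lo: float = 0.0, hi: float = 2000.0) -> float:
    """Bisection on draw_like; the draw share rises monotonically with it."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if draw_area_share(white_adv, mid) < target_share:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


draw_like = fit_draw_likelihood(0.614)   # illustrative target draw share
print(round(draw_like, 1), probs(0, 0.0, draw_like))
```

Weighting all rating differences equally follows the description in the post; weighting them by how often each difference actually occurs in the database would be a different (and arguably more realistic) choice.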

May-11-17
  AylerKupp: OK, FWIW, I downloaded the ChessTempo data for games where both players were rated 2700+ (a very time-consuming procedure, effectively 29 screen captures) and I got some interesting results:
Total number of games = 14,502 (an increase of about 180 games since 20517)
White wins = 4,167 games (28.7%)
Draws = 7,603 games (52.4%)
White loses = 2,732 games (18.8%)
White's advantage = 9.9%
I filtered the data according to information in the Event column and I discarded games earlier than 1990 to be consistent with the KingBase data. These are the numbers for different types of games:
Classic 8,666 games (59.8%)
Blitz 3,205 games (22.1%)
Rapid 1,776 games (12.2%)
Blindfold 639 games (4.4%)
Exhibition 2 games (<0.1%)
Simultaneous 1 game (<0.1%)
Too Old 213 games (1.5%)
For Classic time control games only, here are the statistics:
White wins = 2,141 games (24.7%)
Draws = 5,322 games (61.4%) (!)
White loses = 1,203 games (13.9%)
White's advantage = 10.8%
So the incidence of draws for Classic time control games when both players are rated 2700+ is greater than when all games are considered. Which makes sense; I would think (I didn't calculate it) that the likelihood of errors is higher at faster time controls, never mind blindfold games.
As a check, here are the statistics for the recently completed Gashimov Memorial:
Total number of games = 45
White wins = 9 games (20.0%)
Draws = 29 games (64.4%)
White loses = 7 games (15.6%)
White's advantage = 4.4%
Not too inconsistent, keeping in mind that this is a very small number of games, so a substantial deviation from the means is expected. I doubt that I'll repeat it with the data for players rated 2600+ since there are 67,506 of those games and that would require 135 screen captures! I think I'll wait until I set up the KingBase data.
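
The percentages above can be reproduced from the raw counts with a short Python sketch. "White's advantage" is taken here to be the White win percentage minus the White loss percentage, which matches the figures quoted. The binomial standard error for the draw share is an addition for illustration (it is not in the post) and gives a rough sense of how much a 45-game sample can wander.

```python
from math import sqrt


def summarize(wins: int, draws: int, losses: int) -> dict:
    """Result percentages from White's point of view, plus a binomial
    standard error for the draw share (an illustrative addition)."""
    n = wins + draws + losses
    pw, pd, pl = wins / n, draws / n, losses / n
    return {
        "games": n,
        "white_win_pct": 100 * pw,
        "draw_pct": 100 * pd,
        "white_loss_pct": 100 * pl,
        "white_advantage_pct": 100 * (pw - pl),
        "draw_std_error_pct": 100 * sqrt(pd * (1 - pd) / n),
    }


samples = {
    "All 2700+ games": (4167, 7603, 2732),
    "Classic only": (2141, 5322, 1203),
    "Gashimov Memorial": (9, 29, 7),
}

for name, wdl in samples.items():
    s = summarize(*wdl)
    print(f"{name}: draws {s['draw_pct']:.1f}% "
          f"(+/- {s['draw_std_error_pct']:.1f}), "
          f"White's advantage {s['white_advantage_pct']:.1f}%")
```

For the Gashimov Memorial the standard error on the draw share comes out at about 7 percentage points, so 64.4% versus 61.4% is well within one standard error.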

Mar-07-19
  Sally Simpson: ***
‘I met the eponymous professor [Arpad E. Elo] during the chess olympics at Nice in 1974. He was besieged with requests by players wanting the rules bent to accommodate their own requests for international titles. When the last of the supplicants had gone, Professor Elo said to me: “I think I have created a monster.” I think so too.’ (Bill Hartston, NOW! magazine, 17 August 1980, page 82.) C.N. 6742 ***

May-13-20
  MissScarlett: La Crosse Tribune, August 17th 1928, p.6:
<Moon Establishes His Innocence In Trial For Assault
Milwaukee, Wis. - (AP) - The moon and its phases Friday freed Paul Saunders from conviction on a charge carrying a maximum sentence of 30 years in the state prison. Through the testimony of W.P. Stewart, federal meteorologist and Arpad E. Elo, astronomist at Marquette university, Saunders proved to the satisfaction of Municipal Judge George Shaughnessy that the darkness of the night made identification of the assailant of Mrs. Jessie Forbes impossible. Mrs. Forbes complained that she was attacked by a pajama clad prowler in the bedroom of her home and identified Saunders, declaring she caught a glimpse of his face by moonlight. Saunders called upon Stewart and Elo to establish that on the night in question, the moon was in its last quarter and was so low in the sky that combined with the trees near the Forbes home shut off any light. Saunders' wife testified he was home that night, and Judge Shaughnessy then ordered a verdict of “not guilty.”>
Frankly, a ridiculous defence. I'd have blamed a one-armed man and legged it.

Nov-17-21   Whitehat1963: One of my quibbles with Elo ratings is that they don’t take circumstances into consideration. For example, if two players are mathematically eliminated from placing high in an important tournament, and are playing in the last or next-to-last round, they will most likely draw, regardless of their respective ratings. Also, if a higher-rated player is playing black, he is far more likely to accept a draw offer than if he is playing white, especially in the late rounds of a tournament. I don’t know that a mathematical formula can account for such circumstances, but failing to account for such circumstances makes Elo ratings far from perfect.

Nov-17-21   Whitehat1963: Another problem with Elo ratings is that someone like, say, Alireza Firouzja can increase his rating by playing aggressively against a slew of lower-rated players but playing solely for draws against higher-rated players. There are no guarantees of defeating ANY player, of course, but playing very well against a bunch of lower-rated players is far easier than beating, or even drawing against, players rated in the top five or 10 in the world.

Nov-18-21
  keypusher: <Whitehat1963> <Another problem with Elo ratings is that someone like, say, Alireza Firouzja can increase his rating by playing aggressively against a slew of lower-rated players but playing solely for draws against higher-rated players.> Firouzja doesn't do that, but anyway: if you can beat people rated 2600 and draw with people rated 2750, your rating ought to be 2750. If your rating is anything other than (roughly) 2750, the rating system is screwed up. There are only two ways to get a high Elo rating: be really strong, or hide a copy of Stockfish in your shoe. <I don’t know that a mathematical formula can account for such circumstances, but failing to account for such circumstances makes Elo ratings far from perfect.> Luckily there are precisely zero people in the history of the world who think Elo ratings are perfect.

Nov-18-21
  perfidious: Dr Elo had it about right at Nice 1974:
'I think I have created a monster'.

Jan-27-22   Whitehat1963: What would happen to players’ Elo ratings if draws were not part of the equation? I am thinking about Carlsen at the Tata Steel tournament right now. It is an elite tournament. He has four wins and six draws, no losses. His Elo rating on the live list has increased by one point. In terms of his rating, it is hardly worth the risk to play at all.
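
A rough back-of-the-envelope check in Python of why +4 =6 can be worth only about one point. It assumes the standard update (rating change = K x (actual score - expected score)), K = 10 for a player at this level, and the logistic curve rather than FIDE's lookup table; live ratings are also rounded, so the numbers are indicative only. The function names are hypothetical.

```python
from math import log10

K = 10  # assumption: FIDE K-factor for a player whose rating has reached 2400


def implied_expected_total(score: float, rating_change: float, k: float = K) -> float:
    """Invert change = K * (score - expected): return the total expected score."""
    return score - rating_change / k


def rating_gap_for(expected_per_game: float) -> float:
    """Rating difference that gives this expected score on the logistic Elo curve."""
    return 400 * log10(expected_per_game / (1 - expected_per_game))


games = 10
score = 4 * 1.0 + 6 * 0.5                       # four wins, six draws
expected = implied_expected_total(score, rating_change=1.0)
print(score, round(expected, 2))                # actual vs. implied expected total
print(round(rating_gap_for(expected / games)))  # implied average rating edge
```

Under these assumptions a one-point gain from 7/10 implies an expected score of about 6.9, i.e. the system already expected roughly that result from a player rated some 139 points above the average opponent.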

Aug-25-22   Captain Hindsight: Elo is old news at Tinder.

Dec-08-23   Caissanist: FIDE is set to make major changes to the ratings algorithms on January 1. A lot of the changes appear to be pushed by Jeff Sonas, the statistician behind Chessmetrics: https://www.chess.com/news/view/fid... .

Dec-08-23
  0ZeR0: <Caissanist>
Very informative article, thanks for sharing. I'm no statistician but it will be interesting to see how the proposed changes will affect the rating list.

Dec-08-23   sudoplatov: I developed a simple rating system for the NFL. It does assume an essentially level playing field; it also helps if the system is closed. It's surprisingly hard to program (too much input data) and I haven't extended it.
Method. At the beginning of a season (the lack of continuity of team membership makes it less useful over several years), each team has a rating of zero. After each round (Thursday to Monday in the NFL) a team gets 2 points for a win, -2 for a loss, and 0 for a tie. To account for strength-of-schedule, each team is given a bonus of 1 for each win by the teams the given team beat and -1 for each loss by each team it lost to. Each game played may affect every other team indirectly. I got about 70% from mid-season on. Nate Silver (I think) has an article on randomness in the NFL indicating that about 50% of a team's result is "random."
I have intended to do a study on the aging of results but just never took the time to do so. My idea was to date each game and have a decay of some amount (like 15/16 for the NFL) for each week. Thus games played several weeks earlier wouldn't count as much; the direct scores and strength terms could have different aging.
I think it may work for chess tournaments. Maybe I can try this for the big tournaments from 1895 to 1905 or so. The results of similarly structured tournaments should carry over pretty well: Hastings 1895, St Petes 1896, Nuremberg 1896, etc. The method is designed to measure relative strength; I haven't looked at generating a quantitative win formula, but I think it shouldn't be too hard.
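
One possible reading of the method above, sketched in Python. Several details are assumptions, not part of the post: a loss is scored as -2, the strength-of-schedule bonus counts all wins of beaten opponents and all losses of opponents lost to, and the same weekly decay is applied to both the direct score and the bonus terms. The input format and function name are hypothetical.

```python
from collections import defaultdict


def rate(games, now_week, decay=1.0):
    """games: iterable of (week, team_a, team_b, result), where result is
    1 if team_a won, -1 if team_b won and 0 for a tie.
    decay < 1 (e.g. 15/16) makes older games count for less."""
    direct = defaultdict(float)   # +2 per win, -2 per loss, 0 per tie
    wins = defaultdict(float)     # (decayed) number of wins per team
    losses = defaultdict(float)   # (decayed) number of losses per team
    beaten = defaultdict(list)    # team -> opponents it beat
    lost_to = defaultdict(list)   # team -> opponents it lost to

    for week, a, b, result in games:
        weight = decay ** (now_week - week)
        direct[a] += 0.0          # make sure both teams are registered
        direct[b] += 0.0
        if result == 0:
            continue
        winner, loser = (a, b) if result > 0 else (b, a)
        direct[winner] += 2 * weight
        direct[loser] -= 2 * weight
        wins[winner] += weight
        losses[loser] += weight
        beaten[winner].append(loser)
        lost_to[loser].append(winner)

    # Strength of schedule: +1 per win of each beaten opponent,
    # -1 per loss of each opponent the team lost to.
    return {
        team: direct[team]
        + sum(wins[o] for o in beaten[team])
        - sum(losses[o] for o in lost_to[team])
        for team in direct
    }


games = [(1, "A", "B", 1), (1, "C", "D", 1), (2, "A", "C", 1), (2, "B", "D", 0)]
print(rate(games, now_week=2))
print(rate(games, now_week=2, decay=15 / 16))
```

For a chess tournament the same structure would apply with players in place of teams and rounds in place of weeks; whether the direct and strength terms should age at different rates is left open, as in the post.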





