
Arpad Elo
A Elo 
Number of games in database: 22
Years covered: 1935 to 1957
Overall record: +4 -15 =3 (25.0%)*
   * Overall winning percentage = (wins+draws/2) / total games.

Repertoire Explorer
Most played openings
C27 Vienna Game (2 games)


(born Aug-25-1903, died Nov-05-1992, 89 years old) Hungary (federation/nationality United States of America)
Arpad Emrick Elo, born in Hungary in 1903, emigrated to the US and became a professor of physics at Marquette University in Milwaukee, Wisconsin. Many times Wisconsin State Chess Champion, he is best known for developing a mathematical rating system for players that FIDE adopted in 1970.

Wikipedia article: Arpad Elo

 page 1 of 1; 22 games  PGN Download 
Game | Result | Moves | Year | Event/Locale | Opening
1. A Elo vs G Eastman | 1-0 | 27 | 1935 | 36th ACF Congress. Prelim C | C27 Vienna Game
2. A Elo vs T Barron | 1-0 | 21 | 1935 | 36th ACF Congress. Prelim C | B00 Uncommon King's Pawn Opening
3. A Elo vs A Simonson | 0-1 | 37 | 1935 | ACF Congress | C11 French
4. A Elo vs Fine | ½-½ | 58 | 1935 | Western Championship | C01 French, Exchange
5. Dake vs A Elo | 1-0 | 53 | 1935 | Western Championship | A09 Reti Opening
6. Kashdan vs A Elo | 1-0 | 24 | 1935 | USA-36.Congress Masters | D49 Queen's Gambit Declined Semi-Slav, Meran
7. W A Ruth vs A Elo | 1-0 | 30 | 1935 | ACF Congress | A45 Queen's Pawn Game
8. A Elo vs Santasiere | 0-1 | 52 | 1935 | ACF Congress | B29 Sicilian, Nimzovich-Rubinstein
9. A Elo vs Dake | ½-½ | 31 | 1936 | Milwaukee City Championship | B84 Sicilian, Scheveningen
10. Santasiere vs A Elo | 1-0 | 39 | 1937 | ACF Congress | A21 English
11. A Elo vs P Litwinsky | 1-0 | 31 | 1937 | 38th ACF Congress. Preliminary 4 | B54 Sicilian
12. A Elo vs E W Marchand | 0-1 | 33 | 1937 | ACF Congress | B73 Sicilian, Dragon, Classical
13. A Elo vs Albert Roddy | 0-1 | 41 | 1940 | US Open | B58 Sicilian
14. A Elo vs Fine | ½-½ | 23 | 1940 | US Open (Prelim 2) | C27 Vienna Game
15. A Elo vs Fine | 0-1 | 35 | 1940 | US Open | B20 Sicilian
16. J Thompson vs A Elo | 1-0 | 38 | 1940 | US Open | C86 Ruy Lopez, Worrall Attack
17. A Elo vs H Burdge | 1-0 | 23 | 1940 | US Open | D51 Queen's Gambit Declined
18. A Elo vs H Steiner | 0-1 | 33 | 1940 | 41st US Open. Final | A04 Reti Opening
19. W Shipman vs A Elo | 1-0 | 27 | 1946 | 47th US Open | C39 King's Gambit Accepted
20. O Ulvestad vs A Elo | 1-0 | 19 | 1946 | 47th US Open | E10 Queen's Pawn Game
21. I A Horowitz vs A Elo | 1-0 | 41 | 1953 | 54th US Open | D32 Queen's Gambit Declined, Tarrasch
22. A Elo vs Fischer | 0-1 | 49 | 1957 | Milwaukee N-Western | B93 Sicilian, Najdorf, 6.f4

Kibitzer's Corner
< Earlier Kibitzing  · PAGE 12 OF 12 ·  Later Kibitzing>
Premium Chessgames Member
  offramp: I don't think there is much inflation.

Players get better, just like sprinters and marathon runners and swimmers get faster.

Other sports, such as cricket, also use a system based on the Elo system. Does anyone know if these have inflation?

Premium Chessgames Member
  AylerKupp: <frogbert> Glad to do it. Here are the links to your posts containing the word "systemic" and addressing the definition of systemic inflation, most recent ones first.

Tata Steel (2016) (kibitz #1053)

Shams chessforum (kibitz #788)

Hans Arild Runde (kibitz #6134)

Magnus Carlsen (kibitz #70671)

Hans Arild Runde (kibitz #4317)

Hans Arild Runde (kibitz #3423)

There is a fairly small number of posts that I found where you actually define what you mean by "systemic inflation". This might be the result of my too-rigid filtering of the posts; most of them address the <results> of your analysis and not the methodology, provide examples, or (of course) contain some back-and-forth bantering with other posters.

If you don't find what you are looking for in these posts, I'll gladly post the entire set of links and let you look for the one(s) that you think are the most applicable. And please don't hesitate to ask me to do that; I'm retired and have time on my hands. Besides, there are only about 7 pages of posts so it's not a lot of work / time involved.

Feb-19-16  frogbert: There's a longer sequence of posts about rating inflation in my player page, starting with Hans Arild Runde (kibitz #5808) where several relevant points and observations are expressed. The linked post by <cro777> quoted a 2013 interview with Kasparov which sparked some rounds of debate:


"Fischer's rating was 2785 in 1972, but it certainly has much more weight than Carlsen’s higher rating. And this is also comparative to my 2851 rating in 1999. An evolutionary factor is in play here. That’s why, despite the mathematical soundness of ratings, I wouldn’t give them such a historical significance.

Fischer regularly achieved “+6” when he was moving up, I often attained “+6-7”, while Carlsen gets “+3-4”. And that’s enough, because the pyramid has grown and today’s super tournaments already have ratings beyond 2750."


I didn't treat Kasparov's take on the workings of the rating system very humbly in my follow-up post. Except that I totally agree with him that FIDE ratings shouldn't be used to compare players from different eras, of course. It's a couple of Kasparov's other implied assumptions/claims I disagree with.

Feb-19-16  frogbert: <AylerKupp> Thanks, you found the one I was thinking of - the most recent in your list of hits.


[There has been] systemic inflation of 1-1.5 points a year the past 25 years, but it's a moot point anyway since ratings don't measure skills [directly] - only relative success against your peers.

Systemic inflation is "my term" for total gain TG minus total loss TL of rating points averaged over the number of players NP who had games rated between two rating lists. Hence, to calculate systemic inflation since say 1990, for each rating list Rn (n = the first list you consider) do this:

Look at list Rn and Rn+1 and calculate
An = (TG - TL)/NP between Rn and Rn+1

Then do the same, calculating An+1, for lists Rn+1 and Rn+2, and so on, for all lists up to the current list. Then calculate the total average of all An ... An+N for the N consecutive rating periods you considered.

I refer to this total average as the systemic inflation [per rating period] over these N rating periods.
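The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample rating changes are hypothetical, and each rating period is represented simply as the list of rating changes of the players who had games rated in that period.

```python
from statistics import mean

def period_inflation(changes):
    """A_n = (TG - TL) / NP for one rating period.

    `changes` holds the rating change of every player who had games
    rated between two adjacent lists (hypothetical input format).
    """
    total_gain = sum(c for c in changes if c > 0)   # TG
    total_loss = -sum(c for c in changes if c < 0)  # TL, as a positive number
    return (total_gain - total_loss) / len(changes)  # NP = number of active players

def systemic_inflation(periods):
    """Average of A_n over N consecutive rating periods."""
    return mean(period_inflation(p) for p in periods)

# Made-up rating changes for three consecutive periods.
periods = [[+12, -5, +3, -4], [+8, -10, +6], [-2, +7, +1, -3, +2]]
print(systemic_inflation(periods))
```

Note that this averages per period, so a period with few active players weighs as much as a busy one, which is one of the open questions raised in the post.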


However, I'm not totally convinced that this definition is entirely sound in all detail. For instance:

1) Is it meaningful to average per player when players' activity varies so much?

2) There's a slight difference between doing calculations of influx/outflux (TG minus TL) month by month (between adjacent lists) and comparing say the February list of 2015 with the February list of 2016 directly. Does it matter?

And so on. Still, since I for the most part have details about age, activity (#games played) and (more recently) each player's K (although it can be deduced for older data too - which I might do and add to my own database), it's possible to show a lot of correlations in this data set, related to systemic inflation, rating gain/loss and so on.

Correlation and cause are two different beasts, but I still think insights can be gleaned here.

Feb-22-16  pinoy king: All one has to do is look at the horrendous quality of Carlsen's games today to see that he does not deserve to be rated over 100 points more than previous world champions.
Premium Chessgames Member

This paper offers a brief demonstration of the relationship between the quality of play and ratings in both human and computer rating systems, using Komodo 8. In a few months I'll upload a longer overview where I'll look at further differences between the way engines and humans play chess. Instead of the more usual centipawns, upon urging by users Kai Laskos and Larry Kaufman, I've transformed Komodo 8 evaluations into expected scores using the logistic function.

a = 1.1 (normalization factor)

ExpectedScore = (1 + (Exp[p/a] - Exp[-p/a])/(Exp[p/a] + Exp[-p/a]))/2

This table compares the two methods:

cp exp score
0.00 50.00%
0.50 69.71%
1.00 84.11%
1.50 92.41%
2.00 96.56%
2.50 98.47%
3.00 99.33%
3.50 99.71%
4.00 99.87%
4.50 99.94%
5.00 99.98%
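As a quick check of the conversion, here is a minimal Python sketch. It assumes the intended formula is (1 + tanh(p/a))/2, with p the engine evaluation in pawns; the function name is hypothetical. One observation worth hedging: the tabulated values appear to be reproduced with a = 1.2 rather than the stated 1.1, so the default below is chosen to match the table, not the text.

```python
import math

def expected_score(p, a=1.2):
    """Map an engine evaluation p (in pawns) to an expected score in [0, 1].

    (exp(x) - exp(-x)) / (exp(x) + exp(-x)) is tanh(x), so the whole
    expression is the logistic function 1 / (1 + exp(-2p/a)).
    The default a=1.2 is an assumption chosen to match the table above.
    """
    return (1.0 + math.tanh(p / a)) / 2.0

for p in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(f"{p:.2f} {100 * expected_score(p):.2f}%")
```

Calling `expected_score(p, a=1.1)` instead gives a slightly steeper curve.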

As you can see, using expected scores has two advantages over centipawns: 1) it eliminates the need for artificial thresholds or cut-offs. There's virtually no difference whether the evaluation swings from 0.12 to 2.96, or to 9.05; it is a lost position in either case. But measured in centipawns, the larger swing may have an unwanted and distorting effect on results in relatively small datasets. 2) high evaluation, like the difficulty of positions, affects the accuracy of play. There are two different ways: a) the scaling effect - higher evaluations are accompanied by larger eval gaps between move choices; b) human players' tendency to make desperate moves when behind in eval, and to seek easier and riskless paths to mate the opponent when ahead in eval. Using expected scores eliminates the former, the scaling effect, since it is independent of evaluation numbers.

This is not the first time I've attempted to compare engines and humans. Previously I've been subject to criticism by users who have played against low-rated engines and concluded that these are actually weaker than indicated by papers. However, the graphs are only intended to demonstrate the relative playing strength of humans and engines under the assumption that humans play against engines without employing anti-computer strategy. It is impossible to take into account a hypothetical increase in strength due to anti-computer strategy, as we do not yet know what factors ultimately determine its efficiency.

It should be stressed that the 'error' on both graphs on the Y axis represents the average expected error, i. e. the estimated hypothetical accuracy of play in case all entities involved had the same difficulty of positions. If the average difficulty of moves by an entity is lower than on average, then

What can we conclude from the results? Some people will certainly be surprised, because doesn't it seem most logical to assume that engines and humans share the same accuracy-strength relationship? To me it indeed seemed so, and when I first undertook such a comparison and saw the final results a couple of years ago, it greatly surprised me. It turns out that while humans experience diminishing rating gains per equivalent accuracy increase, engines are the other way round: adding to the accuracy of play leads to increasing rating gains (!). Not that it is increasingly easier to make progress in computer chess, of course; that is another matter completely. In retrospect, that should not come as a complete surprise, because we know that engines and humans play chess very differently. The differences are the following: 1) engines do not play pragmatically, they always strive for objectivity; but humans do. 2) engines have a broader search-tree than humans. They include in calculations a wide array of move choices. Even absurd-looking ones get calculated a few plies deep. Humans, on the contrary, first look at the position to find potentially good moves, pick a few candidates and then start calculating. 3) the most obvious difference lies in the fact that engines, unlike humans, rely almost purely on calculations; the relative importance of the evaluation function is ever-diminishing with advances in hardware and search function. It implies that changes in the level of difficulty of positions affect humans more, but engines less. I think we can dismiss the first one for now, as it is more about deliberate and conscious choices than the fundamental nature of move-selecting processes.

Unfortunately, they still don't explain the causes why the relationships are just like that and not reversed, i. e. humans increasing gains and engines diminishing gains. And what about other skill-based board games? Will go, checkers, arimaa etc. display exactly the same phenomenon? These are intriguing questions and I think they are worth future research.

Premium Chessgames Member
  nimh: Here are tables making it easy to convert CCRL into FIDE and vice versa.

FIDE CCRL CCRL FIDE
3100 3375 3600 3130
3000 2915 3500 3118
2900 2646 3400 3104
2800 2461 3300 3088
2700 2324 3200 3069
2600 2216 3100 3048
2500 2127 3000 3024
2400 2053 2900 2996
2300 1989 2800 2963
2200 1934 2700 2924
2100 1885 2600 2878
2000 1841 2500 2824
1900 1802 2400 2759
1800 1767 2300 2680
1700 1734 2200 2584
1600 1704 2100 2466
1500 1677 2000 2318
1400 1651 1900 2132
1300 1628 1800 1894
1200 1605 1700 1584
1100 1585 1600 1175
1000 1565 1500 624

The hardware that is used in creating CCRL lists is outdated by today's standards, and time controls are ca 3x shorter. How well would Stockfish 7 (3341 CCRL) perform in terms of FIDE 2014, if we had the best hardware possible and standard time controls? A direct comparison of data shows that 3341 CCRL corresponds to 3095 FIDE. According to the PassMark website, Athlon 64 X2 4600+ (2.4 GHz) has an Average CPU Mark of 1365, whereas the strongest one - Intel Xeon E5-2698 v3 @ 2.30GHz - has 22309. Together with the ca 3x longer time controls, this amounts to a ca 49x advantage in search quantity. LOG2 of 49 is 5.6 doublings. At that level each doubling is actually worth less than the conventionally used estimate of 50 ELO; user Kai Laskos has done research into this and found that at TCEC level (faster than the example given previously) the gain per doubling is below 40 ELO. Hence, using 40 ELO per doubling, the end result turns out to be 5.6 x 40 + 3341 = 3565, which equals 3126 FIDE.

So the final conclusion I draw is that given humans do not use anti-computer strategy, Stockfish 7 on top hardware would perform 3100-3150 against humans.
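The arithmetic in this estimate can be retraced with a short Python sketch. It assumes linear interpolation between the neighbouring rows of the CCRL-to-FIDE conversion table above (the post does not say how in-between values were obtained), and the function and variable names are hypothetical.

```python
import math

# Rows copied from the CCRL -> FIDE half of the conversion table above.
CCRL_TO_FIDE = {3300: 3088, 3400: 3104, 3500: 3118, 3600: 3130}

def ccrl_to_fide(ccrl):
    """Linearly interpolate between the two nearest table rows (assumption)."""
    lo = (int(ccrl) // 100) * 100
    hi = lo + 100
    frac = (ccrl - lo) / 100
    return CCRL_TO_FIDE[lo] + frac * (CCRL_TO_FIDE[hi] - CCRL_TO_FIDE[lo])

# CPU Mark ratio times the ca 3x longer time control.
speedup = 22309 / 1365 * 3
doublings = round(math.log2(speedup), 1)   # rounded to one decimal as in the post
projected_ccrl = 3341 + doublings * 40     # 40 Elo per doubling

print(round(speedup), doublings, round(projected_ccrl))
print(round(ccrl_to_fide(3341)), round(ccrl_to_fide(projected_ccrl)))
```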

Premium Chessgames Member
  Tiggler: Reposted from <Tiggler> chessforum:

I think I found the origin of the mysterious differences between the FIDE tables for ratings based expected scores and the cumulative normal distribution with sd = 400.

The wiki article on the ELO system states:

" FIDE continues to use the rating difference table as proposed by Elo. The table is calculated with expectation 0, and standard deviation 2000 / 7."

If so, then it appears that Elo used the approximation 1/sqrt(2) = 0.7 .

For a difference in scores the corresponding distribution has sd multiplied by sqrt(2), so instead of getting sd = 400, as I had previously assumed, we get 404.061 .

So now the expected score (per game) is given by

=ERF(ratings difference/404.061)*0.5+0.5

This formula does match the tables in section 8.1 of the FIDE handbook.
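For readers who want to try the formula outside a spreadsheet, here is a minimal Python equivalent. It assumes a normal distribution of the rating difference with standard deviation 2000/7, as quoted above; the function name is hypothetical. The constant 404.061 is that standard deviation multiplied by sqrt(2), which is how the error function relates to the normal CDF.

```python
import math

SD = 2000 / 7  # standard deviation quoted from the wiki article

def expected_score(rating_diff):
    """Expected score per game for a player rated `rating_diff` points higher.

    Normal CDF written with erf: Phi(d / SD) = 0.5 + 0.5 * erf(d / (SD * sqrt(2))).
    """
    return 0.5 + 0.5 * math.erf(rating_diff / (SD * math.sqrt(2)))

print(round(SD * math.sqrt(2), 3))  # the erf scale used in the spreadsheet formula
for diff in (0, 100, 200, 400):
    print(diff, round(expected_score(diff), 2))
```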

Premium Chessgames Member
  Tiggler: Reposted from <Tiggler> chessforum:

In an interesting post on the WC Candidates forum, <AylerKupp> mentioned that Arpad Elo suggested the use of a t-distribution: World Championship Candidates (2016)

The t-distribution (Student's t) is used to find the distribution of the differences between pairs of values drawn INDEPENDENTLY from the same normal distribution (my emphasis).

Elo's underlying assumption is that the performance of a player in a single game is distributed normally about his expected value, and that the standard deviation of the distribution is the same for all players.

So when two players come to the board the difference in their performance is based on their two independent random samples from their individual distributions. Hence the t-distribution.

This seems to me to be extremely contrived, though of course Dr. Elo can make whatever ad hoc assumptions he chooses in his system.

I prefer the following argument, however. When two players come to the board, the distribution of the differences in their performance is the fundamental one, and the most parsimonious (in the Occam sense) description of this is the normal distribution.

We cannot say that in a single game the deviation of player A's performance from his expectation is independent of the deviation of player B's performance from his expectation. On the face of it that is absurd.

Premium Chessgames Member
  offramp: A moot is a debate, especially an ecclesiastical one. A moot point is any subject up for discussion at a moot.

Moot does not mean irrelevant. It means the opposite of irrelevant.

Since the difference in meaning is so large it is a good idea to get it right.

Premium Chessgames Member
  Gregor Samsa Mendel: <offramp>--Apparently we Yanks have mootated the meaning of mootness:

Premium Chessgames Member
  offramp: <Gregor Samsa Mendel: <offramp>--Apparently we Yanks have mootated the meaning of mootness:>

That is bizarre and disturbing. It means that British and American people reading the same text will have opposite views on what has been written. I will think it means "open to debate" and a yank will think it means "pointless to debate".

Perhaps it is clearer if one uses a word such as "irrelevant", although that is less pretentious.

Premium Chessgames Member
  zanzibar: I think we should table this dangerous discussion.
Premium Chessgames Member
  AylerKupp: <Tiggler> I wouldn't be too hard on Dr. Elo. After all, he was working at a time when there wasn't easy access to computers and whatever there was, was expensive. So it's natural for Elo to make many assumptions and simplifications in order to make the calculations easier. Still, using 1/√2 = 0.7 instead of 0.707 seems excessive, as that would make the SD = 404 (exactly) instead of 400. And 404 should not be that much more difficult to use in the calculations than 400.

For another view on the accuracy of the Elo tables, see

Your comment about Elo's assumption that the standard deviation of the distribution is the same for all players gave me pause for some thought. Clearly that was a necessary simplification for Elo but it would seem possible today to calculate a performance distribution for each rated player (or at least the top ones), and use each player's distribution in calculating their tailored t-distribution (perhaps another good use of the letter "t"!). I don't consider this concept absurd at all. For example, based on the current Candidates Tournament, I would assume that the SD in Giri's performance (all draws) distribution would be much different than Nakamura's or Anand's (5 decisive games each). Of course, I don't know if it would make a significant difference in the results.

The reason I've been considering all of this is that I'm trying to develop a predictor for game results in the Candidates Tournament for User: golden executive contest. I had been doing reasonably well (my goal was a 75% correct prediction) until the last round (70%), when I used my "hunch" instead of some of the model's predictions and I was wrong while the model was right. My enthusiasm is greatly tempered by the realization that if I had simply predicted that each game would end in a draw I would have been correct 71.1% of the time, even with the Nakamura – Anand result included.

Premium Chessgames Member
  offramp: It's all moot, isn't it?
Premium Chessgames Member
  Tiggler: <AylerKupp> Sorry to be pedantic (though you would not be the one who would complain of this), but you cannot have tailored t-distributions for each player. The t-distribution is for the difference of two samples from the same normal distribution.

<offramp> Yes indeed, quite moot: worthy of debate.

Mar-27-16  luftforlife: In American usage, the adjective "moot" enjoys three denotations: first, "open to question; subject to discussion; debatable; unsettled"; also, "subjected to discussion; controversial, disputed"; second, "deprived of practical significance; made abstract or purely academic"; third, "concerned with a hypothetical situation." Webster's Third New International Dictionary (Springfield, Mass.: Merriam-Webster Inc. 1993), 1468.

The second denotation does not connote, and is neither equivalent with, nor tantamount to, irrelevance per se (for such a moot point retains its academic relevance, its fitness for abstract consideration, or both), but rather connotes a change in status that can, in the legal context at least, lead to a change in treatment -- to unfitness for further consideration, thwarting and thereby pretermitting practical, concrete, specific, and final resolution, disposition, or decision, of a case turning on, and fatally infected by, such a moot point -- due to limitations of power.

Premium Chessgames Member
  perfidious: <luftforlife> Used as an adjective, you are correct; however, that is not the full story.

While as a noun, the word is comparatively uncommonly used, as a verb that is not the case, though of course Over Here 'debate' is much more often employed.

Mar-27-16  luftforlife: <perfidious>: Thanks for your comment, and I take your point. I focussed on the American adjectival form and usage chiefly to point up (and to contrast with irrelevance per se) the American denotation "deprived of practical significance" -- a necessary and sensible accretion to meaning as it has arisen and as it has been applied as a term of art by our Supreme Court in its construction of our Constitution and its limitations on the federal judicial power, but one that has, on our shores, overspilled the narrow confines of that usage, and that has, in more general American usage, come to acquire connotations that dull, obscure, and even subvert not only the other American adjectival denotations, but also the essential and vital British origins, meanings, and past and present uses of the word in all its forms. I appreciate <offramp's> incisive comments and reminders in this regard. Your comment and the others above my own are illuminating and edifying. Kind regards.
Premium Chessgames Member
  TheFocus: Happy birthday, Arpad Elo.
Premium Chessgames Member
  alexmagnus: The average rating of women's top 100 is now the lowest since it is published as top 100 (and not as top 50). 11 points below all-time high from April 2015. (the even lower number from July 2013 on the FIDE site is wrong - in that month, FIDE accidentally published top 120 instead of top 100).

Open top 100 on the other hand is extremely stable in recent years.

Premium Chessgames Member
  AylerKupp: Calculating P(Win), P(Draw), and P(Loss) – Articles found (part 1 of 2)

The FIDE scoring tables (Table 8.1b) indicate the probability of a win [ P(W/D) ] OR a loss [ P(L/D) ] for a player based on the rating difference between that player and his opponent. In certain situations it is desired to determine the probability of a win, a draw, or a loss [ P(W), P(D), P(L) ] for a player, again based on the rating difference between that player and his opponent.

This is surprisingly not easy to do, and additional information is needed. In his book, "The Rating of Chess players – Past & Present", Dr. Elo says "All data entering the rating system consist of total points scored in actually played game ... Discrimination as to how any point score is composed between wins, draws, and losses is beside the point." I think that the real reason that he ignored the effect of draws is that he developed his system when there weren't any cheap computers easily available to do calculations, and no on-line game databases containing the necessary information. And so he remarked that "Any consideration of draws in rating theory requires information on the probabilities of draws, as well as wins and losses, between individual players, information which is not readily available. Its accumulation would be inordinately laborious, and there has been little demand for it." Well, maybe not then.

Over a period of time I've been able to come up with only 3 articles describing methods for attempting to extract P(D) from P(W/D) and P(L/D), and in one of them the author threw up his hands when he realized that he needed additional information. These articles are:

1. "Individual Chess Game Probabilities based on Match Results". Written by Charles Roberson in 2012, the link no longer works. His method relates the P(W), P(D), and P(L) to the probabilities of a player winning or losing a match. I don't find it very convincing because he makes statements like "E(D) = Expected game draw percentage = Match play probability of losing" without substantiation, and when you plot P(W), P(D), and P(L) on the same chart you get a very sharp slope change in P(W) and P(L) that just doesn't look right.

2. "Bayesian Elo Rating" by Remi Coulom, written in 2004. His method calculates P(W), P(D), and P(L) using Bayes' Theorem by choosing a prior likelihood distribution over Elo ratings and computing a posterior distribution as a function of the observed results. Whatever that means.

Seriously, as Dr. Elo said, calculating P(W), P(D), P(L) from P(W/D), P(L/D) requires additional information, and the author estimates a Draw Likelihood by simulation. I think that this number represents the Expected Drawing Percentage [ E(D%) ] x standard deviation (SD) but I'm not sure. With E(D%) known then E(W%) and E(L%) can be calculated and from them P(W) and P(L). With P(W) and P(L) known then P(D) = 1 – P(W) – P(L) since all results are mutually exclusive.

When you plot P(W), P(D), and P(L) you get nice smooth curves which is what you hope for. I think. As a bonus, the article addresses and quantifies White's opening advantage, something that neither Dr. Elo nor FIDE address, although Dr. Elo mentioned it in his book, dismissing it with the comment "Any incorporation of colors into the rating system, however, would again inordinately expand the bookkeeping requirements with small prospect of any utility for it, in the final analysis." IMO, wrong again, Dr. Elo, even though it's understandable given the lack of accessible computers when he developed his system.

Premium Chessgames Member
  AylerKupp: Calculating P(Win), P(Draw), and P(Loss) – Articles found (part 2 of 2)

3. "How to calculate probabilities of Win, draw and loss based on the ELO system", written in 2014 with no user name given. The author attempts to calculate P(D) by looking at the expected score (EA, EB) in a game between 2 players (A and B) and, since FIDE considers a draw to be 1/2 White win and 1/2 Black win, uses the formulas:

EA = P(A wins) + 1/2*P(Draw) + 0*P(A loses) = P(A wins) + 1/2*P(Draw)
EB = P(B wins) + 1/2*P(Draw) + 0*P(B loses) = P(B wins) + 1/2*P(Draw)

But then he realized that he needed additional information (which he would have known had he read Dr. Elo's book) and gave up, asking for suggestions. Which he didn't get.

Still, I used his method to calculate P(W) and P(L) using the P(D) calculated in articles 1 and 2 above. But this yielded no new information: the chart using the P(D) from Article 1 looks just like the chart in Article 1, and the chart using the P(D) from Article 2 looks just like the chart in Article 2.
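The step described here, recovering P(W) and P(L) once P(D) is supplied from elsewhere, follows directly from EA = P(A wins) + 1/2*P(Draw). A minimal Python sketch, with a hypothetical function name and made-up input values:

```python
def win_draw_loss(expected, p_draw):
    """Split an expected score into (P(W), P(D), P(L)) given a known P(D).

    From E = P(W) + P(D)/2 and P(W) + P(D) + P(L) = 1.
    """
    p_win = expected - p_draw / 2
    p_loss = 1 - p_win - p_draw
    if p_win < 0 or p_loss < 0:
        raise ValueError("P(D) is too large for this expected score")
    return p_win, p_draw, p_loss

# Made-up example: expected score 0.64 with a 40% draw probability.
print(win_draw_loss(0.64, 0.40))
```

The error branch shows why extra information is unavoidable: the expected score only bounds P(D), it does not determine it.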

I've created a spreadsheet describing the above in more detail as well as additional information such as:

1. The Percentage Expectancy Table (which is the same as FIDE's Table 8.1b, called the Scoring Probability) listed in Dr. Elo's book is wrong if a SD = 200 is used as Dr. Elo indicates he used. However, as user <Tiggler> pointed out, if a SD = 2000/7 is used instead, then the numbers match perfectly. I suppose another simplification made by Dr. Elo.

2. The FIDE Scoring Probability table (as well as Dr. Elo's Percentage Expectancy Table) only has 2 significant digits. As a result, each P(W/D) covers a range of rating differentials (RDiffs). It's easier to deal with probabilities if each rating differential has a unique probability associated with it, and this is listed in one of the spreadsheet tabs. Five significant digits are needed in order to uniquely associate each RDiff with a probability.

3. The probabilities calculated using the Match Results method are listed and plotted.

4. The probabilities calculated using the Bayes method are listed and plotted. This one is particularly interesting because you can see the effect of incorporating White's first move advantage into the probabilities. It also shows how to incorporate different White Advantage and Draw Likelihoods using data derived from the Opening Explorer database, the ChessTempo database, or any other database that provided a percentage of White wins, draws, and losses.

You can download this spreadsheet from The file is about 2.4 GB. You will need Excel 2003 or later to view it.

Premium Chessgames Member
  AylerKupp: Calculating P(Win), P(Draw), and P(Loss) – The Area method

I was not satisfied with the results obtained by attempting to calculate P(W), P(D), and P(L) based on the articles I found. The Bayesian method seemed the most promising since it yielded the expected, or at least hoped-for, smooth curves. But, since neither the games database used nor the simulation was made available, the probabilities could not be modified to reflect the different P(W)s, P(D)s, and P(L)s at different player levels (both players rated 2200+, both players rated 2300+, etc.), since the EloDraw parameter was not known. And it was also not clear to me how the factor to incorporate White's opening advantage (EloAdvantage) was calculated. Besides, the resulting P(D) simply seemed too low, particularly at the higher player rating levels.

Then I had an epiphany. P(D) is the area under the P(D) curve, and the Draw percentage is based on the ratio of this area to the total area, i.e.

A[ P(D) ]% = A[ P(D) ] / ( A[ P(W) ] + A[ P(D) ] + A[ P(L) ] )

So I could iterate and find the value of EloDraw that resulted in A[ P(D) ]% being equal to the draw percentage observed in the games database filtered to include only the player rating levels desired. And A[ P(D) ] was easy to calculate: since we are effectively dealing with the discrete probability distribution of a random variable (i.e. the results of games), it was just the sum of all the P(D)s x 1601 (the spread of the distribution, ±800 + 1 in this case), since the width of each value of the sample is = 1. And the spread is not actually needed since we are calculating ratios, so the spread cancels out. Then, once the value of EloDraw is known, P(W) and P(L) can be calculated.
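A rough Python sketch of this iteration follows. Everything model-specific here is an assumption, not taken from the spreadsheet: the win and loss probabilities use a logistic curve shifted by a draw parameter and a White-advantage parameter (in the spirit of the Bayesian Elo article), the search is a plain bisection, and all function and parameter names are hypothetical.

```python
def f(delta):
    """Logistic curve on the Elo scale (assumed model, not FIDE's table)."""
    return 1.0 / (1.0 + 10.0 ** (delta / 400.0))

def wdl(rdiff, draw_likelihood, white_advantage=0.0):
    """P(W), P(D), P(L) for White at rating difference rdiff = White - Black."""
    p_win = f(-rdiff - white_advantage + draw_likelihood)
    p_loss = f(rdiff + white_advantage + draw_likelihood)
    return p_win, 1.0 - p_win - p_loss, p_loss

def draw_area_share(draw_likelihood, white_advantage=0.0, spread=800):
    """A[P(D)] / (A[P(W)] + A[P(D)] + A[P(L)]) over rdiff = -spread..+spread."""
    area_w = area_d = area_l = 0.0
    for rdiff in range(-spread, spread + 1):
        w, d, l = wdl(rdiff, draw_likelihood, white_advantage)
        area_w, area_d, area_l = area_w + w, area_d + d, area_l + l
    return area_d / (area_w + area_d + area_l)

def fit_draw_likelihood(target_share, white_advantage=0.0):
    """Bisect for the draw parameter whose area share matches the observed one."""
    lo, hi = 0.0, 4000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if draw_area_share(mid, white_advantage) < target_share:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

fitted = fit_draw_likelihood(0.614)  # e.g. an observed 61.4% draw rate
print(round(fitted, 1), round(draw_area_share(fitted), 4))
```

Bisection works here because the draw share grows monotonically with the draw parameter; any other root finder would do as well.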

I've updated the spreadsheet to add the description of the Area method and a tab to calculate and plot P(W), P(D), and P(L) for the set of ChessTempo win, draw, and loss percentages corresponding to both players rated 2200+ and 2600+. You can download this updated spreadsheet from here:

To make the distinction clearer, I changed the names of the parameters EloAdvantage and EloDraw to WhiteAdvantage and DrawLikelihood respectively, since they no longer have anything to do with Elo distributions, including FIDE's P(W/D) and P(L/D).

Using this method you can calculate the P(W)s, P(D)s, and P(L)s using the White win, draw, and loss (Black win) percentages from any games database and using any probability distribution that you think is the most accurate.

Premium Chessgames Member
  AylerKupp: OK, FWIW, I downloaded the ChessTempo data for games where both players were rated 2700+ (a very time consuming procedure, effectively 29 screen captures) and I got some interesting results:

Total number of games = 14,502 (an increase of about 180 games since 2-05-17)

White wins 4,167 games (28.7%)
Draws = 7,603 games (52.4%)
White loses 2,732 games (18.8%)
White's advantage = 9.9%

I filtered the data according to information in the Event column and I discarded games earlier than 1990 to be consistent with the KingBase data. These are the numbers for different types of games:

Classic 8,666 games (59.8%)
Blitz 3,205 games (22.1%)
Rapid 1,776 games (12.2%)
Blindfold 639 games (4.4%)
Exhibition 2 games (<0.1%)
Simultaneous 1 game (<0.1%)
Too Old 213 games (1.5%)

For Classic time control games only, here are the statistics:

White wins 2,141 games (24.7%)
Draws = 5,322 games (61.4%) (!)
White loses 1,203 games (13.9%)
White's advantage = 10.8%

So the incidence of draws for Classic time control games when both players are rated 2700+ is greater than when all games are considered. Which makes sense; I would think (I didn't calculate it) that the likelihood of errors is higher at faster time controls, never mind blindfold games.

As a check, here are the statistics for the recently completed Gashimov Memorial:

Total number of games = 45

White won 9 games (20.0%)
Draws = 29 games (64.4%)
White loses 7 games (15.6%)
White's advantage = 4.4%

Not too inconsistent, keeping in mind that this is a very small number of games so a substantial deviation from the means is expected.
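The percentages quoted in this post can be recomputed from the raw counts with a few lines of Python. The sketch assumes that "White's advantage" means White's win percentage minus White's loss percentage, which matches the figures given; the function name is hypothetical.

```python
def summarize(white_wins, draws, white_losses):
    """Win/draw/loss percentages and White's advantage from raw game counts."""
    total = white_wins + draws + white_losses
    win_pct = 100 * white_wins / total
    draw_pct = 100 * draws / total
    loss_pct = 100 * white_losses / total
    return {
        "total": total,
        "win_pct": win_pct,
        "draw_pct": draw_pct,
        "loss_pct": loss_pct,
        "white_advantage": win_pct - loss_pct,  # assumed definition
    }

for label, counts in [
    ("All 2700+ games", (4167, 7603, 2732)),
    ("Classic only", (2141, 5322, 1203)),
    ("Gashimov Memorial", (9, 29, 7)),
]:
    stats = summarize(*counts)
    print(label, stats["total"], {k: round(v, 1) for k, v in stats.items() if k != "total"})
```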

I doubt that I'll repeat it with the data for players rated 2600+ since there are 67,506 of those games and that would require 135 screen captures! I think I'll wait until I set up the KingBase data.

Copyright 2001-2017, Chessgames Services LLC