|Dec-27-09|| ||alexmagnus: Sorry, the link for the second player should be of course http://db.chessmetrics.com/CM2/Play...|
|Dec-28-09|| ||notafrog: Since it is difficult to collect the source data, I obviously cannot check the examples you mentioned. It is possible that the calculations are wrong.|
Another possible reason for a delay in ratings entering the system is the criterion of how "connected" a player is to the top players.
There is obviously no way to check these hypotheses without reconstructing the input data.
I once started collecting data from the site, though I don't remember where the data are or what state they're in.
|Dec-28-09|| ||alexmagnus: I've got an answer from Sonas today. He explains the 30-point change with the effects of simultaneous calculation. But that doesn't explain why there are no large changes in the following months.|
|Dec-29-09|| ||notafrog: You can try asking Sonas for the game result database. Then we can check if the 30 point difference is correct, and why it appears.|
|Feb-08-10|| ||alexmagnus: Those who explain rising Elo ratings as inflation - how do you explain rising results in memory sports? Those rise even more rapidly and are <absolute>. Look at these records, all of them quite new:|
The memory championships have existed since the early 90s.
|Feb-08-10|| ||Tomlinsky: I started to explain... but can't be arsed. Sorry. :)|
|Apr-13-10|| ||RainPiper: How large is the white-against-black advantage expressed in Elo points? Put differently: if I play White against a higher-rated player, what is the rating difference that gives me a winning probability of precisely 50%?
This has certainly been calculated somewhere, but I haven't been able to find it yet.
|Apr-13-10|| ||whatthefat: <RainPiper>
In my experience looking at the data for super-GMs, it's circa 50 points, although with big differences between individual players. I would also expect it to depend strongly on playing strength. It's also been shown to depend on the time control.
Jeff Sonas seems to imply that it's about 35 points here http://www.chessbase.com/newsdetail...
Meanwhile, NIC and cg.com both have White scoring about 55% on average - see http://www.newinchess.com/Yearbook/... and ChessGames.com Statistics Page - which also works out to a 35 point rating difference by the Elo formula.
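As a quick sanity check of the 55% ≈ 35 points equivalence, here is a minimal sketch using the standard logistic Elo expected-score formula (the logistic form is the USCF-style curve; FIDE's published tables are based on the normal distribution, but the two are very close in this range):

```python
import math

def elo_expected(diff):
    """Expected score for the higher-rated side at rating difference diff."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def elo_diff(expected):
    """Invert the formula: the rating difference that yields a given expected score."""
    return -400.0 * math.log10(1.0 / expected - 1.0)

print(elo_diff(0.55))    # ~34.9 -- White's 55% average score is worth ~35 Elo points
print(elo_expected(35))  # ~0.550
```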
|Apr-14-10|| ||RainPiper: Thanks for the links, <whatthefat>, this was exactly the sort of information I was looking for.|
It also answers a follow-up question that I had in the back of my mind. Is there discussion about correcting performance ratings for the white/black bias? (Jeff Sonas actually advocates this in the text you linked.)
In individual tournaments this is not much of an issue (the number of games with White and Black will hardly ever differ by more than one). In team events, however, the white/black ratio can be strongly skewed. E.g. at the 2008 Olympiad, Grischuk had White six times and Black only twice:
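As an illustration of what such a correction could look like: a minimal sketch of a color-adjusted performance rating, using the linear "algorithm of 400" approximation and assuming a total White advantage of about 35 points per game. The adjustment scheme here is an illustrative assumption, not Sonas's actual formula:

```python
WHITE_EDGE = 35.0  # assumed total White advantage per game, in Elo points

def color_adjusted_performance(games):
    """games: list of (opponent_rating, score, had_white), score in {0, 0.5, 1}."""
    n = len(games)
    # Color-neutrally, a game with White against a player rated R is like
    # a game against an opponent rated R - 35; with Black, like R + 35.
    adj_sum = sum(opp - WHITE_EDGE if had_white else opp + WHITE_EDGE
                  for opp, _, had_white in games)
    total_score = sum(score for _, score, _ in games)
    # Linear approximation: perf = avg opponent + 400*(wins - losses)/games
    return adj_sum / n + 400.0 * (2 * total_score - n) / n
```

With equal colors the adjustments cancel and the naive number is unchanged; a Grischuk-style 6W/2B schedule gets docked about 17.5 points.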
|Nov-07-10|| ||whiteshark: Quote of the Day
" The process of rating players can be compared to the measurement of the position of a cork bobbing up and down on the surface of agitated water with a yard stick tied to a rope and which is swaying in the wind. "
-- Arpad Elo
|Nov-07-10|| ||prensdepens: What is this? No picture for the gentleman who came up with the system by which the strength of chess players could be calculated, and which is now the norm.|
Can we have his picture please on his page to honor The Man for his significant contribution to chess?
|Feb-22-11|| ||cu8sfan: Elo is being challenged by data miners: http://www.kaggle.com/ChessRatings2....|
|Feb-22-11|| ||alexmagnus: Well, as already said multiple times, Elo isn't designed to predict results; it is designed to describe results. Unlike e.g. Chessmetrics (which is a better predictor, with some funny consequences for description).|
|Feb-22-11|| ||Akavall: <Well, as already said multiple times, Elo isn't designed to predict results, it is designed to describe results. >|
Yes. And the two seem to be quite different. For example, I think a good prediction method should be very sensitive to players' current form, which is generally pretty volatile.
For example, player A is rated 2650; he starts a tournament poorly, so the prediction algorithm should predict his results as if "A" were a considerably weaker player (say, 2500). The algorithm would therefore have that player's rating at 2500 for that tournament (unless the player turns his performance around). For the next tournament, however, the algorithm should treat the player as somewhere closer to 2600, and then adjust to his performance during the tournament.
The algorithm would probably have done pretty well if it had treated Grischuk as a 2550-2600 player during Tata Steel, but it should weigh him higher than that for the next event.
This would lead to wild fluctuations in the rating that the algorithm assigns to the player.
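A toy sketch of the form-sensitive predictor described above (all parameters and numbers are invented for illustration): chase recent performance hard within an event, then regress partway back toward the long-term rating before the next one:

```python
def update_in_tournament(form, game_perf, k=0.3):
    """Within an event, move quickly toward each game's performance."""
    return form + k * (game_perf - form)

def regress_between_tournaments(form, long_term, w=0.5):
    """Between events, pull the form estimate back toward the baseline."""
    return long_term + w * (form - long_term)

form = 2650.0                           # long-term rating
for perf in (2450, 2500, 2520, 2480):   # a poor start to the event
    form = update_in_tournament(form, perf)
print(round(form))  # ~2529: treated as a low-2500s player for the rest of the event
print(round(regress_between_tournaments(form, 2650.0)))  # ~2590: closer to 2600 next time
```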
|Feb-22-11|| ||mojonera: Harkness system was better.|
|Feb-22-11|| ||TheFocus: <mojonera> <Harkness system was better.>|
And wasn't it around before Elo's system?
|Feb-23-11|| ||alexmagnus: <Akavall> Yes, a perfect predictor would be about as volatile as the TPRs are, while a perfect descriptor would remain about constant most of the time and change heavily only if the improvement/decline is clear. I don't know if Elo is perfect in these terms (how is it even possible to test the "descriptive power"?), but it clearly does a better job than any system built on the basis of "best predictive power".|
|Apr-04-11|| ||alexmagnus: In the Chessmetrics system it is a known phenomenon that a player's highest rating is sometimes higher than his highest performance - "philosophically" a result of the predictive (rather than descriptive) nature of Chessmetrics, and mathematically a consequence of weighting+padding. The differences are usually small in such cases, but what if we take only performances <prior to the achievement of the rating peak>? The record among the 3-year-average top 100 is held by Neumann, whose highest CM rating is 149 (!) points higher than his highest CM performance prior to reaching that rating (and 53 points higher than his highest ever CM performance). Second is Lasker with a 76-point difference. Talking about sense and nonsense of "padding" and "predictive power"...|
|Apr-08-11|| ||drik: <metatron2: also each trial in the binomial dist has only two possible outcomes, while each chess game has 3 possible outcomes, meaning that I also ignored all the draws>
I was looking at it as p=win & q=not win.
<and that Fide's <Elo> is also based on the normal distribution (they don't use logistic curves)>
True, but I think that USCF uses logistic distributions because Gaussians underestimate the rate of upsets at large rating differences. FIDE ratings attempt to make the Gaussians heavy-tailed by imposing a cutoff threshold, so the expected score never asymptotically approaches zero. But this causes distortions that are worse than the supposed problem.
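A small sketch of the tail behavior in question, comparing the logistic curve to the classical normal model (per-game performance sd of 200, so the difference of two performances has sd 200*sqrt(2); this is the textbook Elo parameterization, assumed here):

```python
import math

SIGMA = 200.0 * math.sqrt(2.0)  # sd of the difference of two performances

def logistic_upset(diff):
    """Upset probability for the weaker side under the logistic curve."""
    return 1.0 - 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def normal_upset(diff):
    """Upset probability for the weaker side under the Gaussian model."""
    phi = 0.5 * (1.0 + math.erf(diff / SIGMA / math.sqrt(2.0)))
    return 1.0 - phi

for d in (200, 400, 600):
    print(d, round(logistic_upset(d), 4), round(normal_upset(d), 4))
# 200: 0.2402 vs 0.2397 -- nearly identical
# 400: 0.0909 vs 0.0787
# 600: 0.0307 vs 0.0169 -- the Gaussian predicts barely half as many upsets
```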
|Apr-08-11|| ||drik: <alexmagnus:> I'm comfortable with the idea of a 'perfect descriptor' - but the idea of a 'perfect predictor' worries me. The old PCA ratings had a measure of standard deviations - consistent players (like Leko & Kramnik) had SDs ~ 120 & inconsistent players (like Shirov & Morozevich) had SDs ~ 200. Given this unavoidable 'statistical noise' on the ratings, how 'perfect' can any predictions be?|
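To put rough numbers on that noise floor, a small Monte Carlo sketch (the 2750 "true strength" is invented; the sds of 120 and 200 follow the PCA-style figures quoted above):

```python
import random

def simulate_event_perfs(true_rating, sd, events=10, seed=1):
    """Draw per-event performances around a known true strength."""
    rng = random.Random(seed)
    return [round(rng.gauss(true_rating, sd)) for _ in range(events)]

print(simulate_event_perfs(2750, 120))  # 'consistent' player
print(simulate_event_perfs(2750, 200))  # 'inconsistent' player
# Even with the true strength known exactly, single-event results often
# miss it by 100+ points (consistent) or close to 200 (inconsistent), so
# no predictor can call individual events much better than that.
```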
|Apr-08-11|| ||alexmagnus: I don't like the idea of a perfect rating being a perfect predictor either. But a hypothetical perfect predictor would do exactly this - be able to predict the next couple of performances, no matter how volatile the player is. Those who try to make their ratings perfect predictors just test them on the past data and see how much their results deviate from the predicted ones.|
Of course, a perfect descriptor would give totally different ratings than a perfect predictor. The question is different - does <pursuing a goal> of perfect prediction improve descriptive abilities? My answer is no, as seen in those drawbacks of Chessmetrics (which is a better predictor but worse descriptor than Elo), but I wonder if there is any way to prove/disprove it.
|Apr-09-11|| ||drik: Frankly, I'm not sure that the goal of 'perfect' prediction is even hypothetically possible. Because the players can examine their past games and spot flaws or even TNs, the learning process is not taken into account. The closest analogy is in traffic simulation or flocking behaviour, where even the long-term behaviour can have complex structure. Although there are only two beings in this case, perhaps the games can be regarded as flocking along common positional lines?|
|Apr-09-11|| ||alexmagnus: A perfect prediction is of course not possible, but as I say the question is only whether it is useful to pursue it, as e.g. Sonas does.|
|Apr-12-11|| ||drik: <Schach Matov: you seem to pride yourself on "knowing" statistics but you made an amateur mistake in combining two completely different animals like losses and draws.>|
To apply binomials you need to have TWO possible results. Having THREE can be handled by only solving for the wins of one player & lumping the wins+draws of the other together. Then you repeat by solving for the wins of the second player & lumping the wins+draws of the first player together. Since you know the length of the match, you can subtract the decisive games to obtain the number of draws.
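A minimal sketch of that lumping trick (the per-game probabilities are invented for illustration):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials of probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 12                 # match length
p1, p2 = 0.25, 0.15    # per-game win probabilities; draws occur with 0.60

# Player 1's wins, with draws and losses lumped together as "not a win":
print(binom_pmf(3, n, p1))    # P(player 1 wins exactly 3 games)

# Doing the same for player 2 and subtracting the decisive games from
# the match length recovers the draws, e.g. in expectation:
print(n - n * p1 - n * p2)    # 7.2 expected draws
```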
|Jun-08-11|| ||drik: <metatron2> you might find this of interest - http://www.chessbase.com/newsdetail...
particularly that the second-best approach defined a UNIQUE RATING FOR EVERY PAIR OF PLAYERS. This underlines how strongly nontransitive probability reduces the effectiveness of rating systems.|
And <Schach Matov>, it shows why match results are better predicted by head-to-head matchups than by comparing ratings.
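A toy illustration of why nontransitivity defeats a single scalar rating (probabilities invented): suppose A usually beats B, B usually beats C, and C usually beats A. No one-number-per-player rating can reproduce all three pairwise probabilities, which is exactly where per-pair ratings gain their edge:

```python
# Head-to-head win probabilities for the favored side in each pairing.
P = {
    ("A", "B"): 0.60,  # A beats B 60% of the time
    ("B", "C"): 0.60,
    ("C", "A"): 0.60,  # ...yet C beats A 60% of the time
}

# Any scalar rating implies transitivity: r(A) > r(B) and r(B) > r(C)
# would force A to be favored over C, contradicting the table above.
for (x, y), p in P.items():
    print(f"{x} beats {y} with probability {p}")
```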