< Earlier Kibitzing · PAGE 20 OF 22 ·
|Oct-28-10|| ||visayanbraindoctor: <Bridgeburner>
What do you think of making a second spreadsheet using the error-counting method, this time removing the worst game from each WC match?
In the case of Lasker-Schlechter, it's easy to choose an anomalous game. There is only one choice, game 10, which has 14 errors.
In the case of Anand-Kramnik, the game with the most errors is game 9, with five errors. So we should choose this as the game to be excluded.
I have just emailed you the modified spreadsheet minus the worst games.
In this spreadsheet, with the worst game removed, the total accumulated errors are:
Lasker 7 errors out of 9 games, or 0.78/game
Schlechter 8 errors out of 9 games, or 0.89/game
Kramnik 12 errors out of 10 games, or 1.2/game
Anand 8 errors out of 10 games, or 0.8/game
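As a quick arithmetic check, the per-game rates listed above can be reproduced with a short script (the error totals and game counts are taken directly from the figures just listed):

```python
# Errors-per-game rates with each match's worst game excluded,
# using the accumulated totals from the spreadsheet figures above.
totals = {
    "Lasker": (7, 9),       # (accumulated errors, games counted)
    "Schlechter": (8, 9),
    "Kramnik": (12, 10),
    "Anand": (8, 10),
}

for player, (errors, games) in totals.items():
    print(f"{player}: {errors / games:.2f} errors/game")
# Lasker: 0.78, Schlechter: 0.89, Kramnik: 1.20, Anand: 0.80
```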
What do you think?
One very obvious fact is that with the worst game removed from each match, suddenly Kramnik is seen as playing a bit more inaccurately than the rest, while Lasker, Schlechter, and Anand are in the same ballpark. However, this does not tell the whole story, for the simple reason that Lasker and Schlechter did produce an error-laden game of a technically much worse kind than Kramnik ever did.
It's a conundrum. Perhaps it's best to simply state the conclusion as:
With the worst game of each match excluded, Lasker, Schlechter, and Anand played with error rates ranging from 0.78 to 0.89 errors per game, very close to each other in accuracy, while Kramnik played more inaccurately at 1.2 per game. Caveat: in the excluded worst game of his match, Kramnik played more accurately than both Lasker and Schlechter did in theirs.
I foresee a possible minor problem in the choice of which worst game to exclude.
Anand-Kramnik has four games with four errors each - games 2, 3, 4, and 10. Suppose game 9 had also happened to have 4 errors, which would mean five games with the same number of errors; which game should then be excluded?
I have been thinking it should be the game with the errors most evenly spread between the two players.
A second option is the game with the most severe errors - in other words, the game with the most blunders - but I am not too keen on this, as it could severely penalize the player who is not blundering: excluding such a game might unfairly affect his stats for the worse.
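The selection rule being proposed can be sketched roughly as follows. This is a hypothetical illustration: the game list and error counts below are made up to show a five-way tie on total errors, with the tie broken by the most even split between the two players, as suggested above.

```python
# Pick the game to exclude: most total errors, ties broken by the
# most even split of errors between the two players.
# The game data below is illustrative, not real match data.
games = [
    # (game number, errors by player A, errors by player B)
    (2, 3, 1),
    (3, 4, 0),
    (4, 2, 2),
    (9, 3, 1),
    (10, 4, 0),
]

def exclusion_key(game):
    number, a, b = game
    total = a + b
    imbalance = abs(a - b)      # 0 means a perfectly even split
    return (total, -imbalance)  # maximize total, then minimize imbalance

worst = max(games, key=exclusion_key)
print(f"Exclude game {worst[0]}")  # game 4: four errors, split 2-2
```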
|Oct-29-10|| ||Bridgeburner: <visayanbraindoctor>|
These are very similar figures to those that I posted on the Anand page, and which you copied back to this page on 24 October.
The main difference is that in compiling adjusted averages for the players, I've removed the most error-ridden game played by each player, rather than merely the most error-ridden game of the match.
Game 10 of L-S has the most errors for both players; however, as you point out, the errors are more evenly distributed through the A-K match.
Statistically, if you want to measure errors per game for the World Championship match, inclusive of both players, then of course A-K game 9 is the game that would be omitted to smooth the figures, just as omitting L-S game 10 smooths out the main statistical "bump" in that match.
However, once we look at the stats for each individual player, and use the same statistical smoothing process, then we should <omit the game which features each player's worst figures>.
In terms of raw errors (all errors of 0.60 or greater lumped together), Kramnik's 4 mistakes in game 9 would be omitted. In the case of Anand, game 10 would need to be omitted.
Because we cannot weight the errors, for reasons already discussed, disaggregating error statistics has to be done carefully. The easy way to do it, enabled by using bandwidths, is to aggregate them from the top, e.g.
- blunders and 1.20-1.40 mistakes
- blunders and mistakes made in the two bands in the 1.00-1.40 range
all the way down to 0.60.
Aggregating adjacent bandwidths, and smoothing statistical bumps according to whether we're looking at the overall statistics or at individual players, solves the problems you've described, IMO.
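A minimal sketch of the top-down bandwidth aggregation just described (the band labels follow the 0.60-and-up ranges mentioned above, but the counts are illustrative placeholders, not figures from the project):

```python
# Aggregate error counts from the most severe band downward,
# so each running total lumps together that band and everything above it.
# Counts are illustrative placeholders, not real project figures.
bands = [
    ("blunders (1.40+)", 1),
    ("1.20-1.40", 2),
    ("1.00-1.20", 1),
    ("0.80-1.00", 3),
    ("0.60-0.80", 5),
]

running = 0
for label, count in bands:
    running += count
    print(f"{label} and above: {running}")
```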
|Oct-29-10|| ||Bridgeburner: <visayanbraindoctor>|
I've modified and updated both bios.
|Oct-30-10|| ||lostemperor: I have read some comments now. Interesting project. Somehow astounding to see old champions doing well in comparison with today's champs. I was wondering about the complexity of the games through the ages, though. It seems the research of Guid and Bratko, and of Sullivan (http://www.truechess.com/index.html), has taken complexity into account. But whether ply search (Guid/Bratko) or a fixed complexity data/line of Sullivan (am I correct here?) is sufficient to determine the complexity of the games, I wonder. Chess did progress in 100 years, I suppose.|
I think it is interesting also to compare how people who are trained by computers will do! Like Carlsen who grew up in this computer age.
Anyway, keep up the good work; I'm looking forward to seeing the next results.
|Oct-30-10|| ||nimh: <Somehow astounding to see old champions doing well in comparison with today's champs.>|
Generally they don't, as you can see if you carefully examine all the available research. A very important fact one must know is that the results are biased towards positional players: the simpler the positions a player prefers, the more overrated his results in such analyses tend to be, as long as practical play is not taken into consideration in the analysis methodology.
Another way to make older players look overrated is to take neither the difficulty of positions nor time controls into account, and to look at positional players only. That's precisely what <Bridgeburner> has been doing. No wonder it easily leads to faulty conclusions.
<But whether ply search (Guid/Bratko) or a fixed complexity data/line of Sullivan (am I correct here?) is sufficient to determine the complexity of the games I wonder.>
It is sufficient. Take a look at the complexity vs average error graphs in their studies. You can see yourself that more difficult positions induce less accurate play.
<Chess did progress in 100 years I suppose.>
<Like Carlsen who grew up in this computer age. Anyway keep up the good work and looking forward to see next results.>
I analyzed Carlsen's games at Nanjing 2009. Without taking into account how much practical play was involved in his games, the conclusion I reached was that his performance, according to the accuracy of his play, was 2970 - quite close to his Elo rating performance.
see page 10
|Oct-30-10|| ||lostemperor: <nimh> Thanks for your reply. Positional play can lead to different results, but we can hardly call Lasker a solid positional player - on the contrary - although he is capable of that too, of course. Still, in Sullivan's ranking he is tied for 4th at his three-year peak (no draws included)! How is that for a man, Lasker, who played the most annoying move for his opponent rather than the soundest possible continuation? I would say very good!?|
I am gone in an hour, so I haven't got time to check the link you gave, but I will. So how does that Carlsen 2970 performance compare? What do you suppose Kasparov or Fischer would have scored in a typical top tournament they won?
|Oct-30-10|| ||nimh: <we can hardly call Lasker a solid positional player - on the contrary - although he is capable of that too, of course.>|
I know very well what the popular image of him is, but often what people perceive and what the reality is do not match. Take Morphy, for instance: he's widely regarded as a wild tactical attacking player, but in reality his play was rather conservative and solid in comparison with the leading romantic players of his era.
Although Bratko & Guid rate his play slightly more complicated than average, according to Sullivan and my own research his positions were rather easier than average.
<Still, in Sullivan's ranking he is tied for 4th at his three-year peak (no draws included)!>
I suggest looking at results with draws included. There's no objective reason to eliminate them in the conclusions.
Lasker's average rank across all tables is 10.5 out of 16.
<So how does that Carlsen 2970 performance compare?>
Elo ratings as of 2008 were taken as a basis of comparison.
You can see my other studies on my profile and forum. I suggest reading all of them carefully whenever you have time, and in case something is incomprehensible, please ask in my forum.
|Oct-30-10|| ||visayanbraindoctor: <lostemperor> I would suggest that you go directly to the analysis of all the games of the Lasker-Schlechter World Championship Match (1910) to glean an idea of how difficult the games were. I posted <Bridgeburner's> analysis on every game page of this match.|
There is a reason why <Bridgeburner> and I decided a long time ago to choose World Championship Match games for this project.
One thing that is uniquely great about the institution of the World Championship match is its consistency. In this era, in 1910, super tournaments, in which only the very top masters participated, were very rare or non-existent. Practically all international tournaments were 'diluted' with a lot of relatively weaker players, who often became punching bags; the result was that the top master who won the tournament was often the one who could punch the weakies the hardest. This makes it difficult to compare a top master's performance from this era with today's, wherein the top masters keep battling each other in super tournaments not diluted with weakies. The most notable exception is the World Championship match. Here, in the early 1900s as today, the two top masters of the world play each other in a succession of games over and over again. There are no weakies to beat up in between these games. You meet the same very strong resistance all the time. The World Championship matches, played under very similar conditions of high tension and classical time controls, and offering the largest stake in the chess world (namely the title of World Champion), can function as a controlled phenomenon that can be used to accurately gauge just how good the top masters of an era have become.
|Oct-30-10|| ||visayanbraindoctor: <lostemperor: I was wondering about the complexity of the games through the ages.>|
I tend to use the terms <difficulty> or <complicated> instead of <complexity>, because the latter is technically a term denoting, for a structure, an increasing number of component parts and an increasing amount of interaction among those components. In that case, the most complex positions would be early opening positions, and the least complex would be endgame positions.
What you are talking about is really <difficulty>, a subjective human evaluation of a position. There are positions which to most human chess eyes look <simple>, meaning they fit into the usual positional patterns that we can subjectively easily evaluate and analyze. There are also bizarre or messy positions which may be just as complex as simple ones with the same number of pieces, but are difficult for humans to evaluate and analyze because they are not as amenable to our usual ability for pattern recognition.
I have posted above my objections to the use of <difficulty> (what you term <complexity>) as a parameter of chess strength.
<Chess did progress in 100 years I suppose.> In opening theory, chess did, and it does all the time. However, outside opening theory, a player has to play over the board with his innate talent and acquired experience and training; and these have human limits. (Please see one of my posts above on this.)
In each era, there is an opening theory that is generally accessible to all the top masters. Thus the advantages of any new development tend to cancel out in competitive play in each era, because each top master of each era has access to the same opening theory and research methods. This would mean that openings in the Lasker-Schlechter World Championship Match (1910) would tend to be inferior to 21st century openings. However, once they get into middlegames and are relying on their native talent, experience, and motivation, the Project indicates that Lasker and Schlechter played the games of their match mostly as well as Kramnik and Anand did.
<I think it is interesting also to compare how people who are trained by computers will do! Like Carlsen who grew up in this computer age.>
The fact that on top of the chess world today sit the likes of Anand, Kramnik, Topalov, Ivanchuk, and Gelfand implies that the computer age has not brought any overt advantage to the younger generations. Why? Because the older 1990s top GMs, who did their opening prep through the book-pen-paper method, simply took to the computers as the youngsters did, canceling out any advantage that computers give to opening prep. Contrary to popular belief, prepping openings via the book-pen-paper method is much more tedious and difficult than doing it by computer, and so is a job better suited to energetic and healthy youngsters than to middle-aged GMs. It is probably the opposite: the presence of computers allows the likes of the aging Anand and Kramnik (who even has to watch now and then whether his chronic, incurable auto-immune disease is bothering him) to maintain equality in their opening prep with the younger GMs.
|Oct-31-10|| ||Bridgeburner: <lost emperor>
<difficulty> and <complexity> are nebulous concepts, mainly subjective, as <visayanbraindoctor> has taken pains to point out, and in any case are extremely difficult to quantify, and even more difficult to use to operate on error figures.
Also <nimh> emphasizes the importance of time controls in calculating accuracy. His thesis is that the more relaxed time controls of the old days allowed for more considered moves, and fewer apparent errors.
Nevertheless, compare the old masters with the modern players (check the CG.com database) and you'll find modern masters play a much higher proportion of draws than masters a hundred years ago. Moreover, the game with easily the most errors (nearly half the errors in the match) was the famous and decisive game 10 of the Lasker-Schlechter match, which was played over three days, with at least two adjournments, and therefore should theoretically have provided the players considerable extra time for analysis. All this leads me to believe that whatever impact the slightly different time controls had cannot be reliably factored in, especially when hedge factors such as arbitrarily deciding that every adjournment simply adds one hour to a game are thrown into the mix.
The bottom line with my research is that the data is as accurate as it is feasible to acquire, and it will be even more accurate with the upgrade in hardware and software which I will bring to the task in the next phase of my project. <I cannot over-emphasize the importance of <<data integrity>>. All the wizardry of data management is worthless without the data, or without data in which one can have confidence.>
Ensuring that the engine evaluations were as accurate as possible took about 95% of the year-plus it took to analyse the 21 games of the 1910 and 2008 World Championship matches, whereas other studies have hundreds, if not thousands, of games under their scrutiny. I don't believe the fidelity of the error data in some of these projects is anywhere near as reliable as it should be, or that the moves have been sufficiently checked for accuracy.
<The primary unit of measure with these studies is the difference between an engine's evaluations of its preferred moves and its evaluations of the moves actually played. This basic brick of data must be unquestionably accurate, as the difference - what <nimh> refers to as the <average error> - is of <little or no value> if the engine's preferred move is wrong.>
Typically, a move in one of these studies, e.g. <nimh>'s, might be subject to 5 minutes of engine time, or 15 ply... this may be sufficient for a quick evaluation of games of FM standard or below, but super-GMs and world champions are another kettle of fish altogether.
Complexity is an issue in my work insofar as it determines how long it takes my engine to calculate evaluations that are entirely consistent with every other evaluation in the game beginning from, or near, the end of a theoretical line.
Also, I examine each and every move in depth, from the opening right through to the endgame. I don't believe in skipping the opening because it's established theory, and, more importantly, I don't believe in omitting endgames because some engines may not be good enough to evaluate them properly. In my opinion, that is a complete cop-out, as many crucial errors occur in the endgame.
Endgame mastery is one aspect of chess that distinguishes master strength (in other words, it sorts the sheep from the goats), especially if you look at the great endgame masters like Lasker, Rubinstein and Capablanca.
|Nov-01-10|| ||visayanbraindoctor: <Nevertheless, compare the old masters with the modern players (check the CG.com database) and you'll find modern masters play a much higher proportion of draws than masters a hundred years ago.> |
This may not even be true of the World Championship matches between Lasker and Schlechter, Lasker and Capablanca, and Alekhine and Capablanca, the three best players of the pre-WW2 era. There is no weakie to beat up between these players. (",) As seen on the Lucena page, it is reasonable to think that these players, who played with very little error, would allow only very few game losses in an event that required them to give it their best shot in order to win, and so matches between them would tend to have a lot of draws.
Regarding modern super GM tournaments, not one of them is typically as strong as the typical World Championship match. Even super GM tournaments will feature relatively weaker players even if they are all very strong. A World Championship match would feature no relatively weaker players at all.
Let us take the recently concluded Bilbao chess tournament, allegedly the strongest chess tournament in history. The winner, Kramnik, had to play two games each against Anand, Carlsen, and Shirov, for a total of six games. That is obviously weaker opposition than playing Anand in 11 straight games, as he did in the 2008 WC match.
All tournaments in the pre-WW2 era may have been weakie-diluted, but certainly not a World Championship match. What event can be stronger than facing Lasker in 10 straight games in 1910, or Capablanca in 14 straight games in 1921?
|Nov-01-10|| ||visayanbraindoctor: <Moreover, the game with easily the most errors (nearly half the errors in the match) was the famous and decisive game 10 of the Lasker-Schlechter match, which was played over three days, with at least two adjournments, and therefore should theoretically have provided the players considerable extra time for analysis. All this leads me to believe that whatever impact the slightly different time controls had cannot be reliably factored in, especially when hedge factors such as arbitrarily deciding that every adjournment simply adds one hour to a game are thrown into the mix.>|
In addition, it's noticeable that most novelties in the pre-WW2 era occurred at around moves 6 to 12, while in today's era novelties in GM tournaments usually occur at moves 16 to 22. That's why it is also noticeable that, when we watch them live on the internet, present-day GMs seem to blitz out their first 20 moves before they slow down and begin playing with their inherent chess abilities. In the pre-WW2 era, the masters probably blitzed out only their first 10 moves before doing so. To put it another way, pre-WW2 master games began over-the-board play around move 10, while today's master games often begin it at move 20, a full 10 moves later. This phenomenon may give today's masters relatively more time to think than the pre-WW2 masters had before reaching time control.
|Nov-02-10|| ||whatthefat: <nimh: I suggest looking at results with draws included. There's no objective reason to eliminate them in the conclusions.>|
I think this is actually quite an important point. On average, I would expect draws to be of higher quality than decisive games (simply because it takes at least one serious error to make a game decisive). Around 60% of modern grandmaster games end in draws, whereas only around 20% of 1900 era games ended in draws. Thus, excluding them will differentially bias the results in each case.
Anyway, just enjoying following the discussion!
|Nov-03-10|| ||Bridgeburner: <whatthefat>
Welcome back. I can't see why anyone would exclude draws!
You'll notice that since our discussion, I've abandoned weighting errors as a mathematical absurdity.
|Nov-03-10|| ||visayanbraindoctor: From the Nakamura page.
<<data management is <<even more important>> than the data>>
In the Scientific Method, this is totally unacceptable.
The hypothesis is supported (or proven false) by empirical data that is gathered or generated by an experiment with a replicable methodology. The management of data is totally wasted if the method of gathering or generating the data is vague or inaccurate. First, the data has to be valid.
<<If the data is invalid, management of that <data> is <<absolutely irrelevant>>.> You may as well make it all up.>
As for <difficulty>, it is a subjective human perception or feeling that differs from individual to individual, and cannot be accurately quantified.
|Nov-03-10|| ||visayanbraindoctor: <whatthefat: Around 60% of modern grandmaster games end in draws, whereas only around 20% of 1900 era games ended in draws.>|
My hypothesis is it's because tournaments in the 1900s had a lot of weakies in them, who made a lot of errors, and who may have even affected the play of the really strong players. This is why the subject of the Bridgeburner Project is World Championship matches, wherein there are no weakies to beat up, or for a strong master to get complacent about. I have a post on this above.
There is more on this in the Lucena page.
|Nov-03-10|| ||SugarDom: Guys, i'm not here to diss your respective systems or even to compare mine with them.|
But IMO, removing games selectively makes the process less "scientific".
|Nov-03-10|| ||SugarDom: The system i'm using does not deal with games but with moves and so i'm able to include anomalous games. |
I use <weighted average error>. This is the sum of errors divided by the total number of moves.
And since i'm using a cap of 1.1 (i adjusted from 1.00), 1 error does not drastically reduce the accuracy value of a player.
Player makes 9 perfect moves and 1 blunder capped at 1.1.
The average error will be 1.1/10 or 0.11
I introduce an <accuracy percentage> value.
The formula is:
Accuracy Percentage = No. of moves / (No. of moves + sum of errors)
In the above example:
AP = 10/(10+1.1)
AP = 90.1%
I also count all the error values (capped at 1.1).
Here is an example:
|Nov-03-10|| ||SugarDom: MoveNo/ Eval/ Best/ Diff/ Mark
29/ m10/ -8.49/ 1.1/ R
28/ -8.46/ -8.09/ 0.12/ R
27/ 0/ b
26/ 0/ b
25/ -7.19/ -2.54/ 1.1/ R
24/ 0/ b
23/ -2.92/ 0.35/ 1.1/ R
22/ 0/ b
21/ 0/ b
20/ 0/ b
total moves: 10
sum of errors: 3.42
average error per move: 0.342
accuracy percentage: 74.5%
*b indicates best move
*R indicates "renormalized"
|Nov-03-10|| ||SugarDom: The above example game is Topalov vs Anand WC game 1.|
In moves 23, 25 and 29 the errors were capped at 1.1. Move 29 is mate in 10.
In move 28, i adjusted for inflation in large evals and divided the difference by a factor of 3.
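SugarDom's bookkeeping above can be sketched roughly as follows. The cap of 1.1, the per-move errors, and the three result figures come from the example table; the error list already incorporates the divide-by-3 adjustment described for move 28, since the exact adjustment rule is only loosely specified.

```python
# Reproduce SugarDom's Topalov-Anand example: cap each error at 1.1,
# then compute average error per move and the accuracy percentage
# AP = moves / (moves + sum of errors).
CAP = 1.1

# Per-move errors for moves 20-29 from the table above
# (0.12 is the move-28 difference already divided by 3).
errors = [0.0, 0.0, 0.0, 1.1, 0.0, 1.1, 0.0, 0.0, 0.12, 1.1]

capped = [min(e, CAP) for e in errors]
total_moves = len(capped)
sum_errors = sum(capped)

average_error = sum_errors / total_moves
accuracy = total_moves / (total_moves + sum_errors)

print(f"sum of errors: {sum_errors:.2f}")              # 3.42
print(f"average error per move: {average_error:.3f}")  # 0.342
print(f"accuracy percentage: {accuracy:.1%}")          # 74.5%
```

This matches the totals given in the table: 3.42 summed errors over 10 moves, hence 10/(10 + 3.42) = 74.5%.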
|Nov-03-10|| ||SugarDom: IMO, the above system is more about statistics gathering - simplistic and fast.|
I am able to process WC series and matches in a couple of days.
|Nov-03-10|| ||SugarDom: In the above example there were no small errors, but i do count the smallest of errors, such as 0.01.|
This is because, as someone pointed out earlier, these have a cumulative effect. I believe this system (of counting the smallest errors) is more mathematical and scientific.
Somebody might ask, "why then do you cap at 1.1?".
Mathematics has to deal with inflation at the end of the spectrum, or even "infinities", and must apply "renormalization" - hence a "cap". This is also applied in the Elo system.
Secondly, in principle we want to count 1 error at a value as close as possible to 1.
|Nov-03-10|| ||nimh: <whatthefat>
You're right, of course, but what were the exact criteria for the 1900 period, and was the strength of the players taken into consideration in the statistics?
|Nov-03-10|| ||alexmagnus: <visayan> You still didn't answer my question though, namely what makes you so sure that humans have already reached their limits in chess.|
By the way, it is not only the opening that progresses. We gain more strategic knowledge, new patterns, etc. It's not as if top players' strategy hasn't changed since <My System>. It has (didn't Kasparov once say that in his first match with Karpov there were nuances which back then were <understood> by nobody but the two Ks?). And the better strategy is understood, the better the games are...
Same with patterns, be it of strategical or tactical nature.
|Nov-03-10|| ||crawfb5: <whatthefat: Around 60% of modern grandmaster games end in draws, whereas only around 20% of 1900 era games ended in draws. Thus, excluding them will differentially bias the results in each case.>|
There are potential biases with regard to draw percentages in the older games.
Game databases might be biased toward decisive games as we go back in time. I know I'm having trouble finding more than half the games from the 8th American Chess Congress in Atlantic City 1921.
Draws in early tournaments often did not count, and had to be replayed. This might or might not be shown in the crosstable, which could underestimate draws if that is a data collection source.
As a quick example, I've recently done a game collection on the 7th American Chess Congress in St. Louis 1904 (Game Collection: St. Louis 1904). A quick look at the crosstable usually presented suggests only <two> draws in the nine rounds played. However, a drawn game required a replayed game with colors reversed. If the second game was also drawn, only then would the result be scored as a <single> draw. If you go through all the individual games, there were actually <twelve> draws.
Other tournaments had different rules for what to do with draws. This could influence players to press harder for wins.
I think your general point holds. I only wanted to suggest the difference might not be quite as big as it appears at first glance.
A free online guide presented by Chessgames.com