|Feb-10-15|| ||Olavi: A preliminary list for Karpov. 46 supertournament wins. I expect corrections to be necessary. Some points:
His three Soviet chs are included. Hastings 1971-72 is not. Korchnoi was the only top ten player, but Karpov himself might have been, had the list been updated more than once a year.|
Several of his greatest wins are not on the list. E.g. Waddinxveen 1979 is not included, but if memory serves me right, he scored the best performance rating of the 70s (of all players) there; a small four player DRR, Hort was nr. 11.
As a sign of the times, in e.g. 1990 he did not play a single ST (he won Biel, which included nr. 11 Andersson)
Alekhine mem 1971
San Antonio 1972
Soviet ch 1976
Las Palmas 1977
Bad Kissingen 1980
Soviet ch 1983
Wijk aan Zee 1988
Soviet ch 1988
Reggio Emilia 1991
Wijk aan Zee 1993
Dos Hermanas 1995
|Feb-11-15|| ||Kinghunt: <AylerKupp> Thank you for those interesting points on the use of engines to evaluate playing strength. My response comes along two different lines:|
First, while I completely agree that their search depths are not nearly deep enough to have great confidence for any single move, the point is that over hundreds of moves, they will be fairly accurate on average. You don't trust them on any single move, but you trust the aggregate judgment over the entire dataset.
I find it somewhat unclear what issue you think is left unresolved by http://zeus.fri.uni-lj.si/matej/doc.... It doesn't <prove> that rankings remain roughly constant as you go beyond d=13, but the lack of any trend lines at lower depths suggests increasing depth further is unlikely to result in any significant changes. You'll have higher resolution of strength estimates with greater search depth, but that's a "known" quantitative effect. So unless there are any specific reasons to think that the analysis will behave qualitatively differently at higher depths, I am not particularly concerned with this issue.
Second, and the one I find by far most convincing, the methodology has been "validated" on modern players. In the paper I find most convincing (http://www.cse.buffalo.edu/~regan/p...), the data is divided into two halves: a training set used to build the model and calibrate parameters with known ratings, and a test set. While the test set includes many historical games, it <also> includes recent games of rated players that were not included in the training set.
The results of this cross-validation are quite convincing. Check out the estimations of playing strength of recent tournaments in tables 5, 6, and 7. The confidence intervals for any given tournament are rather wide, but when you pool tournaments, the results are clear. Collectively, the category 21 tournaments are estimated to be 30 points stronger than the category 20 tournaments (actual rating difference between tournament sets: 24 points). This cross-validation is the most convincing measure of a method's accuracy that you could ask for (and I have indeed asked for it from almost all other similar studies). If something passes cross-validation, it has to be working really well.
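The train/test idea described above can be illustrated with a toy version (this is not Dr. Regan's actual model; the agreement-rate data points and the linear fit below are invented purely for illustration):

```python
# Toy illustration of the cross-validation idea: fit a simple linear
# model (agreement rate with the engine's top move -> Elo) on a
# training half, then check its predictions on a held-out test half.
# All data points below are invented for illustration.

def fit_line(xs, ys):
    """Ordinary least squares fit for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a /= sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# (agreement rate, known Elo) pairs -- invented data
train = [(0.45, 2200), (0.50, 2350), (0.55, 2500), (0.60, 2650)]
test = [(0.48, 2290), (0.58, 2590)]

a, b = fit_line([x for x, _ in train], [y for _, y in train])
for x, elo in test:
    print(f"actual {elo}, estimated {a * x + b:.0f}")
```

If the model's estimates on the held-out pairs land close to the known ratings, that is the kind of validation the paper's tables 5-7 provide in aggregate.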
|Feb-11-15|| ||Kinghunt: (Really busy with work lately, that's all I have time for tonight but I hope to catch up on all the other great posts here sometime in the next few days.)|
|Feb-11-15|| ||Olavi: First correction. Salov won Tilburg 1994, not Karpov. But 1992-1994 it was a knockout event consisting of two game matches. When Karpov won it in 1993, he beat Romanishin, Vyzhmanavin, Kaidanov, Yusupov, Beliavsky and Ivanchuk, V., K. and I. of them in rapid play offs.|
|Feb-11-15|| ||EeEk: <Olavi> Great list. I did a quick check, and it looks accurate. But:|
- Soviet Championship 1976 and 83 should probably be excluded, as it was a national event?
- Although it can be argued that Milano 1975 should be included, Karpov finished 2nd in the round robin. He later won the semi-final and final.
- Wijk aan Zee 1993 was a knockout.
|Feb-11-15|| ||Nf8: <Olavi> Thanks for compiling this list. I also did a check, so here are a few comments:|
- You seem to have left out Reykjavik 91 (http://www.365chess.com/tournaments...)
- Biel 90 can actually be counted, since the definition says "at least two players who have been in the top 10 <in the last year>" and so Ulf Andersson qualifies.
- On the other hand, as was already noted, Karpov didn't win Tilburg 94
and two of the other tournaments listed, Wijk aan Zee 93 & Tilburg 93, were KOs.
- A more general comment: looking at the full lists of players, I'd say that about 9-10 of these tournaments (mostly from the 70s), although they qualify according to the minimum definition of "super" we are using here, were on average weaker than what we're used to today - in the sense that a larger percentage of the participants were considerably weaker than the top players. The clearest example is Las Palmas 77 (http://www.365chess.com/tournaments...), where 9 out of the 16 players were not top-100 at the time (http://www.olimpbase.org/Elo/Elo197...).
|Feb-11-15|| ||AylerKupp: <Kinghunt> (part 1 of 2) |
Thank you for taking the time to read my (as usual) lengthy post (and this one is no exception). To try to clear up my comments about http://zeus.fri.uni-lj.si/matej/doc..., Figure 3 shows a generally declining score for each of the players as the search depth is increased, indicating a decreasing difference between the engine's (Crafty in this case) evaluation of the best move and its evaluation of the move actually played. Other than the trends for Steinitz, Euwe, Capablanca, and Kramnik, it is hard to tell which score trend belongs to which player. But I do see that many of the trend lines cross, indicating that the rankings are <not> constant (granted, the differences are small) as the search depth increases. So I think that the authors' statement that "The results clearly demonstrate that although the scores of the players tend to decrease with increasing search depth, the rankings of the players are nevertheless preserved at least for the players whose scores differ considerably from the others." is misleading. It amounts to saying that the rankings of some players are preserved as search depth increases, but those of others (the majority, by the way) are not.
And, of course, extrapolation can be risky. The ranking trends may continue or they may not. There is, IMO, a considerable difference in the rankings between most players based on their scores through search depth = 12 and there is no way to really tell how those rankings would change as the search depth increases to, say, 30 ply or more.
And those scores and rankings reflect one engine's evaluations. They leave unanswered the question of how the scores and rankings would change if more than one engine were used. In my experience, both the rankings and the evaluations of different engines differ, sometimes considerably, and the confidence one can have in an engine's evaluations varies with its search depth. So who's to say whether one engine's evaluations are more correct than another's? This issue is not addressed at all by the authors.
Finally, all evaluations were done to a fixed search depth. But the search depth necessary to reach a sufficient degree of confidence in an engine's evaluation of a position is highly dependent on the position itself. A complex position without a sequence of forced moves needs a deeper search than a simpler position with one. And some engines may give relatively more accurate evaluations in open positions than in closed positions while others may be the reverse, or at least one engine may be relatively better in one type of position than another engine is. So I think that it is our <confidence> in the correctness of an engine's evaluations and move ranking that is the determining factor, not a particular search depth, especially if the search depth is inadequately shallow.
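For concreteness, the per-player "score" being discussed here — the average gap between the engine's evaluation of its preferred move and its evaluation of the move actually played — can be sketched as follows (the evaluations are invented placeholders in pawns; a real study would obtain them from an engine at some search depth):

```python
# Sketch of a per-player "score": the mean difference between the
# engine's evaluation of its best move and its evaluation of the move
# actually played. Evaluations below are invented placeholders (pawns).

def mean_eval_loss(moves):
    """moves: list of (best_eval, played_eval) pairs for one player."""
    return sum(best - played for best, played in moves) / len(moves)

# Three moves: the best move played once (loss 0), two slight inaccuracies.
game = [(0.30, 0.30), (0.10, -0.05), (0.50, 0.35)]
print(mean_eval_loss(game))  # ~0.10
```

A lower average loss means closer agreement with the engine; the dispute in this thread is about how much that number shifts with the depth and the engine used to compute the evaluations.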
|Feb-11-15|| ||AylerKupp: <Kinghunt> (part 2 of 2)|
With respect to the methodology described by Dr. Regan in http://www.cse.buffalo.edu/~regan/p..., I have similar concerns. It is demonstrably valid for the limited search range (Rybka 3 at d<14) but that doesn't mean that it would be equally valid, if at all, for search ranges d>>14, particularly if other engines are used. We just don't know. And, given that the training data was from 2006-2009 and the "Solitaire Set" was from a similar period, the 2005 and 2007 world championship tournaments, the paper does not address the methodology's accuracy when using data from different periods. So I'm not convinced. It would go a long way in my mind toward validating the methodology if, say, the "Solitaire Set" were calculated for a completely non-overlapping period such as the 2013 and 2014 world championship tournaments and compared with the training data from 2006-2009.
These analyses remind me of a joke (which I've posted before, so forgive me if you've already seen it) told to the class by my freshman calculus instructor. He indicated that a professor had a seminar with a divinity student, a math student, a chemistry student, a physics student, and an engineering student. Given the students' different backgrounds, he was curious as to how they would approach the following homework problem: "Prove or disprove the theorem that all odd numbers are prime."
The next day, to his great surprise, all 5 students claimed that they had proven the theorem. Only 2 of the responses are relevant here:
Chemistry student: "I said to myself 1 is prime, 3 is prime, 5 is prime, 7 is prime; so I decided that I had enough data to conclude that all odd numbers were primes."
Physics student: "I said to myself 1 is prime, 3 is prime, 5 is prime, and 7 is prime. Nine is not prime. But 11 is prime and 13 is prime, so 9 must have been the result of experimental error. Therefore I concluded that all odd numbers were prime."
But don't get me wrong. The authors have done a very thorough job with the tools they had at their disposal, namely reasonably available hardware and software circa 2006. I just don't think that their analysis is anywhere near thorough or deep enough to reach the conclusions that they so confidently state. I would feel much more confident in their conclusions if they regenerated their database using multiple engines and carried the engines' searches to a deeper depth using MultiPV analyses, possibly (in order to take into account the different engines' search tree pruning capabilities) until the rankings of the moves are stable and the evaluation difference between the top-ranked move and the other moves is either constant or increasing. That in itself indicates to me the need for a variable search depth rather than a fixed one.
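The variable-depth criterion suggested above could be sketched as a stopping rule: keep deepening until the engine's ranking of candidate moves stops changing across consecutive depths. The rank_moves callback here is a hypothetical stand-in for a MultiPV engine query, and the depth limits are arbitrary:

```python
# Sketch of a variable-depth stopping rule: deepen until the ranking of
# candidate moves has been identical for `stable_iters` consecutive
# depths. rank_moves(position, depth) is a hypothetical MultiPV query.

def search_until_stable(position, rank_moves, min_depth=10, max_depth=30,
                        stable_iters=3):
    streak, prev = 0, None
    for depth in range(min_depth, max_depth + 1):
        ranking = rank_moves(position, depth)
        streak = streak + 1 if ranking == prev else 1
        prev = ranking
        if streak >= stable_iters:
            return depth, ranking
    return max_depth, prev

# Fake engine whose ranking flips until depth 14, then settles.
def fake_rank(position, depth):
    if depth < 14:
        return ["d4", "e4"] if depth % 2 else ["e4", "d4"]
    return ["e4", "d4"]

print(search_until_stable("startpos", fake_rank))  # (16, ['e4', 'd4'])
```

A rule like this naturally spends more depth on complex positions (where rankings keep shifting) than on simple ones, which is exactly the asymmetry being argued for.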
|Feb-11-15|| ||Olavi: <EeEk & Nf8>
Thank you. The Soviet championships were included following the example of our host <Kinghunt>, who did the same with Kasparov.
Reykjavik 91 was an oversight, and yes it seems Biel 90 counts, and Wijk aan Zee 93 of course was KO.
|Feb-11-15|| ||Olavi: And so Waddinxveen 1979 will have to be included as well. There may be others.|
|Feb-11-15|| ||Kinghunt: Updated supertournament victories of Grischuk:
Petrosian Memorial 2014
Elista FIDE GP 2008 (shared)
|Feb-11-15|| ||Kinghunt: I have decided not to change my definition of supertournaments. This means that Soviet/Russian championships will remain ineligible, despite the great strength of several of these events. However, I do hope to keep these events tallied so that if somebody else wants to count them, they can easily find the collected data.|
Updated supertournament wins of Garry Kasparov:
Brussels 1987 (shared)
Barcelona 1989 (shared)
Skellefteå 1989 (shared)
Tilburg 1989, 1991, 1997
Linares 1990, 1992, 1993, 1997, 1999, 2000, 2001, 2002 and 2005
Novgorod 1994, 1995, 1997
Las Palmas 1996
Wijk aan Zee 1999, 2000, 2001
Sarajevo 1999, 2000
Events excluded: Russian Superfinal 2004, Frunze 1981, Moscow 1988 (no international players)
|Feb-12-15|| ||Nf8: Some comments/corrections to other lists:
Not sure if Dortmund 04 should be counted; its format included knockout & non-classical games (it was identical with that of the 2002 "Candidates" event, from which Leko qualified to the match with Kramnik)
Novgorod 94 (shared with Kasparov) & Donner Memorial 96 are missing
Missing - Belgrade 91, Alekhine Memorial 92 (shared with Anand), Wijk aan Zee 92 (which you excluded - but Salov was a top-10 player; http://www.olimpbase.org/Elo/Elo199...)
On the other hand, Malmo 99 shouldn't be counted (insufficient strength)
|Feb-12-15|| ||Nf8: Also, you seem to have missed Bilbao Masters (2013) for Aronian (of the participants there, Mamedyarov was top-10 in previous months of 2013).|
|Feb-16-15|| ||Kinghunt: <AylerKupp> I very much enjoy reading your posts, thank you for engaging in this discussion with me!|
I agree that the paper is not 100% convincing, and does not definitively settle this question. However, we seem to have different standards for what constitutes reasonable doubt. For example, you have concerns about how styles from different time periods may be evaluated differently by computers, while I do not see any reason to hypothesize such an idea (after all, aren't good moves good moves?) - but I also can't show that it isn't true.
I think all of your proposed improvements on the study are excellent and would indeed increase confidence in the results. At this point, I think we're simply discussing exactly how much confidence to place in these studies.
In my mind, this is implicitly a relative question: of the computer-based studies, which have the best controls and should be trusted the most? Here, at the very least, I think we can agree that the work by Dr. Regan is currently the best in the field, even if we disagree about how seriously even it should be taken.
|Feb-18-15|| ||sinusitis: So you are excluding the FIDE knockout world championships because some of the matches ended in rapid games, but you include Carlsen's recent blitz win over Naiditsch?|
Now what will you do about the Zurich tournament in case Anand or Nakamura win after the rapids tomorrow?
Also, did you include the Dortmund tournament(s) in which a computer participated?
I think this is a great venture, but there are too many types of tournaments to make it easy to compare apples to apples.
The FIDE world championships should without a doubt be included, as they were far more prestigious than any of these supertournaments, and in most cases results were due to the classical games (at least in the semis and finals).
|Feb-18-15|| ||sinusitis: Also please add Nakamura's Gibraltar win.|
|Feb-19-15|| ||Kinghunt: <sinusitis: So you are excluding the FIDE knockout world championships because some of the matches ended in rapid games, but you include Carlsen's recent blitz win over Naiditsch?>|
Two things. First, this is a difference between a single playoff and events determined almost entirely by playoffs - a strong player lacking in rapid/blitz skills could easily win a tournament like Baden-Baden, but would find it near impossible to win a world cup. Second, and more importantly, I am not including KO events primarily because they are better described as a series of matches, rather than a tournament. It doesn't matter if they're all decided in classical games, because advancing just means you won your match. Counting total match victories would be an excellent different project for somebody to take on, though!
<Now what will you do about the Zurich tournament in case Anand or Nakamura win after the rapids tomorrow?>
Whoever the tournament organizers declare the ultimate winner will get the tournament victory here. If Nakamura ends up triumphing, while it might seem strange to give him credit, I do not think it would be any stranger than some of the Bilbao-style scoring results (for example, Biel Chess Festival (2012), where Carlsen finished on +4 and Wang Hao finished on +3, but because of the scoring rules in place, Wang Hao was awarded tournament victory).
<Also, did you include the Dortmund tournament(s) in which a computer participated?>
As far as I know, the only Dortmund event with a computer was in 2000, which I am crediting to Kramnik.
<Also please add Nakamura's Gibraltar win.>
Please read the definition of a supertournament I am using. Opens are not included, for a number of reasons, chief among them being that even when there are many strong players participating, they are infrequently paired. For example, only 4 of Nakamura's 10 games were against top 100 players. I think this supports my decision not to count even very strong opens as supertournaments.
|Feb-19-15|| ||Kinghunt: Career supertournament victories of Anatoly Karpov
Alekhine mem 1971
San Antonio 1972
Las Palmas 1977
Bad Kissingen 1980
Wijk aan Zee 1988
Reggio Emilia 1991
Dos Hermanas 1995
Events excluded: Tilburg 1994, Wijk aan Zee 1993 (KO-style events), Soviet ch 1976, Soviet ch 1983, Soviet ch 1988 (no international players)
(Huge thanks to <Olavi>, as well as <Eeek> and <Nf8>)
|Feb-19-15|| ||Kinghunt: Correction: Magnus Carlsen won Baden-Baden 2015, not Zurich 2015, which is currently ongoing (thanks Nf8!)|
Career supertournament wins of Carlsen:
Wijk aan Zee 2008, 2010, 2013, 2015
Nanjing 2009, 2010
London 2009, 2010, 2012
Bazna 2010, 2011
Bilbao 2011, 2012
Tal Memorial 2011, 2012
Sinquefield Cup 2013
Gashimov Memorial 2014
Events excluded: Biel 2007 (insufficient strength)
|Feb-19-15|| ||Kinghunt: Tournament added: Bilbao 2013, won by Levon Aronian (thanks Nf8!)|
Career supertournament wins of Levon Aronian:
Wijk aan Zee 2007, 2008, 2012
Bilbao 2009, 2013
Tal Memorial 2010
Wijk aan Zee 2014
Events excluded: Tal Memorial 2006, 2011 (lost both on tiebreak)
|Feb-19-15|| ||Kinghunt: <Nf8: Note sure if Dortmund 04 should be counted; its format included knockout & non-classical games (it was identical with that of the 2002 "Candidates" event, from which Leko qualified to the match with Kramnik)>|
Thanks for pointing that out; that's a really tough one to make a call on. However, I think I will keep it included. Anand was the only one at +2 after the "normal" rounds, plus he won the playoff, so however you look at it, he was the clear winner, and I don't think adding "extra" games at the end takes too much away. While I'm trying to remain as consistent as possible, I think this one is close to a tossup.
|Feb-19-15|| ||sinusitis: Haha...right after all these tie-break arguments we got a controversy today when the Zurich organizers changed the original tie-break rule towards the end of the tournament and decided on a winner-take-all blitz Armageddon game.|
Only in the chess world do you get this! Kasparov would have walked right out of the playing hall at this kind of disrespect.
|Feb-24-15|| ||Olavi: Re Karpov's tournament wins: as <Nf8> pointed out Biel 1990 meets the criteria. It was played in July, and of the other players Andersson was world number 11 on July first, but he was nr. 9 in January and nr. 8 in July 1989.|
|Feb-27-15|| ||Lambda: This seems like a good place to post the results of a supertournament survey I've done, somewhat inspired by the one here. It looks back much further into history, at all the world champions and a few others, and also counts the total number of supertournaments competed in (between a player's first and last victory), to make figures spanning vastly varying rates of occurrence slightly more meaningful. (Several players start off winning them reliably and then compete in many more with little success; I've extracted additional partial figures for them.)|
I'm using a somewhat different definition: the event needs to have four top-10 players (or at least three other than the player currently under consideration), and otherwise just needs to be classical, all-play-all, and not a training tournament. A tie is counted as half a win (or a third for a three-way tie). Chessmetrics was used until 2004, because it puts all the information in one place and makes things easy. A few irregularities: Alekhine is counted as winning Mannheim 1914, which he was leading when WWI brought it to a premature end; Rubinstein is counted as top 10 for Berlin 1918 despite being technically unrated, because he clearly was at that level; and Fischer is counted as competing in the candidates tournament he pulled out of. (It would be nice to add maybe Ivanchuk, Topalov and Aronian, but modern players are more work.)
player time-period years wins/played win-rate wins-per-year/played-per-year
Steinitz 1870-1882 12 <3>/4 <75%> 0.25/0.333
Tarrasch 1885-1907 22 <6.33>/14 <45%> 0.288/0.636
Lasker 1895-1924 29 <8.5>/10 <85%> 0.293/0.345
Rubinstein 1907-1912 5 <4.5>/9 <50%> 0.9/1.8
Rubinstein 1907-1925 18 <6>/19 <32%> 0.333/1.056
Capablanca 1911-1936 25 <5.5>/13 <42%> 0.22/0.52
Alekhine 1914-1934 20 <8.33>/15 <56%> 0.417/0.75
Bogoljubow 1922-1928 6 <4.33>/11 <39%> 0.722/1.833
Euwe 1934-1934 0 <0.33>/1 <33%> N/A
Keres 1937-1952 15 <4.5>/12 <38%> 0.3/0.8
Keres 1937-1962 25 <5.0>/25 <20%> 0.2/1
Reshevsky 1937-1937 0 <1.33>/3 <44%> N/A
Botvinnik 1935-1956 21 <7.5>/14 <54%> 0.357/0.666
Bronstein 1948-1955 7 <2.5>/6 <42%> 0.357/0.857
Smyslov 1949-1956 7 <3>/7 <43%> 0.429/1
Tal 1957-1961 4 <5>/6 <83%> 1.25/1.5
Tal 1957-1979 22 <5.5>/19 <29%> 0.25/0.864
Petrosian 1959-1976 17 <5>/18 <28%> 0.294/1.059
Spassky 1961-1983 22 <3.5>/17 <21%> 0.159/0.773
Fischer 1962-1970 8 <3>/5 <60%> 0.375/0.625
Korchnoi 1957-1973 16 <1.83>/15 <12%> 0.102/0.938
Karpov 1971-1986 15 <13.5>/23 <59%> 0.9/1.533
Karpov 1971-1989 18 <16.5>/30 <55%> 0.917/1.667
Karpov 1971-1996 25 <17.83>/41 <43%> 0.713/1.64
Kasparov 1982-2002 20 <25.5>/35 <73%> 1.275/1.75
Kasparov 1982-2005 23 <26>/38 <68%> 1.130/1.652
Anand 1991-2015 24 <15.83>/63 <25%> 0.660/2.625
Kramnik 1995-2014 19 <13.5>/55 <25%> 0.711/2.895
Carlsen 2008-2015 7 <12>/27 <44%> 1.714/3.857
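The derived columns in the table above follow directly from the raw counts; for example, the Steinitz row (3 wins from 4 events over 12 years) can be recomputed like this (a minimal sketch, with ties entering simply as fractional wins):

```python
# Recomputing one row of the table: win-rate and per-year figures from
# raw counts (ties simply make `wins` fractional, e.g. 8.5 for Lasker).

def row(wins, played, years):
    return wins / played, wins / years, played / years

wr, wpy, ppy = row(3, 4, 12)  # Steinitz: 3 wins, 4 events, 12 years
print(f"{wr:.0%}  {wpy:.3g}/{ppy:.3g}")  # 75%  0.25/0.333
```

The same formula applied to each row reproduces the listed figures, which is a quick way to spot transcription slips in the last column.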