< Earlier Kibitzing · PAGE 21 OF 21 ·
|May-04-13|| ||KWRegan: <AylerKupp, badest> (1) For what I actually did, in a more controlled situation with samples of size 10,000, see the later parts of http://rjlipton.wordpress.com/2011/... (in "another life" I co-author a prominent Math/CS weblog; you may find other articles there of interest). I would be delighted to run more games, given the volunteer help and PC cores it would need. To give an idea of what's involved, see what Guid and Bratko say in http://en.chessbase.com/home/TabId/... about why they stopped Rybka 3 at depth 10. My depth 13 is about (2.4)-cubed ≈ 14x as much work per move (in MultiPV mode, make that about 200x per move), and I've run many, many more moves...|
(2) You'd have to pay a lot for a small amount of data, and even then it would be under exhibition/simulation conditions, whereas the data I use all comes from real competition. In computer-vs-computer fixed-depth matches, Rybka 3 at depth 13 comes out roughly on par with Houdini 3 at depth 17 and Stockfish 2.3.1 at depth 19-20. Rybka groups the bottom 4 levels of its search into 1, and there are strong rumors that its depth reporting is obfuscated, so many believe the depth is "really" 16 or 17. I wish the engine-rating services would conduct fixed-depth matches, though the loss of strength and relevance in endgames counts against the concept.
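A quick way to check the depth arithmetic in point (1): if each extra ply multiplies the work by a roughly constant effective branching factor, going from depth 10 to depth 13 costs that factor cubed. A minimal Python sketch, assuming the 2.4 figure quoted above holds at every ply (real engines vary by position, so this is only a back-of-envelope model):

```python
def relative_work(ebf: float, depth_from: int, depth_to: int) -> float:
    """Approximate extra work for searching to depth_to instead of depth_from,
    assuming each additional ply multiplies the node count by a constant
    effective branching factor (ebf)."""
    return ebf ** (depth_to - depth_from)


# Depth 10 (Guid and Bratko's cut-off) vs. depth 13, with an assumed EBF of 2.4.
print(f"{relative_work(2.4, 10, 13):.1f}x")  # 13.8x
```

Because the exponent is the depth difference, small changes in the assumed branching factor move the answer noticeably: 2.3 cubed is about 12.2.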
|May-04-13|| ||AylerKupp: <<badest> However, don't you think 13 is way too low?>|
Yes, that's my entire point; see the discussions in the previous few pages. The analyses that assess players' strength by comparing the moves they make with the moves a chess engine makes in the same position have, for the most part, been based on running the engines to a search depth of 13 and no deeper (apart from the quiescence search that all engines do). The claim is that this is adequate, but I am still skeptical. I'm willing to be convinced, but I need to study the papers and other information that <KWRegan> has kindly provided, and I haven't had the time to do so yet.
|May-04-13|| ||AylerKupp: <KWRegan> Thanks for the additional information. I will study it as soon as I can, but life unfortunately sometimes gets in the way.|
FWIW, one concern I have about fixed-depth searches in general is that engines don't use them in actual games. Some positions require deeper searches, and each engine makes its own assessment of how much time it can afford to spend (and hence how deep to search) before choosing a move, using the remaining time as a guide along with who knows what else. And once time becomes a criterion, the hardware the engine runs on becomes a factor, since the more powerful the hardware, the deeper the engine can search in the same amount of time. So under game conditions an engine may be capable of making better moves than it would at a fixed search depth.
Yes, there have been many rumors that Rybka does not correctly report the search depth it actually uses. Currently the feeling is that Rybka underreports the search depth, whereas earlier versions of Rybka were considered to overreport it. See http://mysite.verizon.net/vzesz4a6/.... I can only assume that, since Rybka is a proprietary product, Vas Rajlich considers that information a valuable trade secret.
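The point about time-managed search versus fixed depth can be pictured with a toy model: a hypothetical time manager gives each move an even share of the remaining clock, then keeps deepening while the next iteration is predicted to fit. The cost model (each iteration costs the previous one times an assumed branching factor) and all function names are illustrative assumptions, not any real engine's logic:

```python
def allocate_time(remaining_s: float, moves_to_go: int, increment_s: float = 0.0) -> float:
    """Hypothetical per-move budget: an even share of the remaining clock
    plus any increment."""
    return remaining_s / max(moves_to_go, 1) + increment_s


def deepest_affordable_depth(budget_s: float, first_iter_s: float, ebf: float) -> int:
    """Iterative deepening under a time budget.

    Assumes (hypothetically) that iteration d costs first_iter_s * ebf**(d - 1)
    seconds, and refuses to start an iteration that would overrun the budget.
    """
    if first_iter_s <= 0 or ebf <= 1:
        raise ValueError("need a positive first-iteration cost and ebf > 1")
    depth, spent, iter_cost = 0, 0.0, first_iter_s
    while spent + iter_cost <= budget_s:
        spent += iter_cost
        depth += 1
        iter_cost *= ebf
    return depth


budget = allocate_time(600.0, 30)                     # 20 s for this move
print(deepest_affordable_depth(budget, 0.001, 2.4))   # 11
print(deepest_affordable_depth(budget, 0.0005, 2.4))  # 12: twice-as-fast hardware
```

In this model the same clock buys one extra ply when the hardware is twice as fast, which is the hardware dependence that fixed-depth testing deliberately removes.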
|May-04-13|| ||AylerKupp: <<badest> I think that the sample size can be very small, e.g. about 1000 for a 3% confidence interval.>|
Yes, I am aware that <if the samples are taken at random>, the sample size can be quite small and still yield statistically meaningful results. All the public opinion polls (at least the recent unbiased ones!) demonstrate that. That's what I was counting on to allow the much more time-consuming deeper searches to be performed in analyzing the data, provided that you don't demand that the confidence interval be very small. Certainty is costly!
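The figure of about 1000 samples for a 3% margin comes from the standard normal-approximation formula for a proportion, n = z^2 p(1-p) / E^2. A minimal Python sketch, assuming a 95% confidence level (z = 1.96) and the worst case p = 0.5 (for example, the fraction of moves that match the engine's first choice):

```python
import math


def sample_size(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Smallest n whose margin of error z * sqrt(p(1-p)/n) is <= margin.
    p = 0.5 is the worst case (largest variance)."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Half-width of the normal-approximation confidence interval."""
    return z * math.sqrt(p * (1 - p) / n)


print(sample_size(0.03))                 # 1068
print(round(margin_of_error(1000), 3))   # 0.031
```

The formula assumes independent random samples. Moves from the same game are correlated, so in practice the effective sample size can be smaller than the raw move count.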
|May-04-13|| ||nimh: A fixed amount of time per move is better, because the quality of analysis will be more even. In simplified positions the engine reaches a given depth faster, so at a fixed depth its evaluation is less trustworthy, relative to human play, than it normally is.|
The reason analysts pick such low depths or short times per move is pragmatic: they want to publish results reasonably quickly. It has nothing to do with a belief that an engine at a particular depth is good enough in all positions, let alone perfect. It's pointless to criticize the various studies on those grounds.
The most important thing is to have reliable methods for inferring chess skill from the average error or other similar measures of accuracy.
The biggest flaw is to erroneously assume that chess skill = accuracy of play, or that the effect of other factors is so insignificant that it need not be reckoned with. I'm glad to see that Regan has taken steps in that direction to increase the credibility of his conclusions.
|May-04-13|| ||badest: <The biggest flaw is to erroneously assume that chess skill = accuracy of play;> I guess Tal is a good example, or counter-example :)|
|May-04-13|| ||nimh: <<The biggest flaw is to erroneously assume that chess skill = accuracy of play;> I guess Tal is a good example, or counter-example :)>|
You can add Capablanca too, and then we have examples at both ends of the spectrum: Tal representing the case where a player 'overperforms' his accuracy, and Capablanca the case where a player 'underperforms' it. :)
|May-04-13|| ||KWRegan: <nimh> The main reason for fixed depth (and for using a single core thread) is scientific reproducibility. Timing is variable even on the same machine. Ultimately I plan to write a pipe routine that samples the numbers of legal moves a few plies deep and adjusts the search depth accordingly.|
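The planned routine isn't described beyond that one sentence, so what follows is only a hypothetical sketch of one way such a rule could work: estimate the branching factor b as the geometric mean of sampled legal-move counts, then pick the depth that keeps b^depth roughly equal to a reference budget. The reference point (depth 13 at b = 35, 35 being a figure often quoted for the average number of legal moves in chess), the function name and the clamps are all assumptions:

```python
import math

# Hypothetical reference point: depth 13 at a branching factor of 35.
REFERENCE_BRANCHING = 35.0
REFERENCE_DEPTH = 13


def choose_depth(sampled_move_counts, min_depth=8, max_depth=30):
    """Pick a search depth that keeps b**depth roughly equal to
    REFERENCE_BRANCHING**REFERENCE_DEPTH, where b is the geometric mean of
    legal-move counts sampled a few plies into the tree."""
    counts = [c for c in sampled_move_counts if c > 0]  # drop mates/stalemates
    if not counts:
        return REFERENCE_DEPTH
    b = math.exp(sum(math.log(c) for c in counts) / len(counts))
    if b <= 1.0:
        return max_depth  # only forced moves: search as deep as allowed
    depth = REFERENCE_DEPTH * math.log(REFERENCE_BRANCHING) / math.log(b)
    # Small epsilon so floating-point noise (e.g. 12.9999999) rounds as intended.
    return max(min_depth, min(max_depth, math.floor(depth + 1e-9)))


print(choose_depth([35, 35, 35]))  # 13: typical middlegame
print(choose_depth([10, 10]))      # 20: simplified position, search deeper
```

Holding the node budget roughly constant this way keeps the rule deterministic (same position, same depth), so it would not give up the reproducibility that motivates fixed depth in the first place.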
|May-04-13|| ||Tiggler: <al wazir: 1 isn't a prime.>|
Please list the factors of 1.
|May-05-13|| ||Ezzy: <Please list the factors of 1.>|
Please don't. This has been 3 pages of completely 'off topic' posts already. Use a computer forum instead.
|May-05-13|| ||AylerKupp: <Ezzy> Thanks again for being the "policeman". No excuse, but it is sometimes hard to stay on-topic when one of these discussions starts.|
I have asked <chessgames.com> to create a page for this type of general chess engine discussion, since there doesn't seem to be one. There are several pages for specific chess engines (Rybka, Houdini, Stockfish, Komodo, etc.), but this kind of general chess engine discussion would seem equally off-topic on one of those engine-specific pages.
Although I still don't know the appropriate page to list the factors of 1.
|May-05-13|| ||AylerKupp: <All> It turns out that there is already a general computer profile page, Computer, which would seem to be the logical place to post general computer-related information like what we have been posting on this page recently. So, in case anyone cares, I will post any general computer-related information on that page, and I would suggest that everyone else do so and avoid off-topic posts as much as possible.|
|May-05-13|| ||John Abraham: Wonderful tournament with exciting players, excellent action, and very instructive games. Thank you.|
|May-06-13|| ||Ezzy: FIDE Grand Prix Zug Tournament
<Topalov (8/11)> - +5 was pretty sensational. The 5 wins were exceptionally well played games, in which Topalov completely outplayed his opponents in complex positions. He even entertained us with 3 consecutive exchange sacs (his trademark) in rounds 2, 3 and 4. This was a dominating performance: he won by 1.5 points with a spectacular 2924 TPR! That's 2 Grand Prix wins out of 2, and he looks an odds-on favourite to win a place in the next Candidates.
<Nakamura (6.5/11)> - Never a dull moment watching Nakamura. He deserved his second place, losing only to the in-form Topalov. He was nowhere near as clinical as Topalov, though. In fact Nakamura's worst game of the tournament was the one against Morozevich, which by some miracle he won. Nakamura played with his usual fighting spirit. He made Karjakin suffer for over a hundred moves in round 1, tortured Giri for 70 moves in round 2, and kept Ponomariov under sustained pressure throughout their game in round 8, but achieved only 3 draws against them. All in all a good performance by Nakamura, without setting the world alight.
<Ponomariov (6/11)> - Made a great start to the tournament with 2 wins and 5 draws, including a nice win against Caruana. Ponomariov was also the only player to push Topalov into a worse endgame, which Topalov eventually saved. Ponomariov faded in the last 4 rounds: he was under severe pressure from Nakamura but got a draw, then lost to Radjabov, and finished with 2 uneventful draws. He seemed to run out of steam at the end, but it was not a bad tournament.
<Caruana (6/11)> - More of a rollercoaster ride for young Fabio, with 3 wins and 2 losses: nice wins against Radjabov, Kasimjanov and Kamsky, but he was outplayed by Ponomariov and Topalov. Round 6 saw the crazy game against Karjakin that neither player seemed to want to win. First Caruana had a winning position, then Karjakin did, but in the end a draw was probably the right result, as neither deserved to win. Still, joint 3rd is a nice performance, with lots to learn from the tournament.
<Morozevich (5.5/11)> - Got off to a flying start with 4/6. Disaster then struck with 3 losses in a row, the hardest to take being round 9, when he had a completely won position against Nakamura, didn't find the winning moves, and finally blundered and lost. Morozevich was lost for words at the press conference. He eased his shattered confidence with a win against Radjabov in round 10 and a draw in the last round. A tale of two halves: impressive in the first, a disaster in the second.
<Kamsky (5.5/11)> - How to describe Kamsky's tournament? The general theme was instability. His results ran draw, loss, draw, draw, win, loss, win, draw, loss, win, draw. He even refused a threefold repetition in the middlegame to keep pursuing the full point, and lost. He could have lost 2 of his draws, especially the one against Topalov, who missed a win. So, not a terrible tournament, but his play probably wasn't what he wanted.
<Karjakin (5/11)> - Was unbeaten after the first 9 rounds, which included a great win over Mamedyarov, but in his 8 draws he sometimes stood better and sometimes stood worse, an indication that his play was inconsistent. This proved to be the case when he had a miserable finish, losing his last 2 games. After a great start in the Grand Prix series, this will be a big disappointment to Sergey, but not yet a disaster.
|May-06-13|| ||Ezzy: <Giri (5/11)> - (+0 -1 =10) Giri didn't win a game and never looked like doing so. He made a couple of errors against Morozevich and lost. Of his 10 draws, 7 were fair draws, and in 3 he was definitely worse and under pressure. BUT he gained a lot from the experience of playing in this kind of company. Very respectable from the youngster. My Giri highlight was his game against Caruana, where he spent 5 minutes on his first 30 moves, then 70 minutes on his 31st!|
<Leko (5/11)> - (+0 -1 =10) Dare I say it: Drawko. There's definitely some mojo missing in Leko's games. Another poor result in a Grand Prix tournament means he has already lost any chance of a Candidates place. Back to the drawing board for Peter (excuse the pun).
<Mamedyarov (4.5/11)> - Not a good score for Shak, but I think he played better than his score suggests. He was unfortunate to fall into Karjakin's home prep with the positional sacrifice 16 Nxh6+! He played a game to forget against Nakamura, his only bad game, but it was really bad. Other than that, his 9 draws were well fought, and he even had a few chances in them. Perhaps nothing to be too concerned about, except that it has damaged his Grand Prix points tally big time! He just needs to recover his killer instinct.
<Kasimdzhanov (4.5/11)> - The Grand Prix series is showing that Kasimjanov is not quite up to the level of some of these players. He has had trouble in games where the position was very complex. He needs to improve this to beat players at this level.
<Radjabov (4.5/11)> - Everybody is wondering, "What has happened to Teimor?" His last 2 tournaments have cost him 45 Elo points, and he has gone into freefall from number 4 in the world to number 15. This usually only happens to players like Ivanchuk or Morozevich, BUT young Teimor? Dare we blame marriage? Perhaps not; it's not doing Topalov any harm. The thrills and spills of top-level chess. Every player goes through a 'bad patch' now and again. This is Teimor's 'bad patch', but he'll be back.
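The 2924 TPR quoted for Topalov depends on the average rating of his opponents, which isn't given here. A common back-of-envelope approximation, the "rule of 400" (average opponent rating plus 400 times wins minus losses, divided by games), shows roughly what a +5 score in 11 games is worth. FIDE's official performance ratings use a lookup table instead, so this will not reproduce 2924 exactly, and the field average below is a hypothetical value:

```python
def approximate_tpr(avg_opponent_rating: float, wins: int, losses: int, games: int) -> float:
    """'Rule of 400' approximation of a tournament performance rating.
    Draws only count towards the number of games played."""
    if games <= 0:
        raise ValueError("games must be positive")
    return avg_opponent_rating + 400.0 * (wins - losses) / games


# Hypothetical field average of 2750; +5 in 11 games adds about 182 points.
print(round(approximate_tpr(2750, wins=5, losses=0, games=11)))  # 2932
```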
|May-07-13|| ||AylerKupp: <Ezzy> Hey! Who said that you were allowed to make on-topic posts?|
|May-07-13|| ||pbercker: @ <aylerkupp>
Thanks for the heads-up on the <computer> forum as a more appropriate place to continue the discussion ... I'll post a bit more there instead of here ...
|May-07-13|| ||pbercker: @ <everybody> ...
I was about to say that it's nice that this <KWregan> fellow is joining the discussion, since he seems to know what he's talking about ... and then I just realized who he is! .... It's of course Professor Kenneth Regan, whose papers I've been poring over these last few weeks in spite of having to dodge the various mathematics I no longer understand (my calculus days being long, long behind me!) ...
(Prof. Regan, I've also enjoyed some of your other non-chess musings! I used to teach philosophy of religion among other things, and a bit of philosophy of science as well as a bit of logic ...)
|May-07-13|| ||pbercker: @ <KWRegan>
I'm curious ... what kind of volunteer help do you need?
...btw, I'll start posting in the <computer> forum from now on so as not to clutter this page ....
|May-07-13|| ||AgentRgent: <pbercker: I just realized who he is! .... It's of course professor Kenneth Regan> |
;) Well, I actually told you that a while back, e.g.:
<Apr-12-13 AgentRgent: <pbercker: http://www.cse.buffalo.edu/~regan/p>... Interestingly enough Ken is a kibitzer here at CG.com User: KWRegan.>
|May-07-13|| ||Ezzy: <AylerKupp: <Ezzy> Hey! Who said that you were allowed to make on-topic posts?>|
Please don't 'Blow the Whistle.' :-)
|May-07-13|| ||perfidious: Ah got mah hands on that there whistle. Yer all goin down now!!!! Heh heh heh!!!!!|
|May-07-13|| ||nimh: <An in-depth study of games between 2600 players in all years shows about 15-20 Elo quality reduction from the 1980's to the 2000's>|
What were the average time controls for the games from the 80s and 2000s in your dataset? The difference in Elo should be bigger than indicated by your data, because of the rise in the level of play since the eighties.
|May-07-13|| ||pbercker: @ <agentRgent>
I think it didn't register at the time because I was not fully clued in to who Professor Regan was; I don't think I had struggled through his papers yet!
|May-15-13|| ||Honza Cervenka: <KWRegan> I think that there are plenty of apparent methodological problems with the whole quantitative approach based on analysis by Rybka 3 to a fixed reported depth of 13 or 14 plies. It is heavily biased towards today’s players, who use exactly this program or other engines for opening preparation that quite often extends to the 30th move, if not deeper. So it is no wonder that their quality indicators, which are based on the differences in evaluation between Rybka’s top choice and the move played in the game, are on average better than those of players who were forced to rely only on their heads and old books and journals. Of course, better and deeper preparation or knowledge of opening theory also saves a lot of time and energy for the moves that follow, which is another important factor favoring current players.|
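The quality indicator described here (the gap between the engine's top choice and the move actually played) can be sketched as follows. This is a hypothetical simplification, not the method in Regan's papers: evaluations are assumed to come from an engine at some fixed depth, in centipawns from the mover's point of view, and the sample game is made up purely for illustration. The optional `skip_first` parameter shows how excluding likely book moves would change the figure:

```python
from statistics import mean


def move_errors(evals):
    """evals: (best_move_eval, played_move_eval) pairs in centipawns, both
    from the mover's point of view. The error is how much worse the played
    move is judged, floored at 0 (a fixed-depth engine can occasionally rate
    the played move slightly above its own first choice)."""
    return [max(0, best - played) for best, played in evals]


def average_error(evals, skip_first=0):
    """Mean centipawn error, optionally skipping the first `skip_first`
    moves (e.g. to exclude likely opening preparation)."""
    errors = move_errors(evals[skip_first:])
    return mean(errors) if errors else 0.0


# Made-up five-move sample: the first two moves match the engine exactly.
game = [(30, 30), (25, 25), (40, 10), (15, 15), (60, -20)]
print(average_error(game))
print(average_error(game, skip_first=2))
```

On this made-up sample the average error is 22 centipawns over all five moves and about 36.7 once the first two (zero-error) moves are skipped, which is the direction of bias the post describes.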
It is also common knowledge that, despite its impressive strength in practical play, Rybka’s evaluations, especially at a short horizon, can often be utterly misleading. Any decent correspondence chess player knows quite well that letting the engine play alone with no human intervention in a CC tournament is a sure recipe for disaster, no matter how deeply the engine is allowed to calculate. I can also provide you with many practical examples from real games where Rybka, running for many hours and reaching depths well over 20 plies, gives a wrong evaluation and a bad continuation that a human chess player can easily recognize. For example, the quite well-known positional blunder made by Bobby Fischer in the second game of his Candidates match against Mark Taimanov (see Fischer vs Taimanov, 1971), i.e. 50.c5??, which throws away the decisive advantage instead of the forced win after 50.Ra6! Kc7 51.c5 bxc5+ 52.bxc5 Ne8 53.Rg6! (a discovery of Yuri Balashov), passes completely undetected by Rybka or Houdini, because they don’t understand that after this move Black can build a simple blockade with an objectively drawn position. Nor, of course, are they able to find Balashov’s fairly simple forcing line leading to an ending that is technically easy to win. It is also funny to see how Rybka evaluates the last move 41.Qb7+ in our game against Arno Nickel (The World vs A Nickel, 2006), which forces a transition into an easily won rook ending, and what it recommends playing instead. Of course, human assistance to the engine and a more qualitative approach instead of mechanical grinding of centipawns can be quite helpful here.
In a paper (http://www.cse.buffalo.edu/~regan/p...) written by you, Bartek Macieja and Guy Haworth, there are also some quite interesting details that seem somewhat at odds with your rather categorical conclusions. For example, the Intrinsic Performance Rating you calculated for Montreal 1979 (a mere FIDE category 15 tournament with an average Elo of 2622; even the Czech championship could be stronger today) was 2588, significantly higher than the IPR of category 18 Linares 1993 (Elo 2676, IPR 2522) or Linares 1994 (Elo 2685, IPR 2517). Another quite interesting point apparent from Figure 3 is that tournaments from 1971-1984, and to a lesser extent those from 1985-1999, had far lower average error in the later phase of play after move 40 (adjournments apparently mattered) than tournaments from 2000-2009 and 2010 onward, despite their lower average ratings. So much for the superiority of current players over their predecessors from the 1970s and 1980s. :-D
But the main problem with the whole argument of your paper, in my view, is that nothing there proves any causal link between a possible rise in the quality of play as measured by IPR and the rise of Elo ratings, which are constructed primarily to indicate a player's strength relative to others at the moment and his/her current position in the world ranking. Ratings are produced by the results of games, not by the quality of games. Why and how should any growth in average IPR lead to growth in Elo ratings, which are calculated solely from game results, regardless of their quality? Correlation does not imply causation per se. The basic question is whether any correlation between IPR (or any other measure of the quality of play) and the growth of ratings is anything but mere coincidence (like, say, a correlation over time between the growth of a population's average height and the share of university graduates in it). Aren't we looking at just an example of the cum hoc ergo propter hoc fallacy here?
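The point that ratings come only from results can be made concrete with the standard Elo update. A minimal sketch, assuming the logistic expected-score curve and a hypothetical K-factor of 10 (FIDE's own regulations use their own lookup table for expected scores, so exact figures differ slightly):

```python
def expected_score(rating: float, opponent: float) -> float:
    """Expected score under the logistic Elo curve (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def updated_rating(rating: float, results, k: float = 10.0) -> float:
    """results: iterable of (opponent_rating, score), score in {1, 0.5, 0}.
    The inputs are ratings and scores only; there is no parameter through
    which the quality of the moves could enter."""
    return rating + k * sum(score - expected_score(rating, opp) for opp, score in results)


print(updated_rating(2700, [(2700, 1.0)]))  # 2705.0
```

A brilliant win and a win by an opponent's blunder produce exactly the same update, so any link between rating inflation and measured move quality has to be argued separately, not read off the formula.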