|
< Earlier Kibitzing · PAGE 16 OF 16 ·
Later Kibitzing > |
Sep-22-09
 |
| whatthefat: Final question: are these the same conditions as were used in http://web.zone.ee/chessanalysis/su... ? |
 |
Sep-22-09
 |
| nimh: I understood that 12th move as a start point in the 19th century was an overkill, the theory wasn't that developed then. 8th move seemed to be right. |
 |
Sep-22-09
 |
| whatthefat: I'm asking all these questions because I find your results fascinating, and I really would urge you towards publication (in which case, reviewers may ask some of these questions). I would even be happy to write the paper (including some of my own views on the topic), with you as first author. |
 |
Sep-22-09
 |
| nimh: <Also, have you chosen a fixed time rather than a fixed ply depth to mimic the conditions a human is placed under during the game?> Fixed depth method assures a good middle game analysis, but endgames are covered quite poorly. By using time-based scrutiny, it will be more balanced and objective. And it indeed is more similar to the conditions humans play at. <Final question: are these the same conditions as were used in http://web.zone.ee/chessanalysis/su... ? > No |
 |
Sep-22-09
 |
| whatthefat: <nimh: I understood that 12th move as a start point in the 19th century was an overkill, the theory wasn't that developed then. 8th move seemed to be right.> Sounds fair enough. In my own experience it was hard to decide on where to begin the analysis, so I ended up just including the opening, which is of course a little unfair on players from earlier eras. |
 |
Sep-22-09
 |
| nimh: <I'm asking all these questions because I find your results fascinating, and I really would urge you towards publication (in which case, reviewers may ask some of these questions).> You mean I should write an introduction and send it with the pdf to places like chessbase and chessninja, and ask for them to be published? Would they be interested, you think? I think the forthcoming study will be more suitable, since all player-based analyses are intrinsically biased due to different approaches to chess. Although it's feasible to calibrate results by difficulties of positions, there is one thing that's quite hard to beat - subpar moves whose purpose is to make the opponent's life harder.
I assume that in each period there have been players of both types. <I would even be happy to write the paper (including some of my own views on the topic), with you as first author.> It would be nice :) Go on. |
 |
Sep-22-09
 |
| whatthefat: <nimh>
I think it would be best to try to publish this in a scientific journal. From my limited reading in the area, I think ICGA is the specialist journal for this field: http://en.wikipedia.org/wiki/Intern... I'm not sure what your technical background is, but if you're looking at getting into any research-related field in the future, it's always a good thing to have papers published previously. From there, it would be possible to publicize the results more widely, e.g., chessbase and the like. I think the study you're commencing now would be ideal for publication, since it addresses the question of how playing strength has advanced over time from a totally new perspective. |
 |
Sep-23-09
 |
| nimh: <I think the study you're commencing now would be ideal for publication, since it addresses the question of how playing strength has advanced over time from a totally new perspective.> I also think so. It's better to wait until methods are advanced enough. |
 |
Sep-23-09
 |
| nimh: Two more points.
The maximum length of games is unlimited. Analysis stops once the number of pieces drops below 10. Game selections must contain at least 25% of drawn games. |
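The piece-count cutoff can be sketched in a few lines. This is only an illustration, assuming positions are available as FEN strings and that "pieces" counts every man on the board (kings and pawns included); the thread does not say how the count is actually made, and `piece_count` and `within_analysis_range` are hypothetical names.

```python
def piece_count(fen):
    """Count all men (kings, pieces and pawns) in the board field of a FEN string."""
    board = fen.split()[0]          # first FEN field is the piece placement
    return sum(ch.isalpha() for ch in board)


def within_analysis_range(fen, min_pieces=10):
    """True while the position still has at least `min_pieces` men on the board."""
    return piece_count(fen) >= min_pieces


# Standard starting position and a hypothetical rook endgame.
START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
ENDGAME = "8/5pk1/6p1/8/8/6P1/5PK1/3R4 w - - 0 1"
```

With these two positions, the starting position stays in the dataset and the endgame falls below the cutoff.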
 |
Sep-24-09
 |
| whatthefat: <nimh: Game selections must contain at least 25% of drawn games.> Why is that? Couldn't that bias the selection? For example, suppose drawn games are of a higher average quality than decisive games, and there were more decisive games in 1900 than 2000. By choosing a fixed fraction of drawn games, we are artificially improving the quality of the 1900 pool relative to the 2000 pool. Is this condition often imposed, or do nearly all the eras contain >25% draws? |
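A toy calculation makes the concern concrete. All numbers below are hypothetical and not taken from any analysis in this thread: they assume drawn games have a lower average error than decisive ones, that this is the same in both eras, and that only the natural draw rate differs.

```python
def pool_error(draw_fraction, draw_err, decisive_err):
    """Average error of a pool with the given share of drawn games."""
    return draw_fraction * draw_err + (1 - draw_fraction) * decisive_err


# Hypothetical figures, for illustration only.
DRAW_ERR, DECISIVE_ERR = 0.10, 0.20          # identical in both eras by construction
NATURAL_DRAW_RATE = {1900: 0.10, 2000: 0.50}
MIN_DRAWS = 0.25                             # the fixed minimum under discussion


def compare():
    """Return {year: (error of natural pool, error of pool with draw floor)}."""
    result = {}
    for year, rate in NATURAL_DRAW_RATE.items():
        natural = pool_error(rate, DRAW_ERR, DECISIVE_ERR)
        forced = pool_error(max(rate, MIN_DRAWS), DRAW_ERR, DECISIVE_ERR)
        result[year] = (natural, forced)
    return result
```

Under these made-up inputs the draw floor lowers the 1900 pool's average error from 0.19 to 0.175 while leaving the 2000 pool at 0.15, so the gap between eras shrinks purely because of the selection rule.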
 |
Sep-25-09
 |
| nimh: You may be right, 25% seems to be too high. It isn't easy to find the correct percentages of draws. The database accompanying Fritz 11 that I have contains only three games from the 1860s, and chessgames advanced search is useless, allowing only single years. The purpose of the fixed ratio is to keep out too many decisive games, whose quality really is lower than that of drawn games. In the long run it is unnecessary, but not for smaller game selections.
What do you suggest? |
 |
Sep-25-09
 |
| Ziggurat: ICGA would be the natural choice, because that was where Guid and Bratko published their study. I'd be happy to look at (preferably latter stage) drafts if you and <whatthefat> decide to write a paper. Your results are intriguing and I have experience of evaluating scientific papers from my "day job" (academic research). |
 |
Sep-25-09
 |
| whatthefat: <nimh>
I think we need to ensure that the set of analyzed games is an unbiased selection. Now, if we go sufficiently far back, we need to be careful because of selective reporting - a boring draw is less likely to be recorded than an exciting decisive game. I think the best way to get around this is to only use games from tournaments where all the games were recorded. Since you're analyzing games between players with chessmetrics ratings of 2600-2700, I would suggest analyzing all games between players with those ratings in a tournament (where we know that none of the games between these players are missing). If we follow this protocol consistently across all eras, it should help to eliminate bias. How many games were you looking to analyze? Also, is the process fully automated, or does it require your constant observation? <Ziggurat: I'd be happy to look at (preferably latter stage) drafts if you and <whatthefat> decide to write a paper. Your results are intriguing and I have experience of evaluating scientific papers from my "day job" (academic research).> Sure, it'd be great to have you on board. What field are you in by the way? I do sleep research, believe it or not. :) |
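The proposed protocol could be sketched as a simple filter. This is a minimal illustration, assuming each game record carries an event name and both players' chessmetrics ratings, and that a list of fully recorded events has been compiled by hand; `Game`, `select_games` and `complete_events` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    event: str
    white_rating: int
    black_rating: int


def select_games(games, complete_events, lo=2600, hi=2700):
    """Keep every game from a fully recorded event where both players are in the rating band."""
    return [
        g for g in games
        if g.event in complete_events
        and lo <= g.white_rating <= hi
        and lo <= g.black_rating <= hi
    ]
```

Because every qualifying game from a qualifying event is kept, nothing is left to the selector's discretion, which is the point of the protocol.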
 |
Sep-25-09
 |
| nimh: <I think the best way to get around this is to only use games from tournaments where all the games were recorded. I would suggest analyzing all games between players with those ratings in a tournament (where we know that none of the games between these players are missing).> This is a bit problematic for mid-19th-century tournaments and matches. I'd personally prefer to select games randomly from all clocked events, regardless of how many games have been preserved.
Being limited to one particular tournament per decade isn't a good idea. <How many games were you looking to analyze? Also, is the process fully automated, or does it require your constant observation?> I haven't decided yet how many I'll do. And it's based on the number of suitable positions, not games. It's a semi-automated process. Arena 2.0 records the results in a txt file, and I manually type them into a spreadsheet. That's why it takes so long! |
 |
Sep-25-09
 |
| nimh: <Your results are intriguing and I have experience of evaluating scientific papers from my "day job" (academic research).> By what criteria do you do it? You have seen my last paper. Would its quality be acceptable to you? |
 |
Sep-25-09
 |
| whatthefat: <nimh: By what criteria do you do it? You have seen my last paper. Would its quality be acceptable to you?> In its current form, no. There is a particular style of scientific writing and layout required for publication. Each journal also has its own requirements. |
 |
Sep-25-09
 |
| whatthefat: <nimh: This is a bit problematic for mid-19th-century tournaments and matches. I'd personally prefer to select games randomly from all clocked events, regardless of how many games have been preserved. Being limited to one particular tournament per decade isn't a good idea.> You could use multiple tournaments from each decade, so long as you can get enough games that way. <It's a semi-automated process. Arena 2.0 records the results in a txt file, and I manually type them into a spreadsheet. That's why it takes so long!> Ah, I see. And since you are analyzing each position for a fixed amount of time, do you keep some sort of log of the games to ensure that the processor performance is not fluctuating (e.g., due to background tasks)? |
 |
Sep-25-09
 |
| nimh: No, it's a log of moves containing data on evaluations of multiple lines at different depths. Here's one example, up to depth 5:
18. ..a5-a4
Best move (Rybka 3 w32): Bb7-c8
Not found in: 05:00
2 00:00 1.156 35.871 -0,07 Bb7c8
2 00:00 941 56.681 0,00 h7h6
2 00:00 628 37.827 +0,14 Ra8b8
2 00:00 597 35.960 +0,14 Bb7a6
2 00:00 877 52.826 +0,23 Re8e4
---
3 00:00 2.323 37.168 -0,03 Bf6e5
3 00:00 1.596 34.048 +0,06 h7h6
3 00:00 1.495 46.390 +0,06 Ra8b8
3 00:00 1.454 45.118 +0,06 Bb7a6
3 00:00 1.319 40.928 +0,24 Re8e4
---
4 00:00 3.091 33.317 -0,02 Ra8b8
4 00:00 3.359 36.206 -0,01 h7h6
4 00:00 3.866 35.664 +0,04 Bf6e5
4 00:00 2.935 38.043 +0,04 Bb7a6
4 00:00 2.610 41.760 +0,28 Re8e4
---
5 00:00 6.532 28.342 0,00 Ra8b8 Ne2d4
5 00:00 5.819 29.209 +0,01 Bf6e5 Ne2d4
5 00:00 6.128 28.523 +0,07 h7h6 Ne2d4
5 00:00 5.514 29.874 +0,07 Bb7a6 Ne2d4
5 00:00 4.683 33.770 +0,33 Re8e4 Ne2c3
---
I don't have big background tasks actually. |
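Since the results are typed into the spreadsheet by hand, a short script could in principle pull the evaluations out of a log shaped like the sample above. This is only a sketch under stated assumptions: it takes the first column to be the depth, the fifth to be the evaluation written with a decimal comma, and the sixth onward to be the line; the third and fourth columns are skipped because the thread does not say what they are. `parse_log` is a hypothetical helper, not part of Arena.

```python
import re

# depth, time, two numeric columns (ignored), evaluation with decimal comma, line
LINE = re.compile(
    r"^(\d+)\s+(\d{2}:\d{2})\s+([\d.]+)\s+([\d.]+)\s+([+-]?\d+,\d+)\s+(.+)$"
)


def parse_log(text):
    """Return {depth: [(evaluation, first_move), ...]} for lines shaped like the sample."""
    by_depth = {}
    for raw in text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # header lines, '---' separators, anything else
        depth = int(m.group(1))
        evaluation = float(m.group(5).replace(",", "."))
        first_move = m.group(6).split()[0]
        by_depth.setdefault(depth, []).append((evaluation, first_move))
    return by_depth


# A shortened copy of the sample posted above.
SAMPLE = """\
18. ..a5-a4
Best move (Rybka 3 w32): Bb7-c8
Not found in: 05:00
5 00:00 6.532 28.342 0,00 Ra8b8 Ne2d4
5 00:00 4.683 33.770 +0,33 Re8e4 Ne2c3
---
"""
```

The resulting dictionary could then be written out with Python's `csv` module and opened in any spreadsheet program.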
 |
Sep-25-09
 |
| whatthefat: Okay, should be fine then. Just wanted to check that that was controlled for. |
 |
Oct-01-09
 |
| nimh: Although positions with fewer than 10 pieces are generally not included in datasets, I take all blunders into account.
Engines may misevaluate TB-like positions, but they hardly ever miss decisive mistakes. I've also abolished the minimum draw percentage rule.
Events for which no time-control data can be found are treated as having a TC of 4 minutes per move. Each adjournment session adds one additional hour to the TC. This is a subjective decision, as there is no way to find out how many hours players spent on adjournment analysis or how intensively they worked on it. While the complexity measure is unlimited and material is limited to 41.25, the upper bound of the difference value is arbitrarily set to 3.00. All players I analyse are quite skilled and very rarely miss such clear-cut moves. The bigger the difference, the more moves are needed to properly measure the rise in the average error criterion. |
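Two of these conventions are easy to express in code. The sketch below is an illustration only: it assumes the "difference" value means the gap between the best and second-best evaluations of a position (the thread does not define it), with evaluations given from the mover's point of view. The function names are hypothetical, and the adjournment bonus is left out because the post does not say how it combines with the per-move figure.

```python
MAX_DIFFERENCE = 3.00             # arbitrary upper bound stated in the post
DEFAULT_MINUTES_PER_MOVE = 4.0    # fallback when no time-control data is found


def capped_difference(evals):
    """Gap between the two best evaluations (higher is better), capped at MAX_DIFFERENCE."""
    top, second = sorted(evals, reverse=True)[:2]
    return min(top - second, MAX_DIFFERENCE)


def minutes_per_move(recorded=None):
    """Use the recorded time control if there is one, otherwise the 4-minute default."""
    return DEFAULT_MINUTES_PER_MOVE if recorded is None else recorded
```

For example, a position where the best move is five pawns better than the alternatives would be recorded with a difference of 3.00 rather than 5.00.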
 |
Oct-02-09
 |
| whatthefat: All sounds reasonable to me. |
 |
Oct-09-09
 |
| nimh: Carlsen's TPR at Nanjing was 3002. Makes it worth including in my study to see what R3 thinks of it. If engine analyses are worth anything, it should indicate 2900 performance at least. |
 |
Oct-23-09
 |
| nimh: Carlsen's average raw error in Nanjing was 0.063. It's too early to tell yet whether the positions in his games were more difficult than average, since the difficulty parameters are not comparable with those in the previous study, but the time controls were tighter. |
 |
Oct-23-09
 |
| whatthefat: Am I right in saying that that is an extremely low average error? |
 |
Oct-23-09
 |
| nimh: It is, indeed. Without the woeful Wang game, where he made two blunders altogether, it would have been even lower, at 0.051! Utterly sick! |
 |
 |