< Earlier Kibitzing · PAGE 139 OF 139 ·
Later Kibitzing> 
Nov-16-18
  AylerKupp: <<centralfiles> As for engine favorite 20.Bd5 ... This is the line I was talking about earlier when I said the computer can't see deep enough.> True, at least in any reasonable amount of time; for example, in the remainder of my lifetime. For the computer to see deep enough you have to perform some forward sliding to try to eliminate, or at least defer, the impact of the horizon effect. And I haven't attempted that, since all I offered to do was run some 3-engine analyses on <kwid>'s main line to see how the moves in his analysis compare with the moves that the 3 engines considered best.

Nov-16-18
  AylerKupp: <<centralfiles> 20...Kd8! already we have a move the engines would not look at as first choice.> You list a few moves with some subjective commentary, but you never indicate which engine(s) you were using or the search depth at which the engine reached the evaluations you listed, although I suspect you were using some version of Stockfish. Was this just an oversight? The engine used and the search depth reached are important in determining how much confidence to have in an engine's evaluation.

Nov-16-18
  AylerKupp: <<centralfiles> So in conclusion what does the engine say about the 20.Bd5 line? DEAD EQUAL!! > Remember that an evaluation of <Equal> (which I consider to be [±0.49]) means that both sides have <equal chances>, not that the position will necessarily end up in a draw, although that's probably the most likely result. But you have just finished saying that the engines can't see deep enough in these positions, so how can you conclude that the position is a draw <in all possible lines> if you don't really have confidence in the engine's evaluation? And I don't know about you, but to me that's what <conclusively> means. And that's the claim being made for 19...Rf6. <But how long would it take for the engine to see this at move 20.???? Hard to answer definitively of course, but seeing as so many of the moves were far down the engines list of its top choices and were initially evaluated very poorly by the engine, it would take several days at the very least and probably much longer for an engine running at move 20 to see this far.> You have absolutely no valid basis that I can see for making that statement. In a complex position such as this one the engines' ranking of the moves can change from ply to ply, so it can take an engine only one additional ply to change its ranking of the moves completely. How long it takes to complete that one ply depends on the search depth already reached, since the time needed to complete each additional ply grows exponentially with the search depth. But, if you want a definitive answer, see one of my later posts. Spoiler alert: less than 1 hr 15 mins on a relatively slow computer using Stockfish 9. <The power of hindsight <kwid's provided lines> and sliding is far greater than days or even weeks of computer time.> Agreed. That's something that I haven't attempted to do because I wasn't asked to do it.
But that's what I offered to do in Team White vs Team Black, 2017 (kibitz #3939): have <kwid> (or you, or anyone else) start an analysis with 20.Bd5, and I would do something similar to what I did after 20.Rb1, running a multi-engine analysis overnight to see whether the move that the engines suggest as best is the same move that the human analysis suggests is best. If it is, I would proceed to have the engines analyze the next move in the analysis. If it isn't, then either consider doing a new analysis starting with the move that the engines suggest is best, or do some forward sliding to validate the engines' analyses and eliminate or reduce the impact of the horizon effect.
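The forward procedure just described (the engines check each move of a human line in turn, and work stops at the first disagreement) can be sketched in Python. This is a minimal sketch, assuming a hypothetical `engine_best(moves_so_far)` callable that stands in for an overnight multi-engine run and returns the engines' top choice as a string; it is not the Arena-based workflow actually used here. The canned answers in the usage lines are illustrative only, chosen to mirror the thread, where the engines preferred 20.Bd5 to the analysis' 20.Rb1.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-in for an engine run: given the moves played so far
# (counted from the start of the analysed line), return the engines' top choice.
EngineBest = Callable[[Sequence[str]], str]


def forward_validate(line: Sequence[str], engine_best: EngineBest) -> Optional[int]:
    """Check a human analysis line move by move, from its first move onward.

    Returns the index of the first move where the engines' top choice differs
    from the move in the line, or None if they agree on every move.
    """
    for i, move in enumerate(line):
        suggestion = engine_best(line[:i])
        if suggestion != move:
            # Stop here: a new analysis (or forward sliding) has to start
            # from this position before going any further down the line.
            return i
    return None


# Illustrative usage with canned answers keyed by how many moves were played.
line = ["19...Rf6", "20.Rb1", "20...Kd8", "21.Qd5"]
canned = {0: "19...Rf6", 1: "20.Bd5", 2: "20...Kd8", 3: "21.Qd5"}
idx = forward_validate(line, lambda played: canned[len(played)])
print(idx, line[idx] if idx is not None else "all agree")
```

Because the check stops at the first disagreement, no engine time is spent on moves that would be invalidated if the engines' alternative is adopted.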

Nov-16-18
  AylerKupp: <<centralfiles> As far as engine tournaments go: In this kind of situation where we are primarily trying to figure out if a certain move draws with best play it's hard to see any value to them at all.> The only value I see is when we can't reach a definitive conclusion about the game's result after a certain move, which I believe is the case in this game. Then possibly the next best thing is to determine which move gives the statistically best practical chance of achieving the desired result. For that I don't know how many games would be needed to achieve a statistically significant result. If anyone can help me calculate that, I would greatly appreciate it. I also don't know what the time control should be. I want to be able to start each game in the early evening and finish no later than mid-morning. I was able to accomplish that by playing 2 games at a time control of 40 moves in 2 hours and 20 moves every hour afterwards. I can double the time to 40 moves in 4 hours and 20 moves every 2 hours afterwards. This would restrict me to running only one game overnight and, of course, double the time needed to run the required number of games. But, given that we had 2 days to consider each move, will results obtained at 40 moves in 4 hours, etc. be considered credible compared to spending 2 days on each move? That's a subjective question. But I'm not willing to have the games take longer than overnight, or to conduct any engine vs. engine games at all, if the results of that set of games are not going to be considered conclusive. <<They seem most useful in complex unclear positions with many good ideas for both sides where we are trying to evaluate who has the better practical chances, not if one side is objectively "won". Judging from your last post you probably mostly agree here> Yes, I agree, and I would add "not if one side is objectively won, lost, or if the game is drawn".
That's because I agree that many of the positions arising in this game, either the actual positions or positions resulting from analysis of alternative lines, are definitely complex and unclear. And that's one reason why it's so difficult to reach conclusive results with straightforward analysis, either human-based or computer-based.
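On the question of how many games would be needed, one standard starting point (a sketch, not a settled answer) is the normal-approximation sample-size formula for a mean: score each game 1, ½ or 0 for the side being tested, estimate the per-game variance σ² from a pilot set of games, and solve n = z²σ²/E² for a chosen margin of error E at the chosen confidence level. The pilot result below (2 wins, 8 draws, 2 losses) is hypothetical, and the formula assumes the games are independent, which engine games from one fixed start position only approximately are. To compare two candidate moves, the margin should be roughly less than half the score difference you hope to detect, so that the two intervals don't overlap.

```python
import math
from statistics import NormalDist
from typing import Tuple


def score_mean_and_variance(wins: int, draws: int, losses: int) -> Tuple[float, float]:
    """Mean score and per-game variance when a win = 1, draw = 0.5, loss = 0."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    mean_sq = (wins + 0.25 * draws) / n  # E[x^2]: a draw contributes 0.5^2
    return mean, mean_sq - mean ** 2


def games_needed(variance: float, margin: float, confidence: float = 0.95) -> int:
    """Games needed for the confidence interval on the mean score to be +/- margin."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return math.ceil(z * z * variance / (margin * margin))


# Hypothetical pilot result: 2 wins, 8 draws, 2 losses in 12 games.
mean, var = score_mean_and_variance(2, 8, 2)
print(mean, games_needed(var, 0.05), games_needed(var, 0.05, 0.99))
```

Draw-heavy results have a small per-game variance, which helps; even so, a ±0.05 margin at 95% needs on the order of a hundred or more games with this pilot, far more than one game per night allows.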

Nov-16-18
  AylerKupp: <<centralfiles> Concerning Hamppe vs Meitner, 1872, I did an extensive analysis of it starting at Hamppe vs Meitner, 1872 (kibitz #216) and Hamppe vs Meitner, 1872 (kibitz #250) in response to a request for an opinion from <chessgames.com> as to the desirability of using this game as the basis for the first Thematic Challenge. And I think they learned their lesson; they've never asked me for a comment again. :) And, given that you're talking about the Immortal Draw and we're considering whether 19...Rf6 leads to a draw, I think that your mention of that game passes the 6-degrees-of-separation test as far as being on-topic. :) Maybe that should be an official criterion for <chessgames.com> to determine whether a post is on-topic or not? As for Stockfish's aggressive search tree pruning, I'm philosophical about it. In life you seldom get something for nothing, and chess engines are not exempt from that. As a result of its aggressive search tree pruning, Stockfish reaches substantially deeper search depths than, say, Houdini and Komodo. Yet Stockfish's head-to-head game results are not substantially better than Houdini's and Komodo's. So I am forced to conclude 2 things: (1) Both approaches are equally valid for practical purposes: Stockfish's aggressive search tree pruning and its resulting ability to reach deeper search depths vs. Houdini's and Komodo's less aggressive search tree pruning and their resulting inability to achieve equivalent search depths. The two seem to balance out with respect to reaching their high ratings. (2) You need to let Stockfish reach deeper search depths than either Houdini or Komodo in order to have similar confidence in its evaluations. That's why I alternately laugh and groan when I see a site like <chess24.com> run a Stockfish analysis to d=24 and then many people (including some of the commentators) attach great (or, for that matter, any) significance to its evaluations.

Nov-16-18
  AylerKupp: <<centralfiles> A propos item (2) in my previous post and my earlier spoiler alert, here are the results of a Stockfish analysis on my even older and slower computer (32-bit!) using only 2 threads and a 512 MB hash table after 20.Bd5 at d=36:
1. [+0.90]: 20...Kd8 21.Ne4 Rxf4 22.Nexd6 Bg4 23.Qd3 Ne5 24.Qxh7 Kd7 25.Bxb7 Raf8 26.hxg4 Nxg4 27.g3 Nf2+ 28.Kg2 Qg5 29.Nc8 Rxc8 30.Bxc8+ Kxc8 31.Qg8+ Kd7 32.Nxa7 Bxa7 33.Qa8 Ra4 34.Qb7+ Kd8 35.Rad1+ Nxd1 36.Rxd1+ Bd4 37.Rxd4+ Rxd4 38.Qb6+ Ke8 39.Qe6+ Kd8 40.cxd4 e2 41.Qxe2 Qd5+ 42.Kh3 Qxd4 43.Qe6 Kc7 44.a3 Qc3 45.Qa6 Qd4 46.a4 Qe4 47.Qb5 Qe6+ 48.g4
2. [+1.10]: 20...a6 21.Nd4 Kc7 22.Qb3 Nxd4 23.cxd4 Bxd4 24.Rac1+ Kb8 25.Ne4 Ka7 26.Nxf6 gxf6 27.Rc4 Bc5 28.Re4 Qd8 29.R4xe3 Bxe3 30.Qxe3+ Qb6 31.Qe7 Qd4 32.Qxd6 Bf5 33.Re7 Rb8 34.Qc6 Qb6 35.Qc3 Qd6 36.Qe3+ Qb6 37.Qg1 Qxg1+ 38.Kxg1 Kb6 39.g4 Kc5 40.Bxb7 Bd3 41.Kf2 a5 42.Rd7 Bc4 43.a3 h6 44.f5 a4 45.Ke3 Bb5 46.Rh7
3. [+1.28]: 20...Rxf4 21.Be6+ Kd8 22.Qd5 h6 23.Nf7+ Rxf7 24.Bxf7 Bc5 25.Rad1 Qe5 26.Qf3 Qf5 27.Qxf5 Bxf5 28.Nd4 Bxd4 29.cxd4 Be4 30.d5 Ne5 31.Rxe3 Nxf7 32.Rxe4 Ne5 33.Kh2 b6 34.Rc1 Rc8 35.Rxc8+ Kxc8 36.Re3 Kb7 37.Rb3 a6 38.Kg3 b5 39.Kf4 g6 40.Ke4 Kb6 41.Rc3 Nc4 42.Rc1 Kc5 43.Rc3
4. [+2.56]: 20...Ke8 21.Qf3 Bd7 22.Rad1 Ne5 23.Nxd6+ Qxd6 24.Qh5+ g6 25.Qxh7 Qe7 26.fxe5 Qxh7 27.Nxh7 Rf2 28.Nf6+ Ke7 29.Bxb7 Rd8 30.Nd5+ Ke8 31.Nxb6 axb6 32.Rxe3 Bf5 33.Bd5 Rf4 34.Bf7+ Ke7 35.Rxd8 Kxd8 36.Bb3 Ke7 37.Kh2 Rf2 38.Kg3 Rb2 39.h4 b5 40.Kf3 Be6 41.g4 Bxb3 42.axb3 Rxb3 43.Kf4 Ke6 44.Rd3 b4 45.Rd6+ Ke7 46.cxb4
5. [+3.17]: 20...h6 21.Ne4 Ke8 22.Nxf6+ Qxf6 23.Bxc6+ bxc6 24.Nxd6+ Kf8 25.Nc4 Kg8 26.Nxe3 Be6 27.Qd6 Rf8 28.Qxc6 Kh8 29.Nc4 Bxc4 30.Qxc4 Qxf4 31.Qxf4 Rxf4 32.Rad1 Rc4 33.Rd3 Ra4 34.Re8+ Kh7 35.Re7 Rxa2 36.Rdd7 Kg6 37.Rxg7+ Kf6 38.Rgf7+ Ke6 39.Rde7+ Kd6 40.Re1 Kc5 41.Rf6 Kc4 42.Rxh6 Kxc3 43.Rf1 Rf2 44.Rc6+ Kb4 45.Rxf2 Bxf2
So it <IS> possible for at least Stockfish to consider 20...Kd8 as its best response to 20.Bd5, but admittedly it took about 1 hr 8 mins to reach that conclusion on my archaic computer. And, since this is a complex position, the move rankings changed from ply to ply:
Ply  PV=1     PV=2     PV=3     PV=4     PV=5
20   ...Rxf4  ...Kd8   ...Ke8   ...a6    ...Rb8
21   ...Rxf4  ...a6    ...Kd8   ...Rb8   ...Ke8
22   ...Rxf4  ...a6    ...Kd8   ...Ke8   ...Rb8
23   ...Rxf4  ...a6    ...Kd8   ...g6    ...Ke8
24   ...Rxf4  ...a6    ...Kd8   ...g6    ...Rb8
25   ...Rxf4  ...a6    ...Kd8   ...g6    ...Ke8
26   ...Rxf4  ...a6    ...Kd8   ...g6    ...Ke8
27   ...a6    ...Rxf4  ...Kd8   ...g6    ...Ke8
28   ...Rxf4  ...a6    ...Kd8   ...g6    ...Ke8
29   ...a6    ...Rxf4  ...Kd8   ...g6    ...Rb8
30   ...a6    ...Rxf4  ...Kd8   ...g6    ...Ke8
31   ...a6    ...Rxf4  ...Kd8   ...g6    ...Ke8
32   ...a6    ...Rxf4  ...Kd8   ...Ke8   ...h6
33   ...a6    ...Rxf4  ...Kd8   ...Ke8   ...Rb8
34   ...Rxf4  ...a6    ...Kd8   ...Ke8   ...Rb8
35   ...Rxf4  ...a6    ...Kd8   ...Ke8   ...h6
36   ...Kd8   ...a6    ...Rxf4  ...Ke8   ...h6
So, when stopping an engine analysis at an arbitrary point in time (I stop my analyses when I run out of either time or patience, or both), it's important to look at the history of move rankings to see if they have stabilized. In this case they clearly haven't, and since I ran out of both time and patience, I stopped the analysis when I got the results I wanted to show. :) But, at least to me, given the data above, Stockfish 9's move rankings are not (yet) reliable at d=36. 
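Whether the rankings have stabilized can be checked mechanically from a ply-by-ply history like the table above. The Python sketch below counts how many consecutive plies at the end of the search have kept the same top move, and how often the top move changed; the stability window of 5 plies is an arbitrary assumption for illustration, not a recommended value. On the PV=1 column above, the final top move (20...Kd8) has held for only 1 ply, and the top move changed 5 times between d=20 and d=36.

```python
from typing import List, Tuple

# PV=1 (top move) at each ply, transcribed from the table above (d=20..36).
PV1_BY_PLY: List[Tuple[int, str]] = [
    (20, "Rxf4"), (21, "Rxf4"), (22, "Rxf4"), (23, "Rxf4"), (24, "Rxf4"),
    (25, "Rxf4"), (26, "Rxf4"), (27, "a6"), (28, "Rxf4"), (29, "a6"),
    (30, "a6"), (31, "a6"), (32, "a6"), (33, "a6"), (34, "Rxf4"),
    (35, "Rxf4"), (36, "Kd8"),
]


def plies_unchanged(history: List[Tuple[int, str]]) -> int:
    """Consecutive plies, counting back from the deepest one, that share
    the deepest ply's top move."""
    if not history:
        return 0
    final_move = history[-1][1]
    count = 0
    for _, move in reversed(history):
        if move != final_move:
            break
        count += 1
    return count


def top_move_changes(history: List[Tuple[int, str]]) -> int:
    """How many times the top move differs from the previous ply's top move."""
    return sum(1 for (_, a), (_, b) in zip(history, history[1:]) if a != b)


def is_stable(history: List[Tuple[int, str]], window: int = 5) -> bool:
    """Call the ranking stable if the top move held for `window` plies.
    The default window is an arbitrary illustration, not a standard."""
    return plies_unchanged(history) >= window


print(plies_unchanged(PV1_BY_PLY), top_move_changes(PV1_BY_PLY), is_stable(PV1_BY_PLY))
```

Note that stopping at d=33 would have passed this 5-ply test with 20...a6 on top, only for the top move to change twice more; any fixed window is a heuristic, and the full history is still worth looking at.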

Nov-16-18
  AylerKupp: <<kwid> But we live now in the "Bean Counter" times grinding out variations by the numbers with no room for art anymore.> Sad but true. It all depends on the relative value that you place on art and accuracy. As much as we usually admire a spectacular attack involving several sacrifices, its enjoyment for some of us is diluted if it turns out that the defender did not play the best responses. Oh well, Sic Transit Gloria Mundi. And, when I first read your post, I thought you said "with no room for <fart> anymore". I'm glad you didn't say that or I would have been truly depressed, not to mention constipated. :)

Nov-16-18
  AylerKupp: <<kwid> <Summary of White's choices after 24...Rxb5>
After 24...Rxb5 we reach this position:
Now that my previous "distractions" are out of the way, here is a summary of the 3 engines' evaluations, sorted in descending order of Ratings-Weighted Average (RWAvg) since it is White's move. Again, there is relatively rare agreement on the top 5 moves but, by chance, not 100% agreement on their rankings. Clearly White must recapture, so only 25.Rxb5 or 25.Bxb5 should be given serious consideration, and the engines assess that White's <and> Black's best approach after 25.Bxb5 is to seek a draw by repetition. But I asked the engines to display their "top" 5 moves and they obediently complied.

White's    Houdini 6  Komodo 12.1  Stockfish 9
Move       d=27       d=30         d=44         <Avg>      <RWAvg>    <TrueRank>
25.Rxb5    [+0.80]    [+0.68]      [+0.84]      <[+0.77]>  <[+0.77]>  <1>
25.Bxb5    [0.00]     [0.00]       [0.00]       <[0.00]>   <[0.00]>   <2>
25.Nxd6    [-4.90]    [-4.96]      [-6.77]      <[-5.54]>  <[-5.56]>  <3>
25.Ra1     [-5.42]    [-5.19]      [-7.01]      <[-5.87]>  <[-5.89]>  <3>
25.Bb3     [-5.71]    [-5.26]      [-7.01]      <[-5.99]>  <[-6.00]>  <3>
True Rank: 1 = [ 25.Rxb5 ]; 2 = [ 25.Bxb5 ]; 3 = [ 25.Nxd6, 25.Ra1, 25.Bb3 ]

And here is a summary of how the 3 engines ranked their top 5 moves, without regard for the value of the evaluation.

White's    Houdini 6  Komodo 12.1  Stockfish 9
Move       d=27       d=30         d=44         <AvgRank>  <TrueRank>
25.Rxb5    1          1            1            <1.0>      <1>
25.Bxb5    2          2            2            <2.0>      <2>
25.Nxd6    3          3            3            <3.0>      <3>
25.Ra1     4          4            4            <4.0>      <4>
25.Bb3     5          5            4            <4.7>      <4>
True Rank: 1 = [ 25.Rxb5 ]; 2 = [ 25.Bxb5 ]; 3 = [ 25.Nxd6 ]; 4 = [ 25.Ra1, 25.Bb3 ]
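The Avg, RWAvg and TrueRank columns can be computed along these lines in Python, taking the evaluations of the non-recapturing moves as negative (they are sorted below the 0.00 of 25.Bxb5). This is a sketch under stated assumptions: the weights are hypothetical engine ratings (the ratings actually used in the spreadsheet aren't given here, so the RWAvg values will not match the table exactly), and TrueRank is assumed to start a new rank whenever a move evaluates more than 0.50 below the best move of the current rank group. With that rule, the table's RWAvg column gives ranks 1, 2, 3, 3, 3, as shown.

```python
from typing import List, Sequence

# Evaluations from the table above: Houdini 6, Komodo 12.1, Stockfish 9.
EVALS = {
    "25.Rxb5": [0.80, 0.68, 0.84],
    "25.Bxb5": [0.00, 0.00, 0.00],
    "25.Nxd6": [-4.90, -4.96, -6.77],
    "25.Ra1": [-5.42, -5.19, -7.01],
    "25.Bb3": [-5.71, -5.26, -7.01],
}
# Hypothetical engine ratings used as weights (NOT the spreadsheet's values).
RATINGS = [3400, 3400, 3450]


def average(evals: Sequence[float]) -> float:
    return sum(evals) / len(evals)


def ratings_weighted_average(evals: Sequence[float], ratings: Sequence[float]) -> float:
    """Weight each engine's evaluation by that engine's rating."""
    return sum(e * r for e, r in zip(evals, ratings)) / sum(ratings)


def true_ranks(values: Sequence[float], threshold: float = 0.50) -> List[int]:
    """Dense ranks, best first; a move within `threshold` of the best move
    of the current group shares that group's rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    rank, leader = 0, None
    for i in order:
        if leader is None or leader - values[i] > threshold:
            rank += 1
            leader = values[i]
        ranks[i] = rank
    return ranks


for move, ev in EVALS.items():
    print(move, round(average(ev), 2), round(ratings_weighted_average(ev, RATINGS), 2))
print(true_ranks([0.77, 0.00, -5.56, -5.89, -6.00]))  # the table's RWAvg column
```

Weighting by raw ratings, which differ by only a few percent, moves the result only slightly toward Stockfish; a scheme that weights by rating differences would favor the top engine more strongly.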

Nov-16-18
  AylerKupp: <kwid> Here is the current state of the comparison through 24.Ne4 between the moves in your initial analysis in Team White vs Team Black, 2017 (kibitz #3860) and the engines' top 5 choices.
Analysis     PV=1         PV=2         PV=3         PV=4         PV=5
<20.Rb1>     20.Bd5       20.Qd3       20.f5        <20.Rb1>     20.Qf3
<20...Kd8>   <20...Kd8>   20...h6      20...g6      20...h6      20...Bc5
<21.Qd5>     <21.Qd5>     21.Qh5       21.Qf3       21.Bd5       21.Nxh7
<21...Bd7>   <21...Bd7>   21...a6      21...g6      21...Rf5     21...h6
<22.Red1>    <22.Red1>    22.Nf7+      22.Ne4       22.Qg8+      22.Nxh7
<22...g6>    <22...g6>    22...Rxf4    22...Kc8     22...Na5     22...Rf5
<23.Ne4>     <23.Ne4>     23.Nxd6      23.a4        23.Be2       23.a3
<23...Rf5>   <23...Rf5>   23...Be6     23...Rxf4    23...Bc5     23...Re6
<24.Qd3>     <24.Qd3>     24.Qg8+      24.Qxd6      24.Nbxd6     24.Qxf5
<24...Rxb5>  <24...Rxb5>  24...Rh5     24...Ke8     24...Rxf4    24...d5
<25.Rxb5>    <25.Rxb5>    25.Bxb5      25.Nxd6      25.Ra1       25.Bb3
<25...Kc7>   (TBD)

Nov-16-18
  AylerKupp: <kwid>, <centralfiles> So now I am at a crossroads as to what to do. Here are some options as I see them:
(1) Continue analyzing the moves in the main line posted Oct-08-18, from Black's response to 25.Rxb5 through Black's response to 27.Nxd6. Afterwards, continue with the analysis of White's response to Black's next move in this line, 27...Rf8.
(2) Continue as in (1) above, but switch to analyzing White's response to 27...Rf8, the first move in the analysis posted Oct-19-18 after 27.Nxd6 that diverges from the initial analysis posted Oct-08-18.
(3) Analyze White's responses to 27...Rf8 and Black's responses to 28.Be2, but then switch to analyzing White's responses to 28...a6, the first move in the analysis posted on Nov-12-18 that deviates from the second analysis posted on Oct-19-18.
(4) Wait until <kwid> or someone else develops an analysis main line after 20.Bd5 and begin analyzing Black's responses to that.
(5) Stop, recognizing and accepting that this approach will not achieve the goal of conclusively showing that 19...Rf6 leads to a draw for Black.
(6) Other???
Think about it. I have at least a week since I will be out of town for the Thanksgiving holiday and will not be working on this while I'm away. 

Nov-16-18
  diceman: <AylerKupp:
Ply  PV=1     PV=2     PV=3     PV=4     PV=5
20   ...Rxf4  ...Kd8   ...Ke8   ...a6    ...Rb8
21   ...Rxf4  ...a6    ...Kd8   ...Rb8   ...Ke8
22   ...Rxf4  ...a6    ...Kd8   ...Ke8   ...Rb8
23   ...Rxf4  ...a6    ...Kd8   ...g6    ...Ke8
24   ...Rxf4  ...a6    ...Kd8   ...g6    ...Rb8
25   ...Rxf4  ...a6    ...Kd8   ...g6    ...Ke8
26   ...Rxf4  ...a6    ...Kd8   ...g6    ...Ke8
27   ...a6    ...Rxf4  ...Kd8   ...g6    ...Ke8
28   ...Rxf4  ...a6    ...Kd8   ...g6    ...Ke8
29   ...a6    ...Rxf4  ...Kd8   ...g6    ...Rb8
30   ...a6    ...Rxf4  ...Kd8   ...g6    ...Ke8
31   ...a6    ...Rxf4  ...Kd8   ...g6    ...Ke8
32   ...a6    ...Rxf4  ...Kd8   ...Ke8   ...h6
33   ...a6    ...Rxf4  ...Kd8   ...Ke8   ...Rb8
34   ...Rxf4  ...a6    ...Kd8   ...Ke8   ...Rb8
35   ...Rxf4  ...a6    ...Kd8   ...Ke8   ...h6
36   ...Kd8   ...a6    ...Rxf4  ...Ke8   ...h6 > Is it part of your software (or maybe custom programming by you?) that you can track PV moves vs ply depth?

Nov-16-18
  kwid: In the Rb1 line it may well be better to back solve from the draw position shown below to see if the engines can improve this line. <19. Nb5 Rf6 20. Rb1 Kd8 21. Qd5 Bd7 22. Red1 g6 23. Ne4 Rf5 24. Qd3 Rxb5 25. Rxb5 Kc7> 26. Qxd6+ Qxd6 27. Nxd6 Rf8 28. Be2 a6 29. Rb2 Rxf4 30. Nxb7 Rf2 31. Nd6 Ne5 32. Ne4 Rf4 33. Ng5 h6 34. Nf3 Nxf3 35. Bxf3 Bb5 36. Rb4 Rc4 37. a4 Rxb4 38. cxb4 Bxa4 39. Rc1+ Kd6 40. Be2 Bb5 41. Bxb5 axb5 42. g3 Bd4 43. Kg2 Bb2 44. Rc8 Ba3 45. Rd8+ Ke6 46. Re8+ Kf6 47. Rxe3 Bxb4 48. Rf3+ Ke7 49. Rb3 Bd2 50. Rxb5 h5 51. Kf3 Kf6 52. Rb6+ Kf7 53. Ke4 Be1 54. g4 hxg4 55. hxg4

Nov-17-18
  kwid: <AylerKupp:> To close out the discussion of whether Rf6 holds, I searched in support of the White side and could not come up with any winning lines.
For example:
19. Nb5 Rf6 20. Rb1 Kd8 21. Qd5 Bd7 22. Red1 g6 23. Ne4 Rf5 24. Qd3 Rxb5 25. Rxb5 Kc7 26. Qxd6+ Qxd6 27. Nxd6 Rf8 28. Be2 a6 29. Rb2 Rxf4 30. Nxb7 Rf2 31. Nd6 Ne5 32. Ne4 Rf4 33. Ng5 Rf2 (33... h6 34. Nf3 Nxf3 35. Bxf3 Bb5 36. Rb4 Rc4 37. a4 Rxb4 38. cxb4 Bxa4 39. Rc1+ Kd6 40. Be2 Bb5 41. Bxb5 axb5 42. g3 Bd4 43. Kg2 Bb2 44. Rc8 Ba3 45. Rd8+ Ke6 46. Re8+ Kf6 47. Rxe3 Bxb4 48. Rf3+ Ke7 49. Rb3 Bd2 50. Rxb5)
34. Rf1 h5 35. Ne4 Rxf1+ 36. Bxf1 Bf5 37. Rb4 a5 38. Rb5 Nd7 39. Ng5 Nc5 40. Be2 h4 41. c4 a4 42. g3 hxg3 43. Kg2 Bd7 44. Rb4 Na6 45. Rb1 Ba5 46. Kxg3 Nb4 47. a3 Nc2 48. Rb2 Nxa3 49. Ne4 Kc6 50. Ra2 Bb4 51. Kf3 Bxh3 52. Kxe3 Bf5 53. Bd3 Bf8 54. Kf4 Be7 55. Be2 Bf8

Nov-17-18
  centralfiles: <AK> I am indeed surprised your old machine managed to get that far down the line in only a little over an hour, though when I said at least days I meant until it sees it closer to dead equal at 0.00, as it does after sliding. I think your CPU speed is a bigger factor than the age of your OS etc...
If you're using an older desktop CPU like a 2nd-gen Intel i7 or something like that, it is still much faster than a modern i5-7200U: https://cpu.userbenchmark.com/Compa...

Nov-17-18
  centralfiles: <AK> As for Hamppe vs Meitner, 1872, I would think Stockfish 9's failure to find the draw until it is literally staring you in the face (considerably worse than Stockfish 8) reflects a serious flaw in the "improved" pruning.

Nov-17-18
  centralfiles: <<AylerKupp:> To close out the discussion if Rf6 holds...> I would frame the argument as follows:
If none of the computer lines actually leads to a win for White, and strong humans are not able to find ideas where it is at least unclear whether White might be able to force a win,
then why would we assume that White might be winning after 19...Rf6?
Isn't the logical conclusion here an overwhelmingly likely draw with best play?
How can we even compare this to 19...h6 when no one yet has been able to show a way where it is even a strong possibility that Black draws? Can we "conclusively" say it's a draw? Of course not, but even the initial starting position cannot "conclusively" be shown to be a draw...

Nov-18-18
  centralfiles: <In the Rb1 line it may well be better to back solve...>
The key here might well be to back solve until you find a possible improvement, investigate it, and if it's drawn, continue backing up.
Personally I cannot find a single possible winning line with my machine <I am indeed primarily using Stockfish 9>; every single try has petered out to a draw.

Nov-19-18
  AylerKupp: <<diceman> Is it part of your software (or maybe custom programming by you?) that you can track PV moves vs ply depth?> Ha! You give me far too much credit (and I think that's rare :) ). But, no, I use the Arena 3.5 GUI, which lists the top 5 moves (or however many I set the MPV= parameter to) at each search ply. Then there's an option to copy the analysis history to the clipboard, and I paste it and save it as a Word file. I do that for the 3 engines I use in the analyses (currently Houdini 6, Komodo 12.1, and Stockfish 9). So, yes, I can track PV moves for each engine vs. ply depth, and you can see poor Team Black's evaluation gradually deteriorating by downloading an Excel spreadsheet from http://www.mediafire.com/file/frcfj.... Then, with the help of an Excel spreadsheet (what else?), I calculate a Ratings-Weighted Average (RWAvg) of the 3 engines' evaluations for each of their 5 moves because, after all, I think that the evaluation of the highest-rated engine (currently Stockfish) should be given greater weight than the evaluations of the other two engines. And I save the results of the analyses at each ply. The data you see above are the move rankings at each ply based on the RWAvg of the 3 engines at each ply. But no custom programming was involved, just capturing the engines' evaluation results ply by ply.
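For anyone who wants the same PV-vs-depth table without a GUI, here is a minimal Python sketch that builds it from an engine's raw UCI output. It assumes the standard UCI `info` lines (with `depth`, `multipv` and `pv` fields) that an engine prints when its `MultiPV` option is set above 1; Arena's clipboard export uses its own format, which this does not parse. The sample lines and scores are hypothetical, and moves are in UCI long algebraic notation.

```python
from typing import Dict, Iterable, List


def pv_table(info_lines: Iterable[str]) -> Dict[int, Dict[int, str]]:
    """Map depth -> {multipv index -> first move of that PV}.

    A later line for the same depth and multipv overwrites an earlier one,
    so the last report at each depth wins.
    """
    table: Dict[int, Dict[int, str]] = {}
    for line in info_lines:
        tokens = line.split()
        if not tokens or tokens[0] != "info" or "depth" not in tokens or "pv" not in tokens:
            continue  # skip bestmove, readyok, "info currmove ..." and the like
        pv_at = tokens.index("pv")
        if pv_at + 1 >= len(tokens):
            continue  # a "pv" keyword with no moves after it
        depth = int(tokens[tokens.index("depth") + 1])
        multipv = int(tokens[tokens.index("multipv") + 1]) if "multipv" in tokens else 1
        table.setdefault(depth, {})[multipv] = tokens[pv_at + 1]
    return table


def rows(table: Dict[int, Dict[int, str]]) -> List[str]:
    """One text row per depth: the depth, then PV=1, PV=2, ... in order."""
    return [
        " ".join([str(d)] + [table[d][k] for k in sorted(table[d])])
        for d in sorted(table)
    ]


# Hypothetical engine output (scores are from the side to move's point of view).
sample = [
    "info depth 20 seldepth 31 multipv 1 score cp -95 nodes 1000 pv f6f4",
    "info depth 20 seldepth 31 multipv 2 score cp -101 nodes 1000 pv e7d8",
    "info depth 21 seldepth 33 multipv 1 score cp -97 nodes 2000 pv f6f4",
    "info depth 21 seldepth 33 multipv 2 score cp -104 nodes 2000 pv a7a6",
    "bestmove f6f4",
]
for row in rows(pv_table(sample)):
    print(row)
```

Running it once per engine and lining up the rows by depth gives the same kind of history as the table above, ready for a spreadsheet.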

Nov-19-18
  AylerKupp: <<kwid> In the Rb1 line it may well be better to back solve from the draw position shown below to see if the engines can improve this line.> I'm not very familiar with back solving (or backward sliding), and I've never attempted to do it. From what I've read there are two approaches:
(1) Use a very big hash table and move forward from the initial position. If your hash table is big enough, then there's a good likelihood that all or at least most of the positions used to define the lines are still in it. Therefore you can back up from the ending position one move at a time and, with the previously analyzed positions still in the table, find the alternative lines that got you there. But with this approach you first have to get to the ending position, and for long lines that would take both a great amount of time and a very, very big hash table. I don't have that much memory in my computer since, being a 32-bit machine, its address space is "only" 4 GB, which is small compared to the address space of 64-bit computers.
(2) Start with the next-to-last move in the line (54...hxg4) and see if the engines' analyses indicate that 55.hxg4 is White's best move. If it is, then restart the analysis at 54.g4. If it isn't, then start a human or computer analysis (or both) of whatever move the engines considered best. This is no different from what I've been doing so far, except in reverse. But it seems riskier to me in terms of finding the best sequence of moves, at least according to the engines, because if you're going forward from 19...Rf6 and you find an engine suggestion (in this case 20.Bd5) that differs from the analysis' suggestion (in this case 20.Rb1), you can begin to investigate the engine's alternative immediately.
But if you go backwards, then each time the engines suggest an alternative move and you immediately start to investigate it, all your work might be for nothing if, once you restart the backward solving, the engines again suggest a reasonable alternative to the analysis' suggested best move.
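Approach (2), and the efficiency concern with it, can be illustrated with a small Python sketch. It assumes a hypothetical `engine_best(moves_so_far)` callable standing in for a multi-engine run; the canned answers are illustrative only. Working back from the end of a 4-move line whose second move the engines dispute, the backward pass needs 3 checks to find the disagreement, and the 2 checks already made on later moves may have to be discarded once the engines' alternative is adopted; a forward pass would have found the same disagreement on its 2nd check.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical engine stand-in: moves played so far -> engines' top choice.
EngineBest = Callable[[Sequence[str]], str]


def back_solve(line: Sequence[str], engine_best: EngineBest) -> Tuple[Optional[int], int]:
    """Check a line from its last move towards its first.

    Returns (index of the disagreement nearest the end of the line, or None;
    number of engine checks made).
    """
    checks = 0
    for i in range(len(line) - 1, -1, -1):
        checks += 1
        if engine_best(line[:i]) != line[i]:
            return i, checks
    return None, checks


def wasted_checks(line_length: int, index: Optional[int]) -> int:
    """Checks already made on moves after the disagreement; if the engines'
    alternative is adopted at `index`, those later positions never arise."""
    return 0 if index is None else line_length - 1 - index


line = ["19...Rf6", "20.Rb1", "20...Kd8", "21.Qd5"]
canned = {0: "19...Rf6", 1: "20.Bd5", 2: "20...Kd8", 3: "21.Qd5"}
idx, checks = back_solve(line, lambda played: canned[len(played)])
print(idx, checks, wasted_checks(len(line), idx))
```

The longer the line and the earlier the disputed move, the more of the backward work is at risk, which is the efficiency argument for going forward.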

Nov-19-18
  AylerKupp: <<kwid> To close out the discussion if Rf6 holds I searched in support of the white side and could not come up with any winning lines.> Given the complexity of the position and the many apparently reasonable alternative lines after each move, I'm not surprised. As I've said before, the number of possible moves, and possibly also the number of plausible alternative moves, grows exponentially as the search ply increases. So it's not possible for either a human or a computer to explore all the reasonable alternative moves. And if you're trying to <conclusively> determine whether a particular move leads to a draw, a win, or a loss, then I think that's what you have to do. An alternative approach might be to consider all the possible alternative lines, or only the reasonable ones (which would eliminate most of the lines), as a population. You can then consider each analysis as a sample of that population, and you could calculate (something I haven't yet figured out how to do) how many "samples" (i.e. analyses) of the population you would need in order to achieve an adequate confidence level (95% is the usual number) that the draw result is correct. If you're not willing to live with that 5% uncertainty, then you can increase your required confidence level to, say, 99%, if you are willing to do many, many more analyses. At the limit, of course, you can increase that confidence level to 100% if you are willing to analyze every reasonable line, and as I've said before, that is simply not practical. But, unfortunately, if you take the word "conclusively" literally, I think that's what you have to do. That's what a tablebase generator does.
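For the sample-size question, here is one textbook approach, offered as a sketch with a big caveat. Suppose the analyses could be treated as independent, uniformly random samples from the population of reasonable lines (real analyses are not: they are chosen by people and engines, and neighbouring lines are strongly correlated). Then finding n draws and no wins gives an upper confidence bound on the fraction p of non-drawing lines of 1 − (1 − c)^(1/n) at confidence c, roughly 3/n at 95% (the "rule of three"). Inverting that bound gives the number of draw-only samples needed to claim p is below a chosen target.

```python
import math


def upper_bound_failure_rate(n_samples: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the fraction of non-drawing lines after
    n_samples independent, uniformly random lines all ended in draws."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_samples)


def samples_needed(max_failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that n draws out of n samples bound the fraction of
    non-drawing lines below max_failure_rate at the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-max_failure_rate))


print(samples_needed(0.05), samples_needed(0.05, 0.99), samples_needed(0.01))
print(round(upper_bound_failure_rate(59), 4), round(3 / 59, 4))
```

Even under those idealized assumptions, a result like "fewer than 5% of reasonable lines are lost for Black" is a statistical statement with a stated confidence, not the "conclusive" result a tablebase gives; and a single confirmed win for White in any sample breaks the zero-failure premise entirely.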

Nov-20-18
  AylerKupp: <<centralfiles> I am indeed surprised your old machine managed to get that far down the line in only a little over an hour, though when I said at least days I meant until it sees it closer to dead equal at 0.00 as it does after sliding.> Oh, I thought you were saying that it would take "days" for Stockfish to establish 20...Kd8 as the best move. Starting out I had no idea how long it would take, if indeed it could find it at all. And, because multi-core engines are non-deterministic, if you run the analysis again you will likely get a different result. FWIW I'm currently using an approximately 8-year-old Intel Q9400 32-bit system with 4 cores running at 2.66 GHz and 4 GB of RAM (~3.25 GB usable) running Windows XP SP3. I tried downloading the CPU benchmark indicated in your link, but it only runs on newer versions of Windows, so I couldn't use it. I downloaded another performance test that ran under Windows XP and up and covered both 32-bit and 64-bit computers, and ran it. But it wasn't obvious to me what it was trying to do, and the output was full of trivial and irrelevant information, so I gave up on it. I'll be gone for a few days, but when I get back I'll download a reasonable performance test program that runs under Windows XP and covers both 32-bit and 64-bit machines, and I'll give you the numbers for my machine and a link to the program. Then, if you're curious, you can run it on your computer and see how its performance compares with mine.

Nov-20-18
  AylerKupp: <<centralfiles> I would think Stockfish 9's failure to find the draw until it is literally staring you in the face (considerably worse than Stockfish 8) reflects a serious flaw in the "improved" pruning.> Perhaps in this case. But is this a typical case? No pruning approach is likely to work for all cases (and by "working" I mean that it's better than its predecessor, by whatever criteria you use to define "better", over 50% of the time). If it's "better" then that's the one you should use. If it isn't "better", then you shouldn't use it. But using just one example to form a conclusion about a change in implementation is not a good idea. You can always find exceptions to the rule. And the pruning heuristics are not necessarily simple. There are many of them, and their effects could very well be interrelated. When I was recently looking at the description of the Greenblatt program because of some questions that came up regarding the 3 games that Fischer played against it in 1977, I found that the 1967 description of the program indicated that it used about 50 heuristics in what it called the plausible move generator, which, from its description, seems to perform the search tree pruning function by discarding moves, and hence the lines deriving from those moves. That's 50 heuristics. In 1967!

Nov-20-18
  AylerKupp: <<centralfiles> I would frame the argument as follows ... Then why would we assume that white might be winning after 19...Rf6?> We can't. But neither can we assume that White is <not> winning. There are simply too few (I think) analyses (samples of all the possible "reasonable" lines) to reach that conclusion. And I haven't yet figured out how to determine the number of analyses that would need to be done for the conclusion we reach to be statistically significant. It could be a lot!

Nov-20-18
  AylerKupp: <<centralfiles> Then why would we assume that white might be winning after 19...Rf6? Isn't the logical conclusion here an overwhelmingly likely draw with best play?> But that's not what <kwid> said originally. His claim was that 19...Rf6 leads to a draw. Period.
"Always" and "never" types of assertions are difficult to prove by examples and relatively easy to disprove, because all it takes is one example of the opposite result. And as I've said, given the relatively small number of analyses done compared to the number of reasonable alternative lines possible, I would not agree that the position after 19...Rf6 is overwhelmingly likely to be a draw. <How can we even compare this to 19...h6 when no one yet has been able to show a way where it is even a strong possibility that black draws?> Probably because no one seems to have anywhere near the fervor for 19...h6 leading to a draw that you and <kwid> (and possibly others) feel for 19...Rf6 leading to a draw. So the effort spent to "show" that 19...h6 (or any other 19th move by Black, for that matter) leads to a draw is nowhere near as much as has been spent to "show" that 19...Rf6 leads to a draw. But even so, I disagree wholeheartedly. In the engine vs. engine tournaments I conducted, with 12 games each starting after either 19...h6 or 19...Rf6, the results for Black with 19...h6 were better than the results for Black with 19...Rf6 (see Team White vs Team Black, 2017 (kibitz #3834)). So at least <some> evidence is available that 19...h6 gives better practical chances to draw than 19...Rf6 does, and therefore it is "better" in that sense. See why it's risky to make "always" and "never" types of statements? But does that tournament conclusively show that 19...h6 was a better move than 19...Rf6, even in the practical sense? No, of course not. Probably (again) I didn't run enough games. And I ran the tournaments at Classic time controls of 40 moves in 2 hours. How does the average time of 3 minutes/move in the tournaments I ran compare to the 2 days/move in the actual game in terms of being able to determine the best move for Black to make?
I don't know, but I suspect that our chances to find the "best" move for Black were better at 2 days/move than at an average of 3 minutes/move. Convincing the team to vote for the "best" move is a different issue, as I'm sure you and <kwid> will agree. In fact, after playing in several of these team games, I think that convincing the team to vote for the best move may be harder than finding the best move in the first place! About the only evidence anywhere near overwhelming that the tournament provided, I believe, is that after 24 games were played, Black was not able to win even once. While this may also not technically be "conclusive", it does suggest that if Black has any expectation of winning, improvements for Black need to be found between 5.Bxf7+ Ke7 and 19.Nb5. Because why would you play the Traxler if the best you can hope for is a draw? <Can we "conclusively" say its a draw, of course not, but even the initial starting position cannot "conclusively" be shown to be a draw...> I agree; that's what I've been saying all along. The position after 19.Nb5 is extremely complex, as is the Traxler in general. So to "conclusively" prove anything is at least very hard, and possibly impossible. The best we could hope for, I think, is to show that a particular move in a particular position leads to the desired result in a statistically significant way, to a level of confidence we can live with, accepting the non-zero margin of error in the estimate.

Nov-20-18
  AylerKupp: <<centralfiles> The key here might well be to back solve until you find a possible improvement, investigate, if drawn continue backing up.> Yes, that's what I thought I said, or at least tried to say, but with my usual verbosity I didn't make myself sufficiently clear. But my concern about the efficiency of the approach still stands: once a possible improvement is found you need to investigate it, and there's the possibility that the improvement forces you to discard all the previous analyses of subsequent moves. And, as you keep going backwards, the probability of this happening increases (it certainly doesn't decrease!) the longer the line. So, as you work backwards to 19...Rf6, if 20.Bd5 does turn out to be a better move for White than 20.Rb1, then any work that you did based on 20.Rb1 needs to be "flushed". So it seems to me that going forward is more efficient in terms of not having to flush analysis previously made. And, as I said, one thing I didn't do is validate any of the computer lines, at least the ones that had the same True Rank (evaluation difference within [0.50]), by forward sliding, whether by human or by computer.


