< Earlier Kibitzing · PAGE 20 OF 22 ·
Later Kibitzing> |
Sep-23-10 | | bartonlaos: That's because in OTB some moves are played to affect the opponent's mind, and wouldn't be a computer's top choice. So you must consider the Chess that's involved, rather than reducing it to numbers. |
|
Sep-23-10 | | cotoi: <bartonlaos> That is an interesting idea, but it introduces some subjectivity. I think it's sound to remove from the statistics all moves for which every other candidate move is at least 2.00 worse in evaluation. Admittedly, that makes the problem harder, but it's another discussion and another method. PS: I have the impression that, once forced moves are removed, humans match the engine's top choice at most 50% of the time on average. But I'm not sure. Ken Regan might have some data on his website about this. |
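A minimal Python sketch of the 2.00 filter described above. It assumes each analysed move comes with the engine's candidate evaluations (best first, from the mover's point of view) and the engine rank of the move actually played. That input format and the function names are hypothetical illustrations, not chess.com's method or any tool's API.

```python
from typing import List, Optional, Sequence, Tuple

# Threshold from the post: if every alternative is at least 2.00 (pawns)
# worse than the engine's best move, the move counts as "forced".
FORCED_GAP = 2.00


def is_forced(candidate_evals: Sequence[float], gap: float = FORCED_GAP) -> bool:
    """candidate_evals: engine evaluations of the candidate moves, best first.
    Because the list is sorted, checking the second-best move is enough:
    if it is >= gap worse, so are all the others."""
    if len(candidate_evals) < 2:
        return True  # only one candidate considered: nothing to choose
    return candidate_evals[0] - candidate_evals[1] >= gap


def top3_rate_excluding_forced(
    moves: List[Tuple[Sequence[float], Optional[int]]]
) -> Optional[float]:
    """moves: (candidate_evals, played_rank) pairs, where played_rank is the
    1-based engine rank of the move actually played (None if not listed)."""
    kept = [rank for evals, rank in moves if not is_forced(evals)]
    if not kept:
        return None
    hits = sum(1 for rank in kept if rank is not None and rank <= 3)
    return hits / len(kept)


moves = [
    ([0.35, 0.30, 0.10], 2),        # close alternatives: counted, top-3 hit
    ([1.20, -1.50], 1),             # alternative 2.70 worse: dropped as forced
    ([0.00, -0.20, -0.40], None),   # counted, outside the top 3
]
print(top3_rate_excluding_forced(moves))  # 0.5
```

Checking only the gap to the second-best candidate keeps the rule replicable: it needs no judgement about which moves "feel" forced.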
|
Sep-23-10 | | bartonlaos: I think forced moves or common replies should be excluded, as in the game she had against <saksipotku>, which almost plays itself. Objectively speaking, that game should be dismissed. I think the tighter the differences between candidate evaluations, the more weight should be given to selecting the best choice. I also think other factors could objectively be used to exclude certain kinds of games. One such factor is the opponent's own fidelity. <kingkoy301> scored a remarkable 31/33 top-1 choices in his game against her, and he is accepted as an honest player. Since you can't suspect one player and not the other, you'd be forced to assume there is something about the game itself that created the high fidelity. You would then need to discard it to avoid unfairly skewing the results. Chess is so much more complex than the percentages from the Top 3 method can show that each game must be analyzed as a whole, and the evaluations of each candidate must be considered before anything can be assumed about the player's choices. So players outside of <Chess.com> should invest in deeper analysis before drawing any conclusions of their own about the games. |
|
Sep-23-10 | | cotoi: I don't know if kingkoy301 is accepted as an honest player. Did chess.com make such a statement? One advantage of the top-3 method is that it is replicable and unbiased. By what criteria would you dismiss a move as "forced"? Counting forced moves will not skew the results since the same approach was used for the games in the benchmark. Any chess game will have forced moves. That is one reason why one considers the top-3 choices and not only the top choice. You might have a tactical game, with many recaptures and only-moves and you obtain a high top-choice agreement. Yet, it's unlikely to get a very high agreement with all the top-3 choices (Yelena's games on chess.com excepted). |
|
Sep-23-10 | | Marmot PFL: In internet chess, when I win, it's because of my superior intellect; and when I lose, it's either because of distractions or because I was playing a machine (possibly both). Also, I am running into many players who disconnect when short of time, then reconnect later and play the position perfectly... |
|
Sep-23-10 | | Fezzik: Dembo drew a player rated 425 Elo points below her at the Olympiad. Over on chess.com, people were gloating. It turns out that for the 31 non-database moves made in that game, Dembo played >93% "top 3" according to Firebird at 17 ply. One of those third-choice moves was a blunder that threw away a win! (She had a 100% match with the top 4 moves.) A single game is not enough proof of anything, but it does suggest that chess.com's method should be tested more rigorously. |
|
Sep-23-10 | | bartonlaos: <MarmotPFL> I have never lost a game on the internet where I wasn't sure if my opponent had used a computer! <cotoi> kingkoy301 is a member in good standing. He's playing now: http://www.chess.com/echess/game.ht... Their game was in May, yet he has survived the scrutiny of Chess.com's awesome team. The analysts can't suspect her percentage while accepting his. Their game must be dismissed, leaving 19 games, one short of the 20 needed to launch an investigation. <Zygalski> You should have noticed these kinds of games when you were reporting Yelena's results to the Chess.com authorities. The game shows the weakness of the Top 3 method when it is used as crudely as Chess.com uses it. Top 3 needs to be part of chess analysis rather than a numbers game. For forcing moves, you'd set up an increment that gives more weight to smaller evaluation differences between candidates, multiplied by the number of candidates within a threshold. <You might have a tactical game, with many recaptures and only-moves and you obtain a high top-choice agreement. Yet, it's unlikely to get a very high agreement with all the top-3 choices.> What? NO! If you get a high top-choice agreement, then your top-2 and top-3 agreement are at least as high as your top-choice agreement. kingkoy301's 31/33 leaves virtually no room for improvement. Besides, no one is suggesting that highly tactical games are easy to play. Some games contain lots of recaptures because one of the strategies you should employ after gaining an advantage is to force exchanges. <Counting forced moves will not skew the results since the same approach was used for the games in the benchmark.> You are now contradicting your own opinion about kingkoy301's game. The benchmark for cc was almost passed by Yelena in her OTB game; it is just too low and a poor control:
                      Top 1   Top 2   Top 3
Yelena's Olympiad      64%     72%     76%
Heritage benchmark     65%     80%     90%
There were 25 moves in question. Given 3 days per move, would she have found 4 better moves or not? |
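A quick Python check of the arithmetic behind "4 better moves". It assumes the three columns of the table above are the top-1, top-2 and top-3 match rates (the post does not label them, but every Olympiad figure works out to a whole number of moves out of 25, which supports that reading).

```python
import math

MOVES = 25  # moves in question in the Olympiad game, per the post

# Percentages from the table; columns assumed to be Top 1 / Top 2 / Top 3.
olympiad = {"top1": 64, "top2": 72, "top3": 76}
benchmark = {"top1": 65, "top2": 80, "top3": 90}


def matches(pct: int, moves: int = MOVES) -> float:
    """Convert a match percentage back into a number of moves."""
    return pct * moves / 100


for key in ("top1", "top2", "top3"):
    print(key, matches(olympiad[key]), matches(benchmark[key]))
# top1 16.0 16.25
# top2 18.0 20.0
# top3 19.0 22.5

# Extra top-3 matches needed to reach the benchmark, rounded up to whole moves.
extra = math.ceil(matches(benchmark["top3"]) - matches(olympiad["top3"]))
print(extra)  # 4
```

So 19 of 25 moves were top-3 matches, against 22.5 for the benchmark rate, a gap of 3.5 moves, which rounds up to the 4 moves mentioned in the post.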
|
Sep-23-10 | | bartonlaos: <Fezzik
<Dembo drew a player +425 elo lower than her at the Olympiad. Over on chess.com, people were gloating. It turns out that for 31 non-dB moves made in that game, Dembo played +93% "top 3" according to Firebird at 17ply. One of those third-choice moves was a blunder that threw away a win! (She had 100% correlation of top 4 moves) A single game is not enough proof of anything, but it should show that chess.com's method should be more rigorously tested.> > AWESOME! |
|
Sep-23-10 | | cotoi: <fezzik> please post the game; I can't find it in the chessgames database. Secondly, I don't think anyone claimed that a single game is significant, only the average over many games. I am also very skeptical that chess.com's cheating detection comes down only to counting engine match-ups. For example, once you have run the games through an engine, it makes sense to count the blunders too. And I agree that the average error over a set of games might indicate computer-like play. <bartonlaos> I really don't care about the user konkoy. We have a game in which Yelena made some moves. As for forcing moves, feel free to develop a better method. Personally, I don't know of a sound criterion that would keep the method replicable. |
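A minimal Python sketch of the "count the blunders too" and "average error" ideas, assuming the engine supplies, for each move, its evaluation after its own best move and after the move actually played (in centipawns, from the mover's point of view). The 200-centipawn blunder threshold and all names are hypothetical; the post does not define them.

```python
from typing import List, Tuple

# Hypothetical threshold: a move losing >= 200 centipawns (2 pawns) counts
# as a blunder. The post does not define one.
BLUNDER_CP = 200


def move_losses(moves: List[Tuple[int, int]]) -> List[int]:
    """moves: (best_cp, played_cp) pairs. Loss is clipped at 0 in case the
    played move scores slightly higher than the engine's own best line."""
    return [max(0, best - played) for best, played in moves]


def average_error(moves: List[Tuple[int, int]]) -> float:
    """Mean centipawn loss per move; 0.0 for an empty list."""
    losses = move_losses(moves)
    return sum(losses) / len(losses) if losses else 0.0


def blunder_count(moves: List[Tuple[int, int]], threshold: int = BLUNDER_CP) -> int:
    """Number of moves whose loss reaches the blunder threshold."""
    return sum(1 for loss in move_losses(moves) if loss >= threshold)


game = [(50, 50), (30, 10), (100, -150)]
print(average_error(game))   # 90.0
print(blunder_count(game))   # 1
```

Unlike a top-N count, the average error distinguishes a second choice that is almost as good as the first from one that throws away the game.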
|
Sep-23-10 | | Eric Schiller: As an arbiter, all this interests me a bit. I expect that most GM moves will be in a computer's top 4 unless it is a "psych", so I'm not sold on the methodology. More suspicious, to me, is making the kind of subtle maneuvers made more by machines than humans. That, of course, is rather subjective. But in most cases it is easy for a player to explain why a move is made, but I find many of Rybka's moves baffling. All that said, I do have a suggestion for presenting the human/machine comparison in graph form. Put a data point at the top of the graph every time the computer's best move is matched, 25% lower for the second choice, 50% lower for the third and 75% lower for the fourth. This makes it easy to follow and can probably be automated using a spreadsheet with graphing capabilities. A picture is worth 1000 words (or numbers). One formula I am confident in: in upsets, accusations of cheating increase with the rating difference. |
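The graph scheme above can be sketched in Python as a mapping from engine rank to point height, exported as CSV for a spreadsheet to chart. Plotting moves outside the top 4 at zero is an assumption (the post does not say where they go); the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Heights proposed in the post: top of the graph for the engine's best move,
# 25% lower for its 2nd choice, 50% lower for 3rd, 75% lower for 4th.
RANK_HEIGHT = {1: 100, 2: 75, 3: 50, 4: 25}


def graph_points(played_ranks: List[Optional[int]]) -> List[Tuple[int, int]]:
    """played_ranks: engine rank of each move played (None if unranked).
    Moves outside the top 4 are plotted at 0, which is an assumption."""
    return [(move_no, RANK_HEIGHT.get(rank, 0))
            for move_no, rank in enumerate(played_ranks, start=1)]


def to_csv(points: List[Tuple[int, int]]) -> str:
    """CSV text that a spreadsheet can import and chart as a line graph."""
    return "\n".join(["move,height"] + [f"{m},{h}" for m, h in points])


points = graph_points([1, 1, 3, None, 2])
print(points)  # [(1, 100), (2, 100), (3, 50), (4, 0), (5, 75)]
print(to_csv(points))
```

A game full of first choices shows up as a flat line along the top, which is the visual cue the post is after.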
|
Sep-23-10 | | pulsar: <More suspicious, to me, is making the kind of subtle maneuvers made more by machines than humans.> I agree with you, <Eric>. |
|
Sep-23-10 | | bartonlaos: <cotoi> The reason <kingkoy301>'s 31/33 top 1st choice results matter is to consider the outliers of your dataset. Outliers may skew your results abnormally. Outliers for your set would involve abnormally high fidelity for her 'normal' opponents. The only way for a 'normal' opponent to achieve abnormally high fidelity is if there is something intrinsic about the game itself that skews the results. Think of each game as a datapoint, and the entire 20 games as a dataset. In statistics, datapoints will be rejected as outliers if they are located far outside normal values. It's important to identify the outliers if you want to understand the truth about what's being measured. So you absolutely must check all the 'normal' player's fidelity discarding the outliers before you begin the analysis. Since <kingkoi301> is a 'normal' player who has achieved abnormally high fidelity, this game:
http://www.chess.com/echess/game.ht...
must be discarded as an outlier. |
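The post does not say which outlier rule to use. As one common textbook choice (swapped in here, not the poster's method), Tukey's fences flag values more than 1.5 interquartile ranges outside the quartiles. A Python sketch applied to per-game opponent match rates:

```python
import statistics
from typing import List


def tukey_outliers(values: List[float], k: float = 1.5) -> List[int]:
    """Indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].
    Needs at least two values (statistics.quantiles raises otherwise)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical per-game top-1 match rates of the opponents; only the last
# value echoes a figure from the thread (31/33 for kingkoy301).
opponent_rates = [0.45, 0.50, 0.48, 0.52, 0.47, 31 / 33]
print(tukey_outliers(opponent_rates))  # [5]
```

With real data you would pass all 20 opponents' rates; whether a flagged game should then be discarded is exactly the point under dispute in the thread.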
|
Sep-24-10 | | Zygalski: User: bartonlaos
The way I was taught the methodology was to objectively select an absolute minimum of 20 games, each with 20+ non-database moves, against at least reasonably strong opposition.
This is so you get a reasonably large sample of non-database moves, typically 700+ for each suspect. Others and I have said that a few (or a single) cherry-picked game(s) mean nothing in terms of this methodology. The key factor is the match-up rate across many games over time. It doesn't matter if Dembo matches the top 3 100% in an OTB game. As I said, one of the analysts already looked at Dembo's OTB games, thinking that perhaps Dembo's natural style of unassisted play is rather more engine-like than should be expected: "I picked the last 20 games >= 35 moves from chessgames.com. All against high-rated opposition (lowest ELO in the set is 2222, highest is 2655)". The results of the analysis were: Dembo OTB 20 games:
Stockfish 1.8, 512MB hash, min/max ply=12/30, 40s/ply, 2GHz Core Duo:
Yelena Dembo (Games: 20)
{ Top 1 Match: 422/929 ( 45.4% )
{ Top 2 Match: 615/929 ( 66.2% )
{ Top 3 Match: 717/929 ( 77.2% )
{ Top 4 Match: 777/929 ( 83.6% ) |
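Top-N match rates are cumulative: a move that matches the engine's 2nd choice also counts toward Top 3 and Top 4. A Python sketch, assuming each analysed move is recorded as the 1-based engine rank of the move played. It reproduces the figures above from per-rank counts derived by subtraction (the order of the moves is unknown and does not affect the rates).

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def top_n_rates(played_ranks: List[Optional[int]],
                max_n: int = 4) -> Dict[int, Tuple[int, int, float]]:
    """played_ranks: 1-based engine rank of each analysed move (None if the
    move was not in the engine's list). Returns, for each N, the cumulative
    number of moves matching the top N, the total, and the percentage."""
    total = len(played_ranks)
    counts = Counter(r for r in played_ranks if r is not None)
    rates = {}
    cumulative = 0
    for n in range(1, max_n + 1):
        cumulative += counts[n]   # Top N includes every move in Top N-1
        rates[n] = (cumulative, total, round(100 * cumulative / total, 1))
    return rates


# Per-rank counts recovered from the cumulative figures in the post:
# 422 first choices, 615 - 422 = 193 second, 717 - 615 = 102 third,
# 777 - 717 = 60 fourth, and 929 - 777 = 152 outside the top 4.
ranks = [1] * 422 + [2] * 193 + [3] * 102 + [4] * 60 + [None] * 152

for n, (hits, total, pct) in top_n_rates(ranks).items():
    print(f"Top {n} Match: {hits}/{total} ({pct}%)")
# Top 1 Match: 422/929 (45.4%)
# Top 2 Match: 615/929 (66.2%)
# Top 3 Match: 717/929 (77.2%)
# Top 4 Match: 777/929 (83.6%)
```

This is also why a high top-1 agreement can never be accompanied by a lower top-3 agreement.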
|
Sep-24-10 | | Zygalski: As to Yelena's opponents, well we'll have to wait & see if or when some of them get removed from site. The analysis process alone takes some time. Maybe chess.com doesn't have an army of highly-paid, titled staff all analysing 100's of games constantly? Would you rather they banned members in a haphazard, hurried fashion or that they prioritize & ban the most blatant cheats first? I selected Yelena for batch analysis because
1) she had 10-15 games in progress vs 12 players with a chess.com average rating of 2614. She was moving on a daily basis in all games, as I recall.
2) her win/loss/draw record of 140 wins/0 losses/15 draws.
3) she was #21 highest rated and climbing, which meant she was well within the top 0.1% highest rated on site. Most engine users are booted from the top 1% (2500+ rateds), so there was a likelihood that she would be playing games against some other engine users whilst maintaining her unbeaten record. This may all sound like nonsense to you - as you wish. The simple fact is that the analysis of these Dembo games suggested blatant engine use, according to 4 separate analysts.
This evidence was forwarded to chess.com staff for them to run their own checks (whatever they are!) and Yelena was later removed from site. If you don't like the fact that Yelena's engine match rate in her games meant that she was put forward to staff as a suspected engine user then... well, tough luck!
|
|
Sep-24-10 | | tpstar: <I selected Yelena for batch analysis> The present approach seems exactly backward: brand her a CHEATER, and now she must prove her innocence. I am increasingly skeptical about this Top-3 methodology. Why wouldn't a true cheater choose #1 every time? Moreover, the chess.com crowd here seem to be arguing that when her moves matched a computer, she was cheating; when they didn't, she was covering up that she was cheating. Along with being a self-fulfilling prophecy, this retrograde analysis amounts to a very shaky foundation at best (talk about cherry-picking). Besides, I would expect any correspondence game to be of much higher quality than OTB due to the longer time control, so how can you rationally compare the two? A 2400 IM who has written chess books doesn't get there by cheating. I happen to believe that the only thing she did wrong was playing too well. I reiterate that anyone accusing her of cheating should give their real identity. Fair is fair.
|
|
Sep-24-10 | | Zygalski: Erm, which part of the following statement do you not understand: Yelena Dembo was not banned from chess.com purely on the basis of engine match-up analysis. Chess.com ran their own checks and closed the account some time after the match-up analysis was submitted.
For whatever reasons, chess.com will not publicly disclose their own cheat detection methods. |
|
Sep-24-10 | | ycbaywtb: Since when did computers get more respect than humans? Why aren't we asking how many times the computer chooses our moves?? I suspect this player got jobbed, from what it sounds like. |
|
Sep-24-10 | | Da Real King: Sure sounds to me like that was all they used. She was there three years, and was banned right after the analysis was submitted. DUH! |
|
Sep-24-10 | | cotoi: <bartonlaos> Haha, if someone plays against an extremely tall guy, the game must be discarded as an outlier? No way! <@tpstar>: the analysis is not retrograde. There's a big difference between going backwards (basically asking the engine to check whether some moves are good) and going forward (asking the engine to discover the best moves). The OTB versus CC dilemma has been addressed repeatedly. <@Eric Schiller: "But in most cases it is easy for a player to explain why a move is made, but I find many of Rybka's moves baffling."> Right, but how can we quantify this? I have no idea. |
|
Sep-24-10 | | CCplayer: I think this is the whole problem. You can't create a method that only compares the moves made with those suggested by an engine. You must involve human judgement, and that human must be a reasonably strong chessplayer (at least 2300-2400, in some cases maybe stronger). One reason is that the engine plays in a much more human-like way in semi-open positions than in closed positions, for example, which in essence means that your match-up rate will partly depend on your opening repertoire. This approach is obviously much more time-consuming and requires skills that exclude many of the pseudo-stats enthusiasts currently involved in this area. |
|
Sep-24-10 | | SugarDom: She probably had some computer assistance... imho... |
|
Sep-24-10 | | bartonlaos: <Zygalski>
There was about a 300-point rating difference over 19 of Yelena's 20 games, with an 8-point rating difference in the most recent game. There was also at least a 400-point strength difference in the 9 games taken from 2008, with opponents averaging 2250 on Chess.com (FIDE: 2050) while her FIDE rating was 2450. We know that the weaker the opponent, the easier it is to find good moves, and the higher the fidelity achieved. Yelena demonstrated this principle in two of her remarkable Olympiad-2010 games. This suggests her high fidelity is the result of rating differences. So it is critical to have used the proper benchmark with which to measure such a batch. At the very least, your benchmark batch should have been derived from correspondence games between similarly mismatched opponents - as any qualified statistician would know. This suggests that you aren't a qualified statistician, aren't qualified to defend this benchmark, and certainly aren't qualified to be convincing others of your flawed design. By what expertise can you even suggest she is guilty of "blatant engine use"? Considering that Chess.com launched an agreement to withdraw its accusations, your efforts to convince us that she's guilty are throwing them right back into the fray. |
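The argument above is that the benchmark should come from games with similar rating mismatches. One way to approximate that, sketched in Python under stated assumptions, is to pool only benchmark games whose rating gap lies within a window of the suspect game's gap. The `Game` fields, the 100-point window and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Game:
    rating_gap: int   # player's rating minus opponent's rating
    top3_hits: int    # moves matching one of the engine's top 3
    moves: int        # non-database moves analysed


def matched_benchmark_rate(benchmark: List[Game], suspect_gap: int,
                           window: int = 100) -> Optional[float]:
    """Pooled top-3 rate over benchmark games whose rating gap lies within
    +/- window points of the suspect's gap; None if no game qualifies."""
    pool = [g for g in benchmark if abs(g.rating_gap - suspect_gap) <= window]
    total_moves = sum(g.moves for g in pool)
    if total_moves == 0:
        return None
    return sum(g.top3_hits for g in pool) / total_moves


# Hypothetical benchmark games, for illustration only.
benchmark = [Game(0, 20, 30), Game(300, 27, 30), Game(350, 28, 30)]
print(round(matched_benchmark_rate(benchmark, suspect_gap=300), 3))  # 0.917
print(round(matched_benchmark_rate(benchmark, suspect_gap=0), 3))    # 0.667
```

Pooling moves rather than averaging per-game rates keeps long games from being under-weighted; the window size trades precision of the match against the number of benchmark games available.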
|
Sep-24-10 | | Zygalski: Yes, I agree 100% & now unreservedly apologise for any implication that WGM IM Yelena Dembo may have used an engine to suggest moves at any point in any of her chess.com games.
I don't know why chess.com closed her account, but the top 4 matchup methodology as outlined by me is totally flawed. I wish WGM IM Dembo continued success in both her otb & cc careers & have requested that chessgames.com staff remove all my posts & quotes related to them. |
|
Sep-24-10 | | Ezzy: Correspondence chess without computers is asking for trouble. It's near impossible to regulate whether someone has used or checked lines with a computer. To single out individuals for cheating is pointless and stupid. While there is internet chess, this problem will exist forever. Chess on the internet should be for fun. If some 1600 Elo rated player opens an account with an internet chess site, decides to use Rybka from the start and gets a rating of 3200, then they're stuck playing computer chess for their whole time on the site. They will soon realise there's no fun in that. But if a person changes and decides to be honest, then there's always someone to replace them who's just as dishonest. Round in a circle we go. Would you play correspondence chess for money if the rules said 'no computers'? I definitely wouldn't. The site owners are kidding themselves that they can find and control the cheats. Some cheat by picking computer best moves, some cheat by mixing the best move with the 4th best move, some don't pick computer moves but check their analysis with a computer. Some may not cheat at all, and then suddenly they have to go to a relative's wedding and don't have time to analyse, so on just this one occasion they let the computer show them the way. How do you control who's doing what? If you ban Yelena, then there must be hundreds more who at some level have had a snidy look to see what the computer says. Too much controversy. Allow computers in correspondence chess and be done with the flame wars and character assassinations. Never take internet chess seriously; it could damage your health. You never know whether it's a true game or not. Accept it, and play for fun, NOT for money or prestige. |
|
Sep-24-10 | | cotoi: <Ezzy>: but this is what we want! We want to play chess online for fun, but most of us find there's no fun in playing again and again against stupid kids running Rybka! Tools like admax have been invented to allow cheating on some sites even in 1-minute bullet games. Sure, I agree that it's impossible to catch all the cheaters, but at least the most blatant ones must be removed. Otherwise ... everybody will play on yahoo. Or start playing other games. I don't agree with your numbers: I think there are thousands of cheaters, not hundreds. <bartonlaos> Have a look at some of Kasparov's simuls and see how the rating gap lets the higher-rated player make engine moves. |
|
 |
 |