
🏆 Stockfish Nakamura Match (2014)

Chessgames.com Chess Event Description
Played in Burlingame, California, USA, 23 August 2014.

Player: Nakamura / Rybka

Game                              Result  Moves  Year  Event/Locale              Opening
1. Nakamura / Rybka vs Stockfish  ½-½     56     2014  Stockfish Nakamura Match  C07 French, Tarrasch
2. Stockfish vs Nakamura / Rybka  1-0     146    2014  Stockfish Nakamura Match  E77 King's Indian

Kibitzer's Corner
Aug-25-14  waustad: What were the time controls?
Aug-25-14  Penguincw: <waustad> Time controls were 45 minutes/player, with a 30 second increment for every move, starting at move 1.

Aug-25-14  dunkenchess: In a split second the human mind tends to tire; the computer doesn't.
Aug-25-14  waustad: As I mentioned before in one of the games, the computer doesn't need Sitzfleisch. My impression is that draws were to be had in the lost games, but then again there are reasons I'm not a master, one of which is faulty evaluation of positions.
Aug-25-14  Everett: Kudos to Nakamura for even taking this match.
Aug-26-14  1d410: Objectively speaking, Nakamura did well here, showing he can easily draw after going for the computer's throat in the first two games. Maybe humanity will improve.
Aug-26-14  waustad: Something like this can be a good or a bad start for the tournament. Maybe it will sharpen Nakamura's mind. A fave of mine, Eva Moser, got her GM norm right after playing her first blindfold exhibition. Who knows what this will lead to. Best of luck to Nakamura in the Sinquefield Cup.
Aug-26-14  Karpova: The match took place in Burlingame, California and the 4 games were all played on the same day (23 August 2014) - lasting more than 10 hours overall.

In the first two games, Nakamura had access to a 2008 MacBook with Rybka.

In the final two games, Nakamura was on his own, but he received ♙ odds (first the h-♙, then the b-♙).

Stockfish had access to neither an opening book nor endgame tablebases.

Source: FM Mike Klein, Stockfish Outlasts "Rybkamura", 24 August 2014, http://www.chess.com/news/stockfish...

Aug-26-14  Eric Farley: The computer will now be renamed Beatfish.
Aug-26-14  ralph46: I watched the games live on chess.com. In the second game, towards the end, Nakamura got active by playing f5; at that moment I knew he was lost, and he did lose.
Aug-26-14  erniecohen: The odds and result are sort of humiliating, though 4 games in one day is pretty unfair.

The MacBook Rybka was only about 200 points worse than Stockfish, so wouldn't it be expected to do just about as well without Nakamura, but with a mediocre opening book and a small tablebase?

<1d410>: I didn't see Nakamura's easy draw in game 2, though Stockfish was kind of lucky to find the win given how it went about it.

Aug-26-14  Kinghunt: Stockfish scoring +2 in 4 games means it performed about 200 points better than Nakamura, plus handicaps. Let's conservatively say that Nakamura plus Rybka play at the 2900 level. The f7 pawn and move has been estimated to be worth ~320 points, but as it was the h- and b-pawns that were removed here, let's call it a mere 120 points. Thus, by the most conservative estimates possible, Nakamura had an adjusted rating of 2900, and still lost by the equivalent of 200 rating points.

In short, it's a small sample size, and odds games aren't quite real chess, but it appears safe to say that <a lower bound for a "human rating" for Stockfish is <3100>>.
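
For anyone checking that arithmetic: the standard Elo formula maps a 3/4 score (+2 in 4 games) to a performance gap of roughly 190 points, which is where the "200 points" and the ~3100 figure come from. A minimal sketch in plain Python, with the 2900 baseline taken as an assumption from the estimate above:

```python
# Minimal sketch of the Elo arithmetic behind the ~3100 lower-bound estimate.
import math

def elo_gap(score: float) -> float:
    """Rating difference implied by a fractional score (0 < score < 1)."""
    return 400 * math.log10(score / (1 - score))

gap = elo_gap(3 / 4)          # Stockfish scored 3/4, i.e. +2 in 4 games
print(round(gap))             # 191, roughly the "200 points" cited above

# Assumed 2900 baseline for Nakamura (with Rybka, then pawn odds), per the post:
print(2900 + round(gap))      # 3091, hence a lower bound of about 3100
```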

Aug-26-14  Reisswolf: A stupendous performance by Stockfish. To give pawn odds to a top-level grandmaster, play without an opening book or endgame database, and still win is very noteworthy indeed.

I agree with <Kinghunt>. Stockfish's "human" rating could probably be well above 3000.

Aug-27-14  zanzibar: I remember seeing some article introducing this match and citing Stockfish as being rated 3200.
Aug-27-14  Kinghunt: <zanzibar: I remember seeing some article introducing this match and citing Stockfish as being rated 3200.>

Stockfish is rated 3285 on CCRL 40/40, the primary engine rating list. However, these ratings are given based on performance against other engines only, as there are very few man vs machine matches to try and calibrate the rating lists. Thus, engine ratings are not necessarily directly comparable to human ratings.

One way to see this clearly is to consider the effect of hardware. The engine rating lists test only the programs, but if you want to know real playing strength, hardware obviously matters. So does Stockfish play at 3200 strength running on an old laptop, or only when running on a modern high-performance cluster? Again, that's a question we can only begin to answer with an actual man vs machine match.

Aug-27-14  latvalatvian: I can't believe a computer would defeat a human being. Nakamura must have thrown the match.
Aug-27-14  zanzibar: <Kinghunt> Good points.

And speaking of points - don't forget to deduct a few from the CCRL rating due to the match conditions on Stockfish - no opening book or endgame tables.

A few rating points should be shaved off - rounding down to 3200 is reasonable (bearing in mind the points you raised).

I'll try to find that article; they listed the hardware used (some quad-core Mac, IIRC).

Aug-27-14  zanzibar: Ah, memory is a fragile thing - it was an 8-core Mac.

<According to match conditions, Stockfish was not allowed to access either an opening book or an endgame tablebase. What it did have was brute power -- match co-organizer Jesse Levinson said it was "the latest development build compiled for OS X and running on a 3ghz 8-core Mac Pro."

In comparison, Nakamura had the assistance of an older version of Rybka (about 200 points less than Stockfish's 3200+ rating), and it ran on a 2008 MacBook. Of course, he also had his 20-plus years of chess knowledge in play.

"The inspiration for this match was me opening my mouth too much," said co-organizer Tyson Mao. "I was wondering out loud how my [2008] MacBook could compete against today's chess engines.

"The main question is, 'Do humans add any value to chess engines today?' It's a very polarizing question. That's why we're having the match.">

http://www.chess.com/news/stockfish...

Aug-27-14  Landman: We may someday reach the point that any assistance given by a human would weaken an engine's play.
Aug-27-14  zanzibar: <landman> Does that include oiling the gears and the like?!
Aug-27-14  ketchuplover: I bet John Henry could smash Stockfish...literally
Aug-27-14  Taxman: It's interesting to see odds games between a top computer and a top human grandmaster. Perhaps such matches will become more popular as the machines continue to grow in playing strength.

I wonder what the "break-even" odds are, i.e. the level of odds at which the best human players will always win (at the time limit used for the Nakamura games), even against the top chess programs of the far future, running on the fastest hardware.

For example, I doubt that any computer will ever be able to give the top human player queen odds and still win. However, based solely on the Nakamura games (very limited evidence, I realise), it appears likely that the break-even odds are somewhat more than a pawn.

Two pawns? A Knight?

What do people think?

Aug-27-14  nimh: There's no one-to-one relationship between the human and engine rating systems. When one goes up the CCRL rating ladder from 2400 to 3200, it doesn't mean performance against humans would increase by the same amount. This stems from the fact that humans and engines have completely opposite accuracy-vs-rating relationships. Last year I wrote a paper in which I tried to establish a link between the two rating systems, based on comparing the accuracy of play across the FIDE and CCRL rating scales.

http://www.chessanalysis.ee/CCRL%20...

Whether CCRL 3200 really corresponds to FIDE 2910 is debatable, and further research will surely shed more light on this. The relatively low accuracy in the engine games is explained by 3x shorter time controls and a CPU 8x slower than the fastest one that can be bought for less than $500.

The main finding is that there's no linear relationship. The further up you go along the CCRL rating ladder, the less there is to gain against humans.

Reasons for such behaviour? Well, I must admit I have no idea :)
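
In case it helps to make "accuracy of play" concrete: a common metric of this sort is average centipawn loss against a reference engine's evaluations. The sketch below only illustrates the general idea; the capping and the sample numbers are invented for the example and are not necessarily what the paper uses.

```python
# Illustrative average-centipawn-loss metric (an assumed example, not
# necessarily the exact measure used in the linked paper). Each pair holds
# the reference engine's evaluation of the best move and of the move played,
# both in centipawns from the mover's point of view.
def avg_centipawn_loss(evals, cap=300):
    """Mean evaluation drop per move, with single-move losses capped at `cap`."""
    losses = [min(max(best - played, 0), cap) for best, played in evals]
    return sum(losses) / len(losses)

# Hypothetical game fragment: (best_eval, played_eval) per move.
sample = [(35, 30), (20, 20), (50, -40), (10, 5)]
print(avg_centipawn_loss(sample))   # (5 + 0 + 90 + 5) / 4 = 25.0
```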

Aug-27-14  Kinghunt: Thanks for sharing your paper, <nimh>. Very interesting read. However, I have a few objections:

<First>, it appears that for both humans and computers, the games chosen to benchmark accuracy were from "average" GMs and computers, and you are extrapolating your regression to data your model was not built with (i.e., 2800+ players and 3100+ programs). This is a dangerous way to model, and such extrapolations are never trusted in statistical analysis.

Looking at the plots demonstrates exactly why: the claim is made that a 2900 human player would have an expected error of 0.05, but the lowest error rate we have is 0.12. We don't know if it's even humanly possible to have an error rate that low. Moreover, this is especially concerning given the unusual data in the 2400 range - small changes in model parameters can result in large changes of prediction.
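
A toy numerical illustration of that extrapolation risk (the numbers below are invented for the example, not taken from the paper): two models that fit the observed rating range almost equally well can disagree sharply once pushed out to 2900.

```python
# Toy illustration of extrapolation sensitivity; the (rating, error) pairs
# are invented for the example, not data from the paper.
import numpy as np

ratings = np.array([2400, 2450, 2500, 2550, 2600, 2650, 2700])
errors  = np.array([0.32, 0.28, 0.25, 0.225, 0.205, 0.19, 0.18])

linear    = np.polyfit(ratings, errors, 1)   # straight-line fit
quadratic = np.polyfit(ratings, errors, 2)   # fit allowing mild curvature

for r in (2700, 2900):
    print(r, round(np.polyval(linear, r), 3), round(np.polyval(quadratic, r), 3))

# Inside the fitted range (2700) the two predictions are close; at 2900,
# outside the data, they diverge: small modelling choices produce large
# changes in the extrapolated error rate.
```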

I feel this issue could be resolved fairly easily by analyzing games by 2800 players (say, Carlsen as the closest we have to 2900, along with Aronian and Topalov to represent several styles of play) and 3200 engines (say, Stockfish, Komodo, and Rybka, the last one being important to make sure stronger engines do actually score better than weaker "Rybka-like" engines).

<Second>, the issue of time control and hardware cannot be ignored, and can be addressed in a similar manner to my above recommendations. Pick some TCEC games, which are played at classical time controls on strong hardware (and by very strong engines), and analyze them in the same way.

Finally, I also have to object to your conclusion:

<But it nevertheless turns out that expectations that top engines on up-to-date desktop machines are supposed to perform 3100-3200 against humans are a myth.>

Given what I pointed out above, I do not believe there is sufficient evidence to support such a claim. To the contrary, we have this match (as well as older odds matches) as evidence that computers, in fact, do perform 3100-3200 against humans.

Please do not take anything I said the wrong way - I think the chess world would benefit greatly from more people like you doing this kind of analysis. I am giving these comments because as much as I like what you've done, I think it can be even better and hope this work can be continued and made more convincing.

Aug-27-14  bobthebob: <I wonder what the "break-even" odds are, i.e. the level of odds at which the best human players will always win>

I wouldn't define the break-even odds like that, but rather as the point at which the expected outcome is close to even.

In that case, I would think that if Naka had gone in with the goal of drawing every game, then with these odds he would have been able to do that.

The thing about piece odds is that they throw theory out of the window on move 1, and as a result the computer, with its superior calculating ability, would not be at as much of a disadvantage as a human would be when down a piece.

Interesting discussion.
