ARCHIVED POSTS
< Earlier Kibitzing · PAGE 731 OF 1118 ·
Later Kibitzing> |
| Aug-06-14 | | zanzibar: Actually, now that I'm aware of it, <CG> is using FIDE-like names quite a bit in <Tromso olm 2014>. Which I would normally approve of, if only <CG> were consistent in its usage. |
|
Aug-06-14
 | | chessgames.com: <And why, in this game: is [Black]'s name <Vazquez Igarza, Renier>?
Shouldn't <CG> use the canonical <Renier Vazquez Igarza>?> Ah, you've stumbled upon an important nuance of Chessgames uploading. Time for your next lesson, grasshopper. A program called "fixpgn" gets run once daily which scans all PGN files for various possible changes and modifies the PGN to make it consistent. This does not happen instantly upon uploading (and that is intentional, motivated partly by efficiency and partly by the need to monitor uploads.) Suppose the Librarian changes the name of "J Garcia" to "Juan Garcia" and the guy has 100 games. The software does not run through all of those games and make the necessary fixes to the PGN immediately, as you might think. Instead what happens is that this fixpgn program runs through the database and recreates a version of the PGN based on everything we know about the game (the players, the tournament, the moves, etc.) Then it compares the freshly produced PGN to what we have on file. If there's no difference it skips it, but if there is the slightest change the game gets updated. (The old version of the game is kept in a backup table for Librarian reference.) So fixpgn not only corrects player names, but looks for mistakes in year formats, fixes changed ECO codes, reconciles PGN dates with database dates, and even word-wraps the moves. In short, it makes our PGN look nice, even if what we received looked crazy. So what this means is that if you download the PGN from our site the instant it's available, it will look a little different than if you wait some time. If you go back and look now, the PGN shows the CG canonical name, not because I just fixed it but because it's past midnight. |
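The regenerate-and-compare pass described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual fixpgn program: the `GameRecord` fields, `render_pgn`, `fix_pgn`, and the in-memory `backups` list (standing in for the backup table) are all hypothetical names.

```python
import textwrap
from dataclasses import dataclass, field


@dataclass
class GameRecord:
    """Hypothetical database row: what is 'known' about a game, plus the PGN on file."""
    white: str
    black: str
    event: str
    date: str
    result: str
    moves: str
    stored_pgn: str
    backups: list = field(default_factory=list)  # stand-in for the backup table


def render_pgn(game: GameRecord, width: int = 80) -> str:
    """Recreate the PGN from the record's fields, word-wrapping the movetext."""
    tags = [
        ("Event", game.event),
        ("Date", game.date),
        ("White", game.white),
        ("Black", game.black),
        ("Result", game.result),
    ]
    header = "\n".join(f'[{name} "{value}"]' for name, value in tags)
    movetext = textwrap.fill(
        f"{game.moves} {game.result}",
        width=width,
        break_long_words=False,
        break_on_hyphens=False,  # never split tokens such as O-O-O or 1-0
    )
    return f"{header}\n\n{movetext}\n"


def fix_pgn(game: GameRecord) -> bool:
    """Regenerate the PGN, compare with the stored copy, update only on a difference."""
    fresh = render_pgn(game)
    if fresh == game.stored_pgn:
        return False  # nothing changed, skip the game
    game.backups.append(game.stored_pgn)  # keep the old version for reference
    game.stored_pgn = fresh
    return True
```

Because the comparison is against a freshly rendered copy, renaming "J Garcia" to "Juan Garcia" in the player record is enough: every game of his is picked up on the next pass without touching the PGN directly.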
|
| Aug-06-14 | | zanzibar: Thanks <chessgames> for the enlightenment. So much to learn. I'm a little surprised at how it works (since I naively expected fixpgn to run before posting a game to the public). So, am I correct in assuming that <CG> processes generic games submitted by a user-at-large the same as those pulled from a tournament's official site? E.g. you never explicitly make a crosstable to ensure all the games are present and accounted for, and that the results match those officially published? It seems my initial observations of the Tromso OLM games are setting up lots of games which will need fixing by hand subsequently. At least that's how it seems to me. For example, there's "shadow" players like this:
Evgenij Agrest (N=1)
Evgeny Agrest (N=450) Maybe using the FIDE id will fix a lot of this. Hopefully. But my program is beginning to spin out of control trying to patch all these special cases. Comparing <CG> to <FIDE> is made much, much easier if each PGN file is normalized with itself. BTW- is it possible to get a <CG> pgn download with <only> games that have been fixpgn processed? |
|
Aug-06-14
 | | chessgames.com: <So, am I correct in assuming that <CG> processes generic games submitted by a user-at-large the same as those pulled from a tournament's official site?> Yes.
<E.g. you never explicitly make a crosstable to ensure all the games are present and accounted for, and that the results match those officially published?> True, we don't attempt to do that, although it's worth mentioning that addressing this issue is exactly the point of the Biographer Bistro and the Game Collection Voting, and the new system of leaderboard editing. Of course that has nothing to do with new tournaments, at least not yet, but it's the same concept. <It seems my initial observations of the Tromso OLM games are setting up lots of games which will need fixing by hand subsequently. At least that's how it seems to me.> I'm sure you are right. I've already patched some errors simply by deleting the games. They weren't particularly important games so they had no kibitzing and were not in any collection. Sure enough, on the next data fetch the correct games replaced the initial bad PGN. This is our version of the "roll back" that you mentioned. If necessary, we can delete bad games and then fetch the new data from the site. Of course it's always preferable to fix a game than delete a record for various reasons. <For example, there's "shadow" players like this:
Evgenij Agrest (N=1) Evgeny Agrest (N=450)> A few days ago the software would have just assumed that must be the same player and matched them up correctly, but after Stonehenge's rightful complaint, we tightened the reins on speculative matching. The Librarian also catches a lot of things like that, albeit days after the game is uploaded. <Maybe using the FIDE id will fix a lot of this.> It surely will, and I suspect that the <Evgenij Agrest> shown above was inserted only before you pointed out that they are using the wrong FIDE tags in their PGN. <BTW- is it possible to get a <CG> pgn download with <only> games that have been fixpgn processed?> No, not really, but we run fixpgn all the time (I said midnight, but that's a bare minimum. We run it during the day very often, for example after user uploads.) So 99% of the time you won't see PGN that needs fixing, you just happened to stumble onto some recently. It's possible to do a "mini-run" of fixpgn right after each data fetch to ameliorate this issue; then the odds of you finding a game in need of fixing would be very slim. Thanks for the idea. About your comment <So I would use a 1-to-1 mapping of gid <-> pairing, and update accordingly> I have so many opinions on that I didn't respond. Not to say that it's a bad idea, but the devil's in the details. Understand that part of the difficulty here is that each new tournament brings its own set of new challenges. The kinds of problems we have to cope with are innumerable, which is why it's hard to find a one-size-fits-all software solution. |
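As a rough illustration of how a FIDE ID could expose "shadow" players, the sketch below groups the name spellings seen under each ID and reports any ID with more than one spelling. It assumes each game is a dict of PGN tags and that the tags are named `WhiteFideId` and `BlackFideId`; that naming and the function name `find_shadow_players` are assumptions, not something confirmed by the posts above.

```python
from collections import defaultdict


def find_shadow_players(games):
    """Return {fide_id: [spellings]} for every ID that appears under more than one name.

    `games` is an iterable of dicts mapping PGN tag names to values.
    """
    names_by_id = defaultdict(set)
    for tags in games:
        for color in ("White", "Black"):
            fide_id = tags.get(f"{color}FideId")  # assumed tag name
            if fide_id:
                names_by_id[fide_id].add(tags[color])
    return {
        fide_id: sorted(names)
        for fide_id, names in names_by_id.items()
        if len(names) > 1
    }
```

Games without an ID are simply ignored here, so this only helps where the incoming PGN carries correct FIDE tags in the first place.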
|
| Aug-06-14 | | Kinghunt: Hi Chessgames,
Could we get a Bookie bet going sometime in the next week or so for whether Kasparov or Ilyumzhinov will win the FIDE presidential election? Thanks! |
|
Aug-06-14
 | | chessgames.com: That sounds like great fun, you should post your suggestion over at the Chessgames Bookie chessforum. |
|
| Aug-07-14 | | Kinghunt: Done, thanks for pointing me to the right place. |
|
Aug-07-14
 | | Tabanus: Hi CG, Max Lange d. 1899 and Max Lange d. 1923 have the same pic, see the Bistro. I don't know which is correct. |
|
Aug-07-14
 | | Domdaniel: The bushy-bearded Max Lange (d. 1899) is represented by the same pic in Wikipedia, among other places. This, of course, is no guarantee of correctness. Although, aesthetically, I doubt whether a beard like that could survive into the 20th century. So the later Lange (aka Max Lange 2) should perhaps do without a pic. |
|
| Aug-07-14 | | crawfb5: A similar photo of an older Max Lange can be seen on page 171 of Golombek's Encyclopedia of Chess. |
|
| Aug-08-14 | | zanzibar: OK, I just deleted two big posts for <Tromso olm 2014> where I thought <CG> was still feeding me different player names for the same player. I realize that all of these games were from Round 5 - and so were due to fixpgn not having run yet. But I was thinking about this a little - and I don't understand why, if the games show up on a player's profile page they don't use the canonical name. Whatever automated software that added the game knew the player's identity - how much harder would it be to tweak the White/Black player tags right then and there? I can understand waiting to do the opening ECO, and maybe to scan the movelist for illegal moves and notation changes (like simplifying a move if one piece is absolutely pinned). But not having the right player name in the PGN seems wrong to me, even now that I understand the process a little better. Like I said, you know the player match when you publish the game, so why not make the PGN congruent? |
|
| Aug-08-14 | | Kinghunt: I would like to request a feature that I think would be both extremely useful and relatively simple to implement: the ability to filter games by type. All games in this database, as far as I know, are tagged based on type already (ie, classical, rapid, exhibition, etc). When you perform a head-to-head search (say, search "carlsen-caruana"), the box at the top can give you the results of the classical games only, which is great if those are the ones you're interested in. But when you scroll down and look at the list of games, there is no way to filter out the non-classical games. It would make things a lot cleaner and easier to browse. I think this could be done easily and unobtrusively. At the bottom of all search pages, there's a "Refine search" box where you can filter by result. Adding one or two more similar options ("classical", "rapid/exhibition") would do the job nicely (as would adding it to the main search page, if that would be easier). I don't know how this would work on the back end of things, especially given that the game type appears to be stored somewhere other than the PGN, so maybe this would take a little more work than I thought. Anyway, I just thought I'd ask about it. Thanks for reading through this! |
|
Aug-08-14
 | | chessgames.com: <Kinghunt> That feature is well overdue, however I don't plan on implementing it precisely as you described. I am thinking more of a search feature. For instance, you'll be able to search for "Fischer-Tal blitz" in the EZ Search, or use a pulldown called "game type" on the advanced search. Please understand that until last year that field was so spotty and unreliable it couldn't be trusted, so our omission was intentional. At this point, after thousands of corrections and manual tournament sweeps by the Librarian, it has become very good and worthwhile data to search on. <zanzibar> I totally agree, fixpgn should at least have a preliminary run on incoming games. Don't take this the wrong way when I say that you're not the first to have noticed the delay, but you are the first to complain about it. |
|
| Aug-08-14 | | zanzibar: <chessgames> I understand, it's not the first time I find myself on the vanguard on some detailed technical matter. A little curious that no one else has complained, but I can offer the following explanation. Normally, you just wait and fixpgn does its job and all is fine a day later. And probably most people just want the moves of the game in the first place, and so all is fine there too. But I'm trying to automate the checking of <CG> games against <FIDE>. This seems merited, if only for the benefit of the missing games found (e.g. Dubai 2014, and I think also for Tromso 2014). Now, the problem is that I need <CG> to be consistent with itself to do that. Of course, I can wait a day for the pgn to be fixed, no problem. Let's say I wait a day for round 5 to be fixed. The trouble is that by then round 6 will have been played, which again means my program will be confused (at least until I add special handling, or <CG> normalizes player names, or <CG> adds the <ChessgamesPlayerId> tag). And so, the PGN will be troublesome the entire duration of the tournament - give or take a rest day or two. Generally, the sooner you find problems and correct them, the easier life is down the road. And I was hoping to check the games round-by-round as the tournament was being played. |
|
| Aug-08-14 | | zanzibar: BTW- I like <Kinghunt>'s idea. I also hope that improved advanced search will include the Event/Site fields as well. |
|
Aug-08-14
 | | chessgames.com: <include the Event/Site fields as well.> Well, we recently added a pulldown for the most recent tournaments, but I know what you mean. |
|
| Aug-08-14 | | The Last Straw: Hi <chessgames.com> The Chessgames Premium Membership Tour
I noticed that here you had a review coming from Craig Van Tilbury. I would like to ask you to consider removing it as he died in 2010. Thank you.
(This is not a request, just wish you to consider it.) |
|
Aug-08-14
 | | chessgames.com: <The Last Straw> Thank you for bringing that to our attention. |
|
| Aug-08-14 | | Kinghunt: Thanks for the response. There are many ways to implement it, and I'm sure whatever you decide to do will be wonderful. Looking forward to seeing what you come up with! |
|
Aug-09-14
 | | Stonehenge: The black star here is strange:
http://www.chessgames.com/perl/ches... |
|
Aug-09-14
 | | chessgames.com: According to the PGN we received, Sy has a rating of 19,900,139. That's a whopping 19.9 units of megaimportance. |
|
Aug-10-14
 | | Domdaniel: Gradually, bit by bit, square by square, the CG database is becoming the best in the world. It's already up there with the leaders, but the sophisticated tidy-up routines that you folks run just keep on improving it. Most impressive, rilly. Just saying. I'm a fan. But you knew that. |
|
| Aug-10-14 | | zanzibar: I like just being near such magnificent megaimportance - provided I've lathered on the sunblock. https://www.youtube.com/watch?v=HZj...
* * * * *
Speaking of curiosities - when I do a search on <Alexander Ipatov>, <CG> delivers the two expected matches, but with the following complaint: <ERROR: Over 32,000 games match your query. Please select more restrictions on your search.> And I don't even get my normal footers at the bottom of the page! I think <CG> is using the wrong <Alexander Ipatov> as well; the player at Tromso is playing on board 4 for Turkey: https://www.youtube.com/watch?v=nL4...
It's possible the Russian <Ipatov> is also playing, in which case <CG> is providing PGN with two distinct players with the same indistinguishable name. So... what's the <Ipatov> story? I don't have time to double check this - but I'm fairly sure it's a problem one way or the other. (Actually, I even wonder if these are two different players - but I don't have time to investigate right now) |
|
Aug-11-14
 | | chessgames.com: <DomDaniel> Thanks, that is very inspiring. <zanzibar> Repo Man is always intense :) About the search, it's fairly easy to explain although the situation is pretty bad. Here's what's going on: The search was modeled after the 'classic' search engine approach (Altavista, Yahoo, etc.) meaning if you search for "A B" there is an implicit OR, so you are really searching for "A or B". So that means when you search for "Alexander Ipatov" you really are searching for "Alexander OR Ipatov". Apparently we have over 32,000 games with Alexanders now. I tried an experiment once to make it an implicit AND, so in that case it would work perfectly, but then the problem was that people would type "Alexandre" etc. The practical solution is just to leave off names like "Alexander" and "John" (or for superusers, now it's possible to enter a PID.) Needless to say, this needs improvement. |
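The difference between the implicit OR and the implicit AND can be shown with a toy matcher. This is a minimal sketch over a short hand-written list of names, not the site's search code; `search`, `SAMPLE_PLAYERS`, and the `mode` parameter are made up for the illustration.

```python
SAMPLE_PLAYERS = [
    "Alexander Ipatov",
    "Alexander Alekhine",
    "Alexandre Lesiege",
    "Vassily Ivanchuk",
]


def search(names, query, mode="or"):
    """Match whole words of `query` against each name.

    mode="or"  -> a name matches if ANY query term appears in it
    mode="and" -> a name matches only if ALL query terms appear in it
    """
    terms = query.lower().split()
    if not terms:
        return []
    combine = any if mode == "or" else all
    hits = []
    for name in names:
        words = set(name.lower().replace(",", " ").split())
        if combine(term in words for term in terms):
            hits.append(name)
    return hits
```

With `mode="or"`, the query "Alexander Ipatov" returns every Alexander in the list; with `mode="and"` it returns only Ipatov, but the variant spelling "Alexandre Ipatov" then matches nothing, which is the trade-off described in the post.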
|
Aug-11-14
 | | chessgames.com: I've always been inspired by an almost AI-like knowledge-based system that could parse searches based on statistical knowledge. For example whether you type in "Hou Yifan" or "Yifan Hou" it should know very well what the surname is, and even know that it's a Chinese player. It could learn this data in FIDE or even outside of FIDE. At the same time, gather information like "Alex = Alexander = Alexandre". Possibly even typo data like "Alexadner" = "Alexander". In the end, we're easily talking about millions of records. This is the technology that you see on typical Google searches. When you type a name, they know it, and they'll even correct your spelling for you. Having that tool in the toolkit would open the doors to even smarter searches, like EZ Search could almost become natural language. It would have a "thought" process like: "obviously you're talking about Fred Reinfeld playing the Ruy, not Ruy Reinfeld playing the Fred." Right now, if it ever seems that clever, it just got lucky. Anyhow, that's the pie-in-the-sky dream. For now I'd be happy just with small improvements. |
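A very small version of that idea might look like the sketch below: a hand-made alias table for given names, a set of known surnames, and the standard library's `difflib` for typo tolerance. The tables and the names `SURNAMES`, `GIVEN_ALIASES`, `canonical_given`, and `parse_name` are invented for the example; a real system would learn millions of records rather than hard-code a handful.

```python
import difflib

# Toy knowledge base (hypothetical; a real one would be learned from data).
SURNAMES = {"hou", "reinfeld", "ipatov"}
GIVEN_ALIASES = {
    "alex": "alexander",
    "alexandre": "alexander",
    "alexander": "alexander",
    "yifan": "yifan",
    "fred": "fred",
}


def canonical_given(token):
    """Map a given name, alias, or near-miss typo to its canonical form."""
    token = token.lower()
    if token in GIVEN_ALIASES:
        return GIVEN_ALIASES[token]
    close = difflib.get_close_matches(token, list(GIVEN_ALIASES), n=1, cutoff=0.8)
    return GIVEN_ALIASES[close[0]] if close else token


def parse_name(query):
    """Split a query into (surname, [given names]) regardless of word order."""
    tokens = query.lower().split()
    surname = next((t for t in tokens if t in SURNAMES), None)
    given = [canonical_given(t) for t in tokens if t != surname]
    return surname, given
```

Here `parse_name("Hou Yifan")` and `parse_name("Yifan Hou")` produce the same result because the surname is recognised by lookup rather than by position, and a transposition such as "Alexadner" is close enough to fall back to "alexander".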
|
 |
 |