ARCHIVED POSTS
< Earlier Kibitzing · PAGE 825 OF 1118 ·
Later Kibitzing> |
Jun-01-15
 | | Annie K.: Heheh...
Nice detective job, too. :) |
|
| Jun-01-15 | | zanzibar: <chessgames> having a normalized, indexed, collated version of the <CG> tournament collection is very useful for bug-hunting. I'd like to share this with other biographers, but thought I should get explicit permission first. I intend to share the SCID version I just created, which consists of ~45 Mb of data, spread over three files. I haven't finalized the format of the index values; at the moment it's <[CG_id "tid.gid.white.black"]> where white/black are the pid's. All are numbers.
Text searching isn't entirely clean, so I'm debating using 't/g/w/b' prefixes (and adding an ending '.' for symmetry). Parsing-junk. I'm also considering adding a final number to enumerate the game within the tournament sequence - and maybe also the N_games in the tournament too. But that's the end of potential refinements.
As it is now it's truly wonderful to work with - with everything normalized, <Tournament Finder> is clean and the xtabs are all meaningful. I could submit it to you for publication - but I'd like it to be available to all biographers, given its utility. The next step is regularization, in terms of evolutionary progress! Stay tuned. |
|
Jun-02-15
 | | chessgames.com: <I'd like to share this with other biographers, but thought I should get explicit permission first.> So this is a file of PGN or in some other format? And it's a reprocessed version of our own PGN, with some new tags introduced? About the single compact tag, wouldn't it be more in the spirit of PGN to embed a number of new tags, e.g. [ChessgamesTournamentID "82762"] etc.? |
|
| Jun-02-15 | | zanzibar: The indexing could be more verbose.
My preference is for the more compact form, with just one extra line of PGN per game. Searching for a specific PGN tag with SCID isn't possible. Its header search uses standard tags, but allows for a general text search on the entire PGN. The advantage of the more verbose tag is that the search could include the tag, and avoid possible collisions if searching on just the number (which was why I wasn't 100% settled on the format). The games are PGN originally, but using SCID compacts the data to 25% of its size. (From 200 Mb -> ~45 Mb for ~250k games). You can easily output all the games in PGN using SCID, or input the games as PGN and then convert to SCID format. |
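The collision concern can be made concrete with a small sketch. It assumes the compact form <tid.gid.white.black> and one possible reading of the proposed 't/g/w/b' prefixed form with a trailing '.'. The function names and the player id 1791 are hypothetical, chosen only to produce a false hit; the other numbers are the ones from the sample game in this thread.

```python
def compact_id(tid, gid, white, black):
    """Compact form: "tid.gid.white.black"."""
    return f"{tid}.{gid}.{white}.{black}"


def prefixed_id(tid, gid, white, black):
    """Assumed prefixed form: each field tagged t/g/w/b and ended by '.'."""
    return f"t{tid}.g{gid}.w{white}.b{black}."


ids = (85650, 1791987, 76172, 49039)
compact = compact_id(*ids)
prefixed = prefixed_id(*ids)

# A plain text search for a hypothetical player id 1791 hits the game id
# 1791987 in the compact form, although neither player has that id.
false_hit = "1791" in compact

# With a prefix and the trailing '.', the query names the field and its end,
# so the same player id no longer matches the game id.
white_hit = "w1791." in prefixed
black_hit = "b1791." in prefixed

print(compact, false_hit)
print(prefixed, white_hit or black_hit)
```

The trade-off is the one described in the post: the prefixed form is noisier to read, but a text search can then target exactly one field.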
|
| Jun-02-15 | | zanzibar: I should mention that the CG id becomes a python tuple, which is possibly one reason for the compact notation - the input format is very close to the output format. E.g.
<
[Event "FIDE Grand Prix Khanty-Mansiysk"]
[Site "Khanty-Mansiysk RUS"]
[Date "2015.05.16"]
[Round "3"]
[White "Caruana, Fabiano"]
[Black "Tomashevsky, Evgeny"]
[Result "1-0"]
[WhiteElo "?"]
[BlackElo "?"]
[ECO "D15"]
[EventDate "2015.05.13"]
[CG_id "85650.1791987.76172.49039"]
[PlyCount "81"]
1.d4 d5 2.c4 c6 3.Nf3 Nf6 4.Nc3 a6 5.a4 e6 etc.
> |
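A minimal sketch of how the <CG_id> tag above could be read into a Python tuple. The post only says that the id "becomes a python tuple"; the regular expression, the `CGId` type and the `parse_cg_id` name are assumptions made for illustration, not the poster's actual code.

```python
import re
from typing import NamedTuple


class CGId(NamedTuple):
    tid: int    # tournament id
    gid: int    # game id
    white: int  # pid of the White player
    black: int  # pid of the Black player


# Matches a header line such as: [CG_id "85650.1791987.76172.49039"]
CG_ID_RE = re.compile(r'\[CG_id "(\d+)\.(\d+)\.(\d+)\.(\d+)"\]')


def parse_cg_id(pgn_text):
    """Return the CG_id found in a PGN header as a tuple of four ints."""
    m = CG_ID_RE.search(pgn_text)
    if m is None:
        raise ValueError("no CG_id tag found")
    return CGId(*(int(x) for x in m.groups()))


header = '[EventDate "2015.05.13"]\n[CG_id "85650.1791987.76172.49039"]\n[PlyCount "81"]'
cg = parse_cg_id(header)
print(cg)
```

Because the tag value is just four dot-separated numbers, writing it back out is `".".join(map(str, cg))`, which is presumably what is meant by the input format being very close to the output format.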
|
| Jun-02-15 | | zanzibar: So, it's <CG> pgn, with possible annotations stripped out, and the <CG_id> tag added. |
|
Jun-02-15
 | | chessgames.com: OK, fine, go ahead and share.
By the way, the seasonal update of the Zipfile Archive takes place tonight. |
|
Jun-02-15
 | | Tabanus: <1751663, 1751664, 1751666, 1751667, 1751668, 1751669, 1751671, 1751672, 1751673, 1751674, 1751675, 1751676, 1751677, 1751678, 1751679, 1751680> Perhaps all these games had "alternate scores", and the wrong game was deleted? If that's so, maybe there's not any 'bug' but a 'bugger'. Or you are hacked! Or the software deleted an old duplicate for a "new" reason? I'm not sure but think that all the problem games in "my" voted-in tournaments had alternate scores. Just trying to contribute with my chisel. |
|
Jun-02-15
 | | Richard Taylor: What a wonderful display of numbers. Just what the doctor ordered! I still can't always even get the default pgn board to play games, so I download the PGNs and play them on Winboard. But, I know <chessgames.com> are deeply concerned about me so I will reassure cg.com that I will leave it till I get a newer computer etc. Meanwhile I have to fix my fence. Then I will fork out for said comp. This information I feel is vital for the smooth operation of cg.com and the number problems you are getting... It may help to reassure those who are concerned about whatever it is that concerns them. |
|
| Jun-02-15 | | zanzibar: <Tab> what's the context of that list? It must be connected with <56th US Open (1955)> Also, what's the significance of 1751665 and 1751670 being missing from the sequence? Fedorowicz vs R Eberlein, 1976
A Hodges vs H H Hahlbohm, 1920 |
|
| Jun-02-15 | | zanzibar: <chessgames> Heartfelt thanks... I'll put my v0.0 prototype out on google drive then. I assume the updated ZipFiles don't have the id's in them? |
|
Jun-02-15
 | | Annie K.: My totally uninformed guess would be that the tournament games were submitted one by one, and someone else, or two different people, meanwhile submitted these two unconnected games. Is there a prize? ;) |
|
Jun-02-15
 | | Tabanus: <z> See CG's post on the previous page, it's the games that have <US Open> instead of <56th US Open>. |
|
| Jun-02-15 | | zanzibar: <AnnieK> As far as I can tell, you're a prize - in the nicest way! <Tab> Thanks, I'll bounce back and look. |
|
Jun-02-15
 | | Annie K.: <zanzibar> well, you can't go very far wrong with comments like that. ;) |
|
| Jun-02-15 | | zanzibar: More data to look at, for <56th US Open (1955)>: https://zanchess.wordpress.com/2015... And another new routine in the toolbox. |
|
Jun-02-15
 | | chessgames.com: Getting back to a comment by Zanzibar (chessgames.com chessforum, kibitz #22620), there are some players in our database with FIDE ratings and no FIDE ID. Now that it's the start of a new month it's time to process the new FIDE rating file, and this time the software is going to alert us to such situations and automatically supply the FIDE # where it thinks it belongs. The first player to get assigned a FIDE number this way was Robert Ackermann. However, as the biography indicates, there are several Robert Ackermanns in the world of chess, so this could actually be a mistake. The next one was Konstantinos Anagnostopoulos, a player with only one game, who we determined to be FIDE #4251067. That one is likely to be correct. Then we come to Viktor Andreev, who played one game in 1981, and we are guessing this is FIDE #4133277. That's a pretty common name so it's a coin-flip if this one is correct. Might be the same guy, might not. Next I come to Zdenek Bartonicek, whose biography, just like Ackermann's, notes that there are multiple players with that name. Next up is Klaus Beckmann; the biography reads "Four players with the same name." The software slaps FIDE #4604334 on the record. Next, Petr Benes; the biography indicates that there are two of them. Last one: Alexander Bochkarev. The biography indicates that there are two players. OK, end of experiment.
There's a pattern here and it's not clear to me that the best thing to do is to run through the list and assign FIDE numbers to these pages. Perhaps first the pages should be fixed, by separating them out into two or more pages, and only then assign the proper FIDE numbers to each one. I'm going to stop the software from making these changes now because I'm just not sure if I'm doing anything useful by adding this information. One could make the argument that the software should be stripping the probably-bogus rating information from these players, as opposed to adding the probably-bogus FIDE numbers. You can also make the argument that sometimes to get something right, you have to start with something that's wrong and fix it. And so we should populate all of these pages with FIDE numbers and then let the editors correct from there. I'm on the fence with this. Perhaps the Bistro has an opinion; they are the people who put the FIDE numbers in to begin with, I don't want to muddy up their work without their feedback on what's happening here. (If need be, we can also run through the names above and strip out FIDE numbers.) In any case I don't see it as a critical issue, as it only affects relatively obscure players. |
|
| Jun-02-15 | | zanzibar: <chessgames> I'd have to go back and refresh my memory on the details. But I'm fairly sure I skip over all cases of doing automatic updating of the FIDE number if name degeneracies exist. I also think having a person reviewing the proposed updates is almost mandatory, even for the automatics. But one idea I think I had at the time was to funnel the updates to the Bistro for approval. Maybe that would be too big a flood of input, in which case you could just make a page for biographers to look upon at their leisure. Which kinda fits in with your next to last conclusion: <I'm on the fence with this. Perhaps the Bistro has an opinion; they are the people who put the FIDE numbers in to begin with, I don't want to muddy up their work without their feedback on what's happening here. (If need be, we can also run through the names above and strip out FIDE numbers.)> It's not a critical issue, but matching a player to a FIDE id is a "good thing", for sure. When you get tournament uploads from a FIDE sponsored tournament these days, the FIDE id is right in the PGN. So, doing the FIDE id sooner than later will ultimately save work (else you dump all the players with the same name into one bin for us to sort out later - ugh!) |
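The policy described here (propose a FIDE id automatically only when the name is unambiguous, and send everything else to a human) can be sketched as follows. This is an illustration under stated assumptions, not the code either party runs: the function name, the exact-name matching rule and the sample names and ids are all made up.

```python
from collections import defaultdict


def propose_fide_ids(cg_names, fide_records):
    """Split CG player names into automatic proposals and review cases.

    cg_names:     iterable of player names as they appear on CG pages
    fide_records: iterable of (fide_id, name) pairs from the rating file
    Returns (automatic, needs_review); both still deserve a human check.
    """
    by_name = defaultdict(list)
    for fide_id, name in fide_records:
        by_name[name].append(fide_id)

    automatic, needs_review = {}, {}
    for name in cg_names:
        matches = by_name.get(name, [])
        if len(matches) == 1:
            automatic[name] = matches[0]    # unique name: propose the id
        elif len(matches) > 1:
            needs_review[name] = matches    # name degeneracy: skip, flag it
    return automatic, needs_review


# Hypothetical data, for illustration only.
fide = [(1001, "Player, Unique"), (2001, "Player, Common"), (2002, "Player, Common")]
automatic, needs_review = propose_fide_ids(
    ["Player, Unique", "Player, Common", "Player, Absent"], fide
)
print(automatic)
print(needs_review)
```

The `needs_review` dictionary is the kind of list that could be published on a page for biographers, as suggested above, rather than written straight onto the player pages.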
|
Jun-02-15
 | | chessgames.com: <It's not a critical issue, but matching a player to a FIDE id is a "good thing", for sure.> Of course. The question is, if we have a player page which muddles up a father and son, or two people with the same name, should we slap a FIDE # on it, effectively at random, in hopes that it will be sorted out later? Or maybe we should intentionally keep the FIDE # off of that page, as a reminder that there is something that needs the human touch. <When you get tournament uploads from a FIDE sponsored tournament these days, the FIDE id is right in the PGN.> I know what you're saying. If we got some PGN with a Robert Ackermann with one FIDE #, and then a different tournament with a different Robert Ackermann and a different FIDE #, we have all we need to create two coherent player pages. As things stand we just toss all such games on the generic Robert Ackermann page. Like I said, this only happens with the relatively low tier players. It's been a long time since we assigned a Sergei Karjakin game to Sergey Karjakin. |
|
| Jun-02-15 | | zanzibar: <chessgames> you probably know this, but maybe other readers don't... Suppose you have multiple versions of a player, and know that at least one of them matches up with a FIDE player. How does one disentangle them?
One technique I've learned is to use the rating history provided by FIDE to find rated tournaments, and then match the <CG> player's tournament history to FIDE's. E.g. <Ackermann>'s rating history is documented here: http://ratings.fide.com/hist.phtml?...
Look at <CG> and see games from 2004 and 2010. The 2004 event looks good: the 20th European Cup, bound to be on FIDE. Remember that ratings are submitted after tournaments, so failing to find a match in the 2004 entries isn't the end of the story. Look at the Jan 2005 FIDE calculations:
http://ratings.fide.com/individual_... And voilà. The 20th EuroCup is there - we have a confirmed match. * * * * *
The idea of the automated process is to facilitate getting the work done. It's like auto-pilot, doing most of the grunt work, allowing the critical part to be done by the pilot guiding the plane in for the landing. |
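The matching step can be sketched in a few lines. Everything here is an assumption made for illustration: the function names, the list of rating periods, the event date, and the idea that event names can be compared as plain strings (in practice the names on the two sites rarely match exactly, which is where the human pilot comes in).

```python
from datetime import date


def periods_to_check(event_date, rating_periods, max_periods=2):
    """Return the first rating periods on or after the event date.

    Ratings are submitted after a tournament ends, so the event is expected
    in a later period rather than in the one current when it was played.
    """
    later = sorted(p for p in rating_periods if p >= event_date)
    return later[:max_periods]


def confirm_match(cg_event, fide_events_by_period, periods):
    """True if the CG event name appears among the FIDE events in any period."""
    return any(cg_event in fide_events_by_period.get(p, ()) for p in periods)


# Illustrative data: the period dates and the event date are placeholders.
periods = [date(2004, 7, 1), date(2004, 10, 1), date(2005, 1, 1), date(2005, 4, 1)]
fide_events = {date(2005, 1, 1): {"20th European Cup"}}

to_check = periods_to_check(date(2004, 10, 3), periods)
confirmed = confirm_match("20th European Cup", fide_events, to_check)
print(to_check, confirmed)
```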
|
| Jun-02-15 | | zanzibar: Right. Just reading your other post.
The idea is to set up the tools to make it as easy as possible, but with maximal reliability. Getting a player right is of such importance, the latter is stressed here. But some of these players, low level or not, have been around for years. I'm advocating "getting the job done".
With the right tools it's not that hard, or laborious (i.e. time consuming). And the truly hard cases... just kick 'em over to the Bistro. I think, given what I've seen involving determining exact dob/dod's, we'd be equal to the task. You could always quota them, if you have too many saved up. Hey, you have a <Player of the Day> feature already. Why not an <Unknown Player of the Day>?!? |
|
| Jun-02-15 | | crawfb5: <Why not an <Unknown Player of the Day>?!?> The Collection of the Unknown Player? |
|
| Jun-02-15 | | zanzibar: Or - The Tomb of the Unknown Player? |
|
| Jun-02-15 | | zanzibar: Let's do Anagnostopoulos.
I'll show my code:
<
def pf(s):
    # Print every FIDE record whose name contains the search string s.
    for f in (f for f in FIDE.F if s in f.name):
        print(f)
>
And look for players with the last name, first initial: <
>>> pf( "Anagnostopoulos, K" )
4251067 Anagnostopoulos, Konstantinos (1995) GRE 2137 / 10
4214773 Anagnostopoulos, Konstantinos (1945) GRE 0 / 0
>
There are two of them, but only one with a rating, let alone active in 2011. QED. |
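For readers without the poster's FIDE module, here is a self-contained sketch of the same lookup. The `FideRecord` type, its field names and the `RECORDS` list are assumptions; only the two records themselves come from the output quoted above (the trailing "/ N" column is left out because its meaning isn't stated). Unlike the original, this version returns the matches instead of printing them, so they can be filtered further.

```python
from collections import namedtuple

# Hypothetical record layout; the real FIDE.F objects may look different.
FideRecord = namedtuple("FideRecord", "fide_id name born fed rating")

RECORDS = [
    FideRecord(4251067, "Anagnostopoulos, Konstantinos", 1995, "GRE", 2137),
    FideRecord(4214773, "Anagnostopoulos, Konstantinos", 1945, "GRE", 0),
]


def pf(s, records=RECORDS):
    """Return every record whose name contains the search string s."""
    return [r for r in records if s in r.name]


matches = pf("Anagnostopoulos, K")
rated = [r for r in matches if r.rating > 0]  # a rating of 0 means unrated

for r in matches:
    print(r.fide_id, r.name, f"({r.born})", r.fed, r.rating)
print("rated:", [r.fide_id for r in rated])
```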
|
| Jun-02-15 | | zanzibar: It's worthwhile to do Andreev (@p147107), as it shows an example where it should be kicked over to the Bistro. <
>>> pf( "Andreev, V" )
4128362 Andreev, V.V. (1978) RUS 2085 / 0
4133277 Andreev, Viktor (1957) RUS FM 2174 / 0
34158569 Andreev, Viktor B. (1950) RUS 2196 / 0
34120316 Andreev, Viktor V. (1960) RUS 0 / 0
+ 9 more clear non-matches.
>
The player we're looking for has only one game/tournament: <1981 Magnitogorsk> That eliminates V.V. (unless he played in diapers), but none of the other three. We can eliminate the unrated player. So we really only have two: <
4133277 Andreev, Viktor (1957) RUS FM 2174 / 0
34158569 Andreev, Viktor B. (1950) RUS 2196 / 0
>
The FIDE tournament records, unfortunately, only go back to 2001. The ages are too close for comfort. It is likely the titled player, but in this case the finishing touches call for Bistro expertise (imo). Let me see if the others have "educational" value... |
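The elimination above can be written down as a filter. This is a sketch only: the `Candidate` type, the `plausible` helper and the minimum age of 10 are assumptions (the post just rules out a three-year-old), while the four records are the ones listed above.

```python
from collections import namedtuple

Candidate = namedtuple("Candidate", "fide_id name born title rating")

CANDIDATES = [
    Candidate(4128362, "Andreev, V.V.", 1978, "", 2085),
    Candidate(4133277, "Andreev, Viktor", 1957, "FM", 2174),
    Candidate(34158569, "Andreev, Viktor B.", 1950, "", 2196),
    Candidate(34120316, "Andreev, Viktor V.", 1960, "", 0),
]


def plausible(c, event_year, min_age=10):
    """Keep rated players who were at least min_age (assumed) at the event."""
    return c.rating > 0 and event_year - c.born >= min_age


remaining = [c for c in CANDIDATES if plausible(c, 1981)]

# One survivor could be proposed automatically; anything else goes to a human.
verdict = "propose" if len(remaining) == 1 else "kick over to the Bistro"
print([c.fide_id for c in remaining], verdict)
```

The filter stops at two candidates, which matches the conclusion in the post: the title is a hint, not a proof, so the case still needs a person.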
|