|
< Earlier Kibitzing · PAGE 14 OF 18 ·
Later Kibitzing> |
| Jun-13-15 | | zanzibar: The rounds are a little funky here as well:
World Championship Candidates Final (2011)
But the fix is in!
https://zanchess.wordpress.com/2015... |
|
| Jun-14-15 | | zanzibar: <Multi-tournament tournaments> I've been grappling with this for a little while now. What's the best way to split up tournaments while still being able to accommodate some super-structure? The quandary is apparent in the lack of precision in our language. (E.g. the Fischer games at Sousse. Were they part of the tournament? No, not according to the official score. Yet they were played at the tournament. ) Additionally, most people who download a "tournament" will want all the games - playoffs, and dropouts, included. The dilemma is the need to have a unique identifier for E/S/ED, and some other system for sets of tid's comprising the entirety of the tournament (even allowing for off-hand games, etc.). What to do?
Especially given that we have so many tournaments needing <partitioning>. (Yes, another linguistic invention/contrivance)
* * * * *
<Proposed Solution>
A potential solution is to make the tid compound.
<new_tid = old_tid.nn> nn = 0 always denotes the main tournament, and maps to (E0/S0/ED0). nn != 0 shows up as a separate tournament in SCID/ChessBase. It can be whatever organization structure is needed (a playoff, a collection of off-hand games, even a simul). It maps to (En/Sn/EDn). Generally En will be recognizably similar to E0. Sn = S0 in general. We can debate whether EDn = ED0. If a new_tid is lacking the full qualification, it will be assumed to be tid.0. The complete set would be specified with tid.* notation (or maybe tid.-1 if a definitive number is needed). The main advantage of this system is that once a tid is registered, biographers would be free to partition the games as needed, without consulting <CG> until the final upload. |
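A minimal sketch of how the compound tid described above might be parsed and used to select games. The names `CompoundTid`, `parse_tid` and `select`, and the `tid`/`part` keys on each game, are assumptions made for illustration; nothing here reflects how <CG> actually stores tournaments.

```python
from typing import NamedTuple, Optional


class CompoundTid(NamedTuple):
    tid: int             # the registered tournament id (old_tid)
    part: Optional[int]  # nn; None stands for the whole set (tid.*)


def parse_tid(text: str) -> CompoundTid:
    """Parse 'tid', 'tid.nn' or 'tid.*' into a CompoundTid."""
    base, sep, part = text.strip().partition(".")
    if not sep:
        # A tid lacking the full qualification is assumed to be tid.0.
        return CompoundTid(int(base), 0)
    if part == "*":
        return CompoundTid(int(base), None)
    return CompoundTid(int(base), int(part))


def select(games, wanted: CompoundTid):
    """Return the games (dicts with hypothetical 'tid' and 'part' keys) that match."""
    return [
        g for g in games
        if g["tid"] == wanted.tid
        and (wanted.part is None or g["part"] == wanted.part)
    ]
```

With this shape, `select(games, parse_tid("62003"))` would return only the main tournament, while `parse_tid("62003.*")` would pull in playoffs and off-hand games as well.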
|
| Jun-14-15 | | zanzibar: If I could make a wish-list for <CG>: <(1) Renormalize all the tournaments.> That way <Tab> doesn't have to send hundreds of individual correction slips in. So a bit of a priority. <(2) Allow updating/correcting of tournaments via bulk submission.> We can spend a lot of "fun" time discussing algorithms, but the majority of biographers will still do things by hand. To facilitate and encourage their work, we should allow them to submit corrections however they like to make them. We might need to retrain people to download tournaments from <CG> and apply updates locally (like adding Dates and Round numbers with SCID or ChessBase or even an editor!). The idea is simple, download the tournament, fix it up nicely, and then resubmit it. <(3) To facilitate <CG>'s uploading of a corrected tournament, I'd like to see the <CG_id> included in each game of a download.> * * * * *
Let's face it, there are few people who are going to master the usage of a big software library to correct the games. Some maybe, and <CG> too. But most no. And the need won't be there either, once we get over this hump and whip the database into correct shape. After that it's back to one tournament at a time.
And if we focus on the 3 items above then we've developed tools that are useful both now and forever (i.e. after all the fixing's been done). Plus, I get down to the business of fixing the tournaments without having to explain all the steps involved (much as I like to bovinate/codevinate). |
|
| Jun-14-15 | | zanzibar: Some examples of proposed bulk "fixes":
<CG-CM-update>
https://drive.google.com/open?id=0B... <CG-update (6-tournament sample)>
https://drive.google.com/open?id=0B... |
|
| Jun-14-15 | | zanzibar: <RE: pastebin>
I looked at it again.
I'm not sure if I have to register to leave content on its site or if it automatically remembers a session or ?? I'm disinclined to register at yet another site. Besides, I'll just post on my blog most times, as is my wont(*). I actually didn't like pastebin's syntax highlighting. But I did like this guy's: http://tohtml.com/python/
especially the FMX style.
And wordpress allows pasting the html-output in directly (in the text screen): https://zanchess.wordpress.com/2015... (OK, I still had to color-code the comments by hand)
_______________________________________________
(*) http://www.etymonline.com/index.php...
(The guy's not right in the head, footnoting his own posts!) PS- Well look at that! <CG> doesn't truncate "___" but it does "...". Rather curious. Guess ya learn something new every day. |
|
Jun-14-15
 | | chessgames.com: I use pastebin all the time and I never registered. Just paste, push the button, grab the URL. Simple as that. |
|
Jun-14-15
 | | chessgames.com: <zanzibar> I wrote a program called "roundfixer.pl" based on your logic. I sort a tournament's games by date, then group them so that all of the games from the earliest date go into an array and get set to be round #1, then all of the games from the next day go into an array and get set to round #2, etc. However your example tournament 5th American Chess Congress (1880) I find a bit perplexing, unless there's a detail I don't understand. Let me explain. The program has several safeguards. For starters, it will check to see if it's actually making a change. If it already says [Round "1"] and it believes it is round 1 it won't bother making the change. Next, it will refuse to change data that is already there. If the PGN says [Round "5"] and it has determined it should be round 4, it doesn't just silently change the data. It throws a warning. Finally, it doesn't recognize any round that has fewer than 4 games. This way the playoff games at the end of the 5th American are not blindly assigned to be "round 19". A game replayed from a previous round on an off-day would also not trigger a new round number. When you put all those safeguards together, you see it really is out to fix [Round "?"] and [Round "-"] or completely bizarre things. If something sensible is already in the round field, it will simply make a note of it. So we're looking for the odd case of a tournament with horrible round information and wonderful date information. Now here's the problem: there is no missing data in this example tournament. There are no [Round "?"] in this event. What I see is that there are two days per round, as it seems. Just for example A Cohnfeld vs Mackenzie, 1880 and J S Ryan vs C Moehle, 1880 are both labeled as [Round "1"] although the first was played on January 6th and the next was played on January 7th. Are you claiming this is incorrect? I think somebody took great pains to make it exactly like you see it.
If it weren't for my safeguards, my software would have renumbered the rounds from 1 to 18 based on the dates. As things stand, it is 9 rounds in groups of 2 days each. The output of my software is here: http://pastebin.com/8DSK5PyM |
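The round-dating logic and its safeguards, as described in the post above, can be sketched roughly as follows. This is not roundfixer.pl (which is Perl and has not been published here); it is a Python illustration under stated assumptions: each game is a dict with hypothetical `gid`, `date` and `round` keys, dates are PGN-style `YYYY.MM.DD` strings that sort correctly as text, and the function only reports what it would do, like running with "the safety on".

```python
from collections import defaultdict


def date_rounds(games, min_games=4):
    """Propose round numbers from dates without touching the games.

    Returns a list of (gid, old_round, proposed_round, status) tuples.
    """
    by_date = defaultdict(list)
    for g in games:
        by_date[g["date"]].append(g)

    report = []
    round_no = 0
    for date in sorted(by_date):
        group = by_date[date]
        # Safeguard: a date with too few games does not open a new round
        # (playoffs, replayed games on an off-day).
        if len(group) < min_games:
            for g in group:
                report.append((g["gid"], g["round"], None,
                               "SKIPPED: too few games on this date"))
            continue
        round_no += 1
        new = str(round_no)
        for g in group:
            old = g["round"]
            if old == new:
                status = "no need to fix"
            elif old in ("?", "-", ""):
                status = "WOULD FIX"
            else:
                # Safeguard: never silently overwrite existing data.
                status = "ERROR: will not overwrite existing data"
            report.append((g["gid"], old, new, status))
    return report
```

The threshold of 4 games comes from the post; it is left as a parameter because a head-to-head match has only one game per day and would presumably need `min_games=1`.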
|
| Jun-14-15 | | zanzibar: OK <chessgames> I'll give it another try. Let's say I read in the renormalization data from a tab separated file (tid, Event, Site, EventDate)
and have the CG_id's in the PGN data.
Here's the renormalization loop:
http://pastebin.com/xzybHEHv
(I got confused with the Download / Embed / Raw stuff) |
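For readers without access to the pastebin, a renormalization loop of the kind described might look like the sketch below. It is not the linked code. It assumes a tab-separated table with the four columns named in the post (tid, Event, Site, EventDate) and games represented as dicts of PGN tags carrying a hypothetical `tid` key that says which tournament each game belongs to.

```python
import csv
import io


def load_renorm(tsv_text):
    """Read 'tid<TAB>Event<TAB>Site<TAB>EventDate' rows into a lookup table."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row[0]: {"Event": row[1], "Site": row[2], "EventDate": row[3]}
        for row in reader
        if len(row) >= 4
    }


def renormalize(games, table):
    """Overwrite Event/Site/EventDate in place; return the number of tags changed."""
    changed = 0
    for tags in games:
        canon = table.get(tags.get("tid"))  # 'tid' is an assumed key
        if canon is None:
            continue  # tournament not in the table: leave the game alone
        for key, value in canon.items():
            if tags.get(key) != value:
                tags[key] = value
                changed += 1
    return changed
```

Returning the count of changed tags gives a quick sanity check before anything is resubmitted.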
|
Jun-14-15
 | | chessgames.com: OK, rereading your previous comments I see now, you are of the opinion: <It's a RR-2 (aka double Round Robin), and the two halves in the original collection are better thought of as two different rounds.> This has suddenly turned into a Biographer Bistro issue more than a data maintenance issue. Somebody numbered them that way for a reason. Perhaps the tournament bulletin numbers them that way. Do others share your opinion on this? |
|
| Jun-14-15 | | zanzibar: <chessgames> Just reading your other post now. I think the <Round Dating> examples I wanted to use were all <Candidates Master> games. I'll have to go back and review the details of your post as applies to the early American tournament. But yes, the idea was only to update the <[Round "?"]> tags. That would never overwrite good data. OK, I'll be back in a bit... |
|
Jun-14-15
 | | chessgames.com: roundfixer.pl takes a TID as an argument. So just give me more TIDs to test it on. Right now, the "safety is on", so it can't actually change data, but it will report on what it would have done were the safety off. I can share its output on pastebin. |
|
| Jun-14-15 | | zanzibar: OK, that's easy enough to do...
try any one (or all) of these:
62003, 74009, 62004, 61997, 61998, 61999, 62000, 62001, 62063, 80070, 62061, 62062, 62064, 62002 |
|
| Jun-14-15 | | zanzibar: OK, I see the confusion on <5th American Congress>. Part of it is that there are no <R?> rounds. So, using the <Round Dating> technique would require explicit over-riding of some safeguards. I should have made that clear.
But let's consider some games from R1 to show everybody what's going on. @g1255655 and @g1255660
<01.06 29 (R1) 0-1 Cohnfeld -- Mackenzie> and
<01.07 45 (R1) 1-0 Mackenzie -- Cohnfeld> Two different days, same round, opposite color pairing. Obviously these two games belong in different rounds, unless there is some compelling reason not to (and even then!). The intro shows a RR-2 xtab, so the game data is out of sync: the rounds should also show a RR-2 with 18 rounds, and not a RR with 9 rounds (and double-counting). There's nothing special mentioned otherwise in the intro about the rounds - so I maintain renumbering the rounds is "biographer safe". Before going on though, it should be noted that there are three dates for R1 games. There is also a 01.31 game: <01.31 37 (R1) 1-0 Mackenzie -- Grundy> The intro does shed some light on that:
<The final of the event saw a tie for first between Mackenzie and Grundy. The Congress rules stipulated
in the event of a tie that a playoff match ...> The tournament isn't properly <Partitioned>. Now, if the <Round Dating> were applied before proper partitioning, the play-off games would go to R19 and beyond, which actually is a better place for them than R1. When properly partitioned, as a head-to-head match, <Round Dating> would restore them to R1 and R2 (from R19 and R20). I hope this makes it clearer. |
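The double-counting symptom in the post above (the same two players meeting with both colors inside one "round") is easy to test for mechanically. A minimal sketch, assuming games are dicts with hypothetical `round`, `white` and `black` keys:

```python
from collections import defaultdict


def double_counted_rounds(games):
    """Return {round: [(player_a, player_b), ...]} for pairings that occur
    with both color assignments inside the same round."""
    seen = defaultdict(set)
    for g in games:
        seen[g["round"]].add((g["white"], g["black"]))

    clashes = {}
    for rnd, pairs in seen.items():
        # Keep a pairing only if its color-reversed twin is in the same round.
        both = sorted({tuple(sorted(p)) for p in pairs if (p[1], p[0]) in pairs})
        if both:
            clashes[rnd] = both
    return clashes
```

Any round that shows up in the result is a candidate for splitting into two rounds of a RR-2; whether to do so remains a biographer's call.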
|
| Jun-14-15 | | zanzibar: Here's where the RR-2 statement comes from:
CG Librarian chessforum (kibitz #314) I have to review it...
* * * * *
But exactly yes on this comment...
<This has suddenly turned into a Biographer Bistro issue more than a data maintenance issue.> Which is why I'm so busy on the Bistro. I don't think many others have realized this, but it's good that I'm not entirely alone. There are various points where biographers must decide what input goes into this "grand" program. As for the rest of the comment, I hope the previous post addresses the concerns in a helpful fashion... |
|
| Jun-15-15 | | zanzibar: Here is a simple example of <Partitioning>... <Rice Memorial (1916)> https://zanchess.wordpress.com/2015... The intro shows two xtabs, a prelim and a final. The above shows how to do the same with the PGN. |
|
| Jun-15-15 | | zanzibar: After <Partitioning> comes <Stubification>: <Rice Memorial (1916)> + <Rice Memorial Final (1916)> https://zanchess.wordpress.com/2015... Adding stubs = tournament completion.
All xtabs are then accurate, as are the leader boards. Stubs account for scoring byes, forfeits and missing games (where the result is known (and hopefully the colors as well)). The above post is technical, and has the python code to prove it! |
|
| Jun-15-15 | | zanzibar: At some point there must be a finish to all this tomfoolery, and so there is: https://zanchess.wordpress.com/2015... This post shows the results. It's not quite so technical (no python), but it does lean on SCID quite a bit. Nothing the reader can't handle, I hope! |
|
Jun-15-15
 | | chessgames.com: <zanzibar: OK, that's easy enough to do...
try any one (or all) of these:
62003, 74009, 62004, 61997, 61998, 61999, 62000, 62001, 62063, 80070, 62061, 62062, 62064, 62002> Very good! After testing the first couple I got mixed results. I think I'm ready to "take the safety off" and let it actually fix some rounds. I'll report back later. |
|
| Jun-15-15 | | zanzibar: OK great. What's that German saying... maybe this -
Every great journey begins with but a single step. |
|
| Jun-15-15 | | zanzibar: Hey, wait a sec, what does this mean...
<After testing the first couple I got mixed results.> ?? As in good mixed results?
(I'm on the road again, but I'll check back later) |
|
Jun-16-15
 | | chessgames.com: OK here are some results:
First test case: Candidates Match: Polgar - Bareev (2007) output: http://pastebin.com/s4Kb8nWP
This is a success. A typical line reads <updating gid 1462085 from [Round "?"] to [Round "3"] - FIXED>, so we've replaced question-marks with perfectly sensible data. (Even then, I am trusting that you wouldn't give me an incomplete tournament to run this on.) OK next test case, World Championship Candidates Final (2011) output: http://pastebin.com/5vv4E5fN
Here things didn't go so well. There are two wonky tournament rounds, but the software refuses to change data that already exists. So we get errors like <updating gid 1622774 from [Round "3.1"] to [Round "1"] - ERROR: will not overwrite existing data>. And for the other rounds, the data are correct so it says <updating gid 1622878 from [Round "6"] to [Round "6"] - (no need to fix)>. Now for Candidates Match: Shirov - Adams (2007) output: http://pastebin.com/z5ekDtt6
Rounds 1-6 got fixed easily as they should have, but then we have three games played on June 3 '07 all labeled [Round "7"]. I don't know the details but it sure seems like that must be a rapid playoff or something. It's beyond the scope of this program to meddle with things like that. So this one was a partial success: it fixed some question marks and left the playoff games for editors to correct. I ran it on Candidates Match: Aronian - Carlsen (2007) and I'll spare you the output, but it's almost the exact same case. For some reason there are 6 games all labeled [Round "7"] which the program refused to touch. However a bunch of question-marks were replaced with round numbers. Candidates Match: Leko - Gurevich (2007) and Candidates Match: Ponomariov - Rublevsky (2007) both were corrected nicely. Example: <updating gid 1462051 from [Round "?"] to [Round "2"] - FIXED> At this point I'm going to stop, although running it on TIDs 62000, 62001, 62063, 80070, 62061, 62062, 62064, 62002 should still be on my to-do list. Conclusions/observations:
First, the reason why the round numbers are so spotty to begin with simply must be that the official site was producing this stuff at the time. Especially when a site concentrates on live broadcasting of games, the "invisible" fields like Round and EventDate often go unattended. So now we have a handy utility to change a number of games at the same time. Unfortunately it's not something so automated that it can run without careful supervision. All of the cases I just tackled could have been easily fixed by hand, however it's easy to imagine a much larger tournament where this could save hours of work. One more tool for the toolbox—not bad. |
|
| Jun-16-15 | | zanzibar: Hi <chessgames> - I understand this now. What is really going on here is that the separate 2007 <Candidates Matches> are really part of a Knockout tournament. So the weird round numbers 3.1, 3.2 are because <CG> took a knockout round and made it look like a head-to-head match. And <CG> didn't do it completely consistently, as some rounds just came in as "?" rounds. And yes, the round 7, 8, 9, etc. games are numbered this way because they are rapid/blitz/armageddon rounds from the same day. Lord knows why R1-6 are notated as "?". Maybe from the original site - but I would think it more likely that it's <CG>'s doing. In fact, in my final working of the tournament I redid them to correspond to the proper knockout structure. So <Polgar--Bareev> is R1.1-R1.6, and <Leko--Bareev> is R2.1-2, and all is OK. The extra rounds, if round != '?', are the rapid/armageddon playoff rounds, which are numbered 7 and beyond, and they are marked correctly. This stuff is much easier to visualize if you have a rr() routine (rr = round report). rr(t) displays all games in a tournament, sorted by round. rr(t,n) displays round n in a tournament.
This is a mandatory tool, and if you have it, then the cases where the Round Dating routine can be used will be obvious. |
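A round report along the lines of rr(t) and rr(t,n) could be sketched as below. This is an illustration, not the routine used on the blog: it takes a plain list of game dicts (hypothetical keys `date`, `round`, `result`, `white`, `black`) rather than a tournament object, and it returns the lines instead of printing them so the output is easy to check.

```python
def round_key(rnd):
    """Sort '3.1' before '3.2' before '7'; non-numeric rounds such as '?' go last."""
    try:
        return (0, tuple(int(part) for part in rnd.split(".")))
    except ValueError:
        return (1, ())


def rr(games, n=None):
    """Round report: all games sorted by round, or only round n if given."""
    rows = [g for g in games if n is None or g["round"] == str(n)]
    rows.sort(key=lambda g: (round_key(g["round"]), g["date"]))
    return [
        f'{g["date"]} (R{g["round"]}) {g["result"]} {g["white"]} -- {g["black"]}'
        for g in rows
    ]
```

Calling `print("\n".join(rr(games)))` gives a listing where several dates inside one round, or several rounds on one date, stand out at a glance.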
|
| Jun-16-15 | | gauer: Along with chess display container schemes like the Viewer Deluxe chessforum (kibitz #293) interface, Winboard (which I hadn't used much) and UCI (seems to be a bit more modern, and was the one I called an engine into when using chessbase or deep fritz on windows - not sure what the Mac/Unix users prefer - maybe old versions of chessmaster (ie: 5000, etc) had their own versions) were common interfaces back then. <Regarding eventdate orphans> chessgames.com chessforum (kibitz #22832). When I would add some games, my cross-table generation program (chessbase 9 mega) seemed to have handled things better for building scheveningen or Olympiad or team match-point tables when some "team" tags were added to the 7-tag roster than not. Often, I didn't care much about eventdate issues, and would sometimes strip them when generating them from notepad (ctrl-F is a great way in here or in excel to get a number of players count in there or via excel, or for repairing names via the replace tool) after chessbase before checking them for pgn upload util. when I didn't need them. The pgn should be able to primarily generate a table in another program, if/when it is not a swiss/RR in my opinion (which is why I also pick on how chessbase interacts with pairing algorithms - not well, as far as I could tell back then, as FIDE is only supposed to generate about 1 "correct" pairing for a tournament round, or so they supposedly teach in FIDE arbiter school). To me, the eventdate info is sometimes better handled in the biography window box, or perhaps embedded into part of a tid template page field (which in some cases, biographers can edit - see the publicly available admin manual link on the bistro). The problem with adding in <extra> tags to a 7-tag roster in the pgn source page is this: which version of Xpgn is one to choose?
Programmers are sort of slow to make their extensions match or play well with another program's Xpgn output. When chessgames.com does export a pgn file of a tid, what do they want to add as common extension roster tags when a tid pgn is not a swiss? In some cases, it might be best to have a switch in user prefs to export more or fewer fields than the 7-tag roster from a pgn to a program on your offline computer. I'm playing a tournament again this weekend (Guelph), so don't be surprised if I'm still catching up on this week's reading next week. |
|
| Jun-16-15 | | zanzibar: <gauer> you and I both sometimes write dense technical posts, but I think you got me beat! There are a million issues in your last post. Some kind of outline is needed just to help guide a reader wanting to process it all. Let me break down just a point or two to reply, as I work through your post. You reference this previous post:
Viewer Deluxe chessforum (kibitz #293) which in turn contains several diverse topics, including, but not limited to: 1) Hiding GotD moves
(E.g. SCID has an option to do this, outside the PGN spec) 2) Variations in the PGN
This is a standardized feature.
There aren't many examples of <CG> games with variations that I know of. A list of them might be helpful. I do know we can search for games with annotations, but I rarely use that feature. You give this game:
http://www.chessgames.com/perl/nph-... Which I think contains illegal PGN. The standard explicitly states that comments don't nest. The example has some weird variation essentially consisting of a comment - which I think was written by hand by someone unfamiliar with the PGN standard. Here's the variation/comment: <({prior to 33 ... g5, black tests 33... Rf3+ 34. Kg2 transposes}) > The variation is set up for White's 34th move. So adding a after the 33...Rf3+ would make the PGN legal, and seems to have been the intent. But who can say?
SCID parses it with no ill effect, and apparently CVD didn't realize it was illegal PGN, since his CVD also parsed it as an empty variation with comment. There are other examples from your other post, let's talk about those later. I think CVD addressed some of the concerns you raised. Let's get back to this to finish for the moment:
<Along with chess display container schemes like the Viewer Deluxe chessforum (kibitz #293) interface, Winboard (which I hadn't used much) and UCI (seems to be a bit more modern, and was the one I called an engine into when using chessbase or deep fritz on windows - not sure what the Mac/Unix users prefer - maybe old versions of chessmaster (ie: 5000, etc) had their own versions) were common interfaces back then.> First of all, I think you need a topical sentence to begin, telling us what your focus is, and why you are providing all the details. I understand it to mean that you're talking about various display programs people can use. The question is why? So I actually have to read ahead to see where you're going, and then go back to reread again with the context. Let's put it this way, reading each first sentence of a post such as this should provide an outline of what you're talking about. (Apologies for sounding like a schoolmarm, but I think your ideas and concerns are worth the effort). The rest of your post goes back into the PGN issues and is easy to understand. So my comments are more directed to the first paragraph. So, yes, there are a variety of viewers out there. I assume we are talking about how they deal with PGN. First, though, just a point of fact, you mention UCI as if it's connected to viewers or PGN. It's not. UCI is a communication protocol for talking to a chess engine. Basically, all chess engines use it today, with the exception of Crafty. I believe Crafty uses something called Winboard protocol (update- actually Xboard/Winboard protocol) - it predated UCI, and Hyatt couldn't be bothered to retrofit his engine. You can't do MPV's with Crafty, etc. Since Crafty is no longer state-of-the-art, most modern viewers just go with UCI engines, which might have led you to the impression that UCI was a display scheme. It's more accurate to say "modern display schemes (w UCI)". I think WinBoard and SCID will do both engine protocols.
http://www.gnu.org/software/xboard/... (Winboard) http://wbec-ridderkerk.nl/html/UCIP...
(UCI comes from Shredder's Stefan Meyer-Kahlen btw) |
|