|
< Earlier Kibitzing · PAGE 13 OF 18 ·
Later Kibitzing> |
| Jun-12-15 | | zanzibar: Anyways, there's this story about Hendrix going on stage at Monterey Pop... <... Hendrix and Pete Townshend (The Who) almost got into a fight over who was going to perform their act first. So, a coin was flipped, Townshend won. Hendrix then vowed to pull out all the stops - and promised to put on some of the most outrageous & mind-blowing theatrics. The rest was history.> http://www.amazon.com/Jimi-Plays-Mo... So full-bore, no-stops, posts go here.... |
|
Jun-12-15
 | | chessgames.com: Set the database on fire, Jimi :) |
|
| Jun-12-15 | | zanzibar: Let me show the Python code; if it's not clear enough, leave a note and I'll follow up.
TN - look up tournament name via tid
TG - get games via tid
rr(t) - print out all games sorted by round
#
# Find Candidates Matches
tCM = [t for t in TN if 'Candidates Match' in TN[t]]
#
# Print out and review (not shown)
for t in tCM: rr(t)
# Oops, <Candidates Match: Aronian - Carlsen (2007)> shouldn't have non-? round #'s overwritten, ok got it.
for t in tCM:
... # Get the tournament games sorted by Date
... G = TG[t]
... G.sort( key=lambda g: g.Date )
... rnd = 0
... date = ""
... for g in G:
...... if g.Date != date:
...... ... rnd += 1
...... ... date = g.Date
...... if g.Round == '?': g.Round = 'tst-' + str( rnd )
# Note I put "tst-" prefix in, for debugging. We'll rerun rr() to make sure all is ok,...
for t in tCM: rr(t)
#...then do this:
for t in tCM:
... for g in TG[t]:
...... if g.Round[:4] == "tst-": g.Round = g.Round[4:]
* * * * *
The last little bit is just showing off a little, but of course, one has to be careful not to crunch the game data. Oops, no <pre> tag. See here: (has to wait, I'm on the road and connection doesn't allow blog posts) |
|
| Jun-12-15 | | zanzibar: So, in words... many of the <Candidates Match> games are missing round numbers. They are easy to fill in, since the games are head-to-head sequential match games, almost always one per day. Give me the dates in order, and I can give you the round numbers. The trick is to avoid the special cases (e.g. <Candidates Match: Aronian - Carlsen (2007)>, etc.). |
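The idea described in words above can be sketched as a small standalone Python 3 example. This is only an illustration under assumptions: the `Game` class, its field names, and the sample dates are hypothetical stand-ins (the thread's real `TG`/`TN` objects are not shown), and it relies on PGN-style `YYYY.MM.DD` dates sorting correctly as plain strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Game:
    """Hypothetical stand-in for a database game record."""
    date: str   # PGN-style date, e.g. "2000.05.27"
    round: str  # "?" when the round is unknown


def date_round(games: List[Game]) -> None:
    """Fill in missing rounds for a head-to-head match: one round per distinct date."""
    rnd = 0
    last_date = None
    # YYYY.MM.DD strings sort chronologically, so a plain string sort is enough.
    for g in sorted(games, key=lambda g: g.date):
        if g.date != last_date:
            rnd += 1
            last_date = g.date
        # Only touch unknown rounds; existing round data is left alone.
        if g.round == "?":
            g.round = str(rnd)


# Made-up match data, deliberately out of order.
match = [Game("2000.05.29", "?"), Game("2000.05.27", "?"), Game("2000.05.28", "2")]
date_round(match)
print([(g.date, g.round) for g in match])
```

Because only `"?"` rounds are overwritten, a game that already carries a round number keeps it, which is the safeguard the Aronian - Carlsen special case calls for.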
|
Jun-12-15
 | | chessgames.com: I'll look this over soon and comment. You might want to use http://pastebin.com to avoid formatting issues. |
|
| Jun-12-15 | | zanzibar: By the way, we can play a similar game to automatically normalize the <EventDate> header in certain cases:
for t in tList:
... G = TG[t]
... G.sort( key=lambda g: g.Date )
... bracket = [ G[0].Date, G[-1].Date ]
... # Be super-safe, require full bracket and missing ED.
... if '?' in bracket[0] or '?' in bracket[1]: continue
... if '?' not in G[0].EventDate: continue
... for g in G: g.EventDate = bracket[0]
* * * * *
Voilà, if the biographer got all the dates right, the tournament should practically normalize itself. In fact the test on the pre-existing EventDate could be strengthened <... and G[0].EventDate < bracket[0]: continue>. Of course, we're assuming all the games have been normalized in the above (i.e. G[0].EventDate gives the EventDate for all the tournament games). |
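For readers who want to try the EventDate idea outside the database session, here is a minimal Python 3 sketch. The `Game` class, its field names, and the sample dates are assumptions made for illustration. Unlike the snippet in the post, this version checks every game's EventDate rather than only the first one, so it does not rely on the games having been normalized beforehand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Game:
    """Hypothetical stand-in for a database game record."""
    date: str        # PGN-style date, "?" marks unknown parts
    event_date: str  # PGN-style EventDate, "?" marks unknown parts


def normalize_event_date(games: List[Game]) -> bool:
    """Set EventDate to the earliest game date; return True if anything was changed."""
    if not games:
        return False
    ordered = sorted(games, key=lambda g: g.date)
    first, last = ordered[0].date, ordered[-1].date
    # Be super-safe: both ends of the date bracket must be fully known.
    if "?" in first or "?" in last:
        return False
    # Only fill in EventDate when it is missing on every game (stricter than the post).
    if any("?" not in g.event_date for g in games):
        return False
    for g in games:
        g.event_date = first
    return True


# Made-up data for illustration.
games = [Game("2000.01.08", "????.??.??"), Game("2000.01.06", "????.??.??")]
print(normalize_event_date(games), [g.event_date for g in games])
```

Returning a flag instead of silently continuing makes it easy to list which tournaments were skipped and need a human look.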
|
| Jun-12-15 | | zanzibar: OK, let me look at pastebin, it's new to me.
I think my blog would be as easy though (my first try with pastebin wasn't 100% successful - but my brain may be a bit fizzed at the moment (forgetting fields in searches, etc)) |
|
| Jun-12-15 | | Benzol: I have the feeling that this forum is going to become a hive of activity very shortly. :) |
|
| Jun-13-15 | | zanzibar: The above technique of <Date Rounding> should be applied to 5th American Chess Congress (1880)
excluding the two playoff rounds.
It's an RR-2 (aka double Round Robin), and the two halves in the original collection are better thought of as two different rounds. Ergo, instead of a 9-round tournament, it's really an 18-round tournament. The two playoff games should go into an adjunct tournament like NIC and ChessBase (and others) do. |
|
Jun-13-15
 | | chessgames.com: Pastebin is really easy. Just type (or paste) what you want, and then click "submit", and then whatever URL you land up on is the one you want to share. For instance I just made this test http://pastebin.com/weFbbCZh I've never experimented with syntax highlighting but it has Python and Perl as options, so that's handy. |
|
Jun-13-15
 | | chessgames.com: OK, I understand what you're saying now. I'm not fluent in Python but I think I understand this: <
... for g in G:
...... if g.Date != date:
...... ... rnd += 1
...... ... date = g.Date
...... if g.Round == '?': g.Round = 'tst-' + str( rnd )
>
To put it into words: "See all those games played on the very earliest date? That's round #1. See all those games played on the very next day? That's round #2. See all those games played on the date after that? They are round 3." Makes it kind of seem dead obvious when you state it like that. Issues like rest-days don't matter, because they have no games played on those dates.
<The last little bit is just showing off a little> It was lost on me because I'm not familiar with the [:4] and [4:] notation here <g.Round[:4] == "tst-": g.Round = g.Round[4:]>
I can think of some exceptional circumstances where it might run into trouble. Remember the big to-do when Cheparinov refused to shake Nigel's hand? They ended up replaying that game on the rest day. It would be a big mistake to create a new "round" over that one game, but if the software was smart enough it could be careful to sense that something is unusual when one or two games have unique dates from the rest of the tournament. Likewise, the playoffs at the end of 5th American Chess Congress (1880) could be recognized as not possibly new rounds. |
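One possible way to "sense that something is unusual" is to compare the number of games on each date against the typical count. The Python 3 sketch below is only a heuristic under assumptions: the function name, the 0.5 threshold, and the sample dates are invented for illustration, and nothing here reflects how the site's tools actually work.

```python
from collections import Counter
from statistics import median
from typing import List


def suspicious_dates(dates: List[str], ratio: float = 0.5) -> List[str]:
    """Return dates whose game count is below `ratio` times the median per-date count."""
    counts = Counter(dates)
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(d for d, n in counts.items() if n < ratio * typical)


# Made-up tournament: four games on each normal day, one replayed game on a rest day.
dates = (["2000.01.01"] * 4 + ["2000.01.02"] * 4
         + ["2000.01.03"] * 1 + ["2000.01.04"] * 4)
print(suspicious_dates(dates))
```

Dates flagged this way would be left for manual review rather than turned into a new round. In a two-player match every date has one game, so nothing is flagged and plain date rounding applies.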
|
Jun-13-15
 | | Tabanus: To be honest, I don't like the idea of having a program decide round number based on date. Why are editors allowed to change round number in the first place, if a program changes it back later? Besides, we usually have the round number but not the date. |
|
Jun-13-15
 | | Tabanus: Check for errors, yes, but making changes, no thanks. If a compiler doesn't even bother to add dates and round numbers to a collection, it's probably not worth spending programming on it. |
|
Jun-13-15
 | | chessgames.com: <To be honest, I don't like the idea of having a program decide round number based on date.> I share some of those concerns.
<Why are editors allowed to change round number in the first place, if a program changes it back later?> I don't think anybody suggested changing round data that's already there. The idea is to turn [Round "?"] into sensible guesses. <Besides, we usually have the round number but not the date.> That's true and makes this project only useful on a few fringe cases. |
|
Jun-13-15
 | | Tabanus: <CG> Thanks :) Then it's better. Please also remember that rounds are sometimes played in advance, e. g. round 13 is played between rounds 3 and 4, etc. Example: see the note below the crosstable in Gothenburg (1920). |
|
| Jun-13-15 | | zanzibar: Slow down! You guys are getting far ahead of the first case. The first case is for the <Candidates Match> tournaments, which are two-player tournaments without all these complications. But if you insist on skipping ahead to a tournament like <Gothenburg (1920)> let me run this scenario by you:
1) Suppose all the games have no round numbers.
2) Apply the <Date Rounding> algorithm.
3) Then go back and do by hand the odd-ball cases.
You end up having to do just a handful of cases by hand. Compare that to doing the entire tournament, round by round. That's 91 games to do, not just the 3 Møller + the one delayed R5. 4 vs 91 games to correct by hand.
I know which approach I'd like to take.
Remember, it's a tool, in this case to fix the games already in the database. It's supposed to be supervised by the administrator (see next post). And, I'd like to point out, doing a correction "by hand" is really also doing the correction via a program. In this case by some code fired off by data input on an HTML page.
As for being just fringe cases - yes, it's only a few cases. But it's the bulk of the <Candidates Match> games, 13 tournaments with 89 games. A big enough job that it's worth spending a little extra time and effort to get <CG> to create a new tool for the toolbox. |
|
| Jun-13-15 | | zanzibar: <Python aside>
Python is a great scripting language. Very regular yet expressive. One of its most powerful features is that of slices - which turn a list (or tuple, or string) into a subset of itself.
Suppose -
x = [1, 2, 3, 4, 5] # Python list
then x[0] == 1
and x[3] == 4
Zero-based indexing.
But you can index from the end too!
x[-1] == 5 and
x[-2] == 4 are true.
Now for slicing, you can slice from a position to the end (or from the beginning up to a position):
x[1:] == [2,3,4,5] and
x[:-2] == [1,2,3]
Of course you can explicitly slice from the middle:
x[2:4] == [3,4]
just remember the 2nd index is one beyond, so that len(x) works:
x[0:len(x)] == x
where len(x) gives the number of items in the list.
Of course there's better places to learn Python than a <CG> forum! But here we be.
As for the previous example, I tagged the edited rounds with a prefix ("tst-"). The reason was to review the changes before committing the update. I do this a lot when I work. I want to see what I'm doing, and if I screw up I end the session without updating, and start a new one.
Previously, I didn't just want to change the round numbers to be correct, because they would blend with all the other games. So I put an easy-to-identify prefix in to make the changes stick out like sore thumbs. Then, in my Python session, I display all the game data, see that everything is OK, and apply a trivial, error-proof last step(*) to strip off the prefix. Only then do I commit the updates.
Because that's how I roll!
(*) OK, there is no such thing as an error-proof step. But it's almost error-proof, if one is well practiced with Python slices. |
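The tag, review, then strip workflow described above can be tried in isolation. This is a small Python 3 sketch; the helper names `tag` and `untag` and the sample round values are made up for illustration, and only the `"tst-"` prefix and the slice tests come from the posts.

```python
PREFIX = "tst-"


def tag(value: str) -> str:
    """Mark a proposed value so it stands out during review."""
    return PREFIX + value


def untag(value: str) -> str:
    """Strip the review prefix if present; leave other values untouched."""
    if value[:len(PREFIX)] == PREFIX:   # slice from the start up to the prefix length
        return value[len(PREFIX):]      # slice from the prefix length to the end
    return value


# Made-up round values: two pre-existing, two proposed by a script.
rounds = ["1", tag("2"), tag("3"), "4"]

# Review step: list only the values that the script proposed.
pending = [r for r in rounds if r[:len(PREFIX)] == PREFIX]
print(pending)

# Commit step: strip the prefix once the review looks fine.
rounds = [untag(r) for r in rounds]
print(rounds)
```

Using `len(PREFIX)` instead of a hard-coded 4 keeps the two slices in sync if the prefix is ever changed.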
|
| Jun-13-15 | | zanzibar: Here's the tournaments that need Rounding:
Candidates Match: Aronian - Carlsen (2007)
Candidates Match: Leko - Gurevich (2007)
Candidates Match: Ponomariov - Rublevsky (2007)
Candidates Match: Gelfand - Kasimdzhanov (2007)
Candidates Match: Bacrot - Kamsky (2007)
Candidates Match: Grischuk - Malakhov (2007)
Candidates Match: Polgar - Bareev (2007)
Candidates Match: Shirov - Adams (2007)
Candidates Match: Aronian - Shirov (2007)
Candidates Match: Bareev - Leko (2007)
Candidates Match: Gelfand - Kamsky (2007)
Candidates Match: Grischuk - Rublevsky (2007)
Shirov - Kramnik WCC Candidates Match (1998) |
|
| Jun-13-15 | | zanzibar: <Check for errors, yes, but making changes, no thanks. If a compiler doesn't even bother to add dates and round numbers to a collection, it's probably not worth spending programming on it.> We covered this, but it's worth repeating.
In this case (for the matches), some biographer did dig out all the dates. The games are "well-dated". That's what allows the automatic updating of the round number. To emphasize - the tournaments weren't chosen at random, they were reviewed (by a human) to be eligible for updating by this technique. |
|
| Jun-13-15 | | zanzibar: BTW - can we get kibitzing on this forum to show up in the normal <Kibitzing> list of recent posts? This and maybe <chessgames> forum as well? I generally use <Kibitzing> as my springboard, and having to check the forums explicitly always seems like an extra step to me. |
|
Jun-13-15
 | | Annie K.: Heh... funnily enough, my "kibitzing home base" is actually the Recent Chessforum Activity, so this forum is much more on my radar than the Bistro. :) For exactly that reason, I have also suggested before that the Bistro should show up on the chessforums list as well - but now I'll modify that idea, and suggest that maybe these active public-interest forums (chessgames.com chessforum, Biographer Bistro, CG Librarian chessforum, and maybe The Kibitzer's Café as well) could show up on *both* lists? |
|
| Jun-13-15 | | zanzibar: So in my quest to make the world safe for stubs I looked at all the move=0 games on <CG>. We have just a handful, 29. A handful of those, in turn, are a little more interesting than the others:
Pillsbury vs J Mason, 1895
I Bjelobrk vs T Kalisch, 2006
Dhulipalla Bala Chandra Prasad vs M Karthikeyan, 2014
So, these showed up as 0 move games for me because I trusted <PlyCount>, which is sometimes "-1" and "*" here on <CG>. Any ideas/comments on that? |
|
Jun-13-15
 | | chessgames.com: <Any ideas/comments on that?> Yes, I just checked the code and you uncovered a bug. The utility "fixpgn" will look for the regex /PlyCount "(\d+)"/
and then fixes it if the number is not correct.
Problem is, "*" and "-1" are not numbers by the above definition. |
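The bug described above is easy to reproduce with the same pattern. The sketch below uses Python's `re` module purely as a stand-in, since the thread does not say what language "fixpgn" is written in; the `needs_fix` helper and the looser pattern are assumptions for illustration, not the site's actual fix.

```python
import re

# The pattern quoted in the post: it only matches unsigned integers.
STRICT = re.compile(r'PlyCount "(\d+)"')
# A looser, hypothetical pattern that captures whatever value the tag holds.
ANY_VALUE = re.compile(r'PlyCount "([^"]*)"')


def needs_fix(header: str, actual_plies: int) -> bool:
    """True when a PlyCount tag is present but is not the correct number."""
    m = ANY_VALUE.search(header)
    if m is None:
        return False  # no PlyCount tag at all, nothing to correct
    value = m.group(1)
    return not value.isdecimal() or int(value) != actual_plies


for header in ('[PlyCount "40"]', '[PlyCount "-1"]', '[PlyCount "*"]'):
    # The strict pattern never even sees "-1" or "*", so they slip through.
    print(header, STRICT.search(header) is not None, needs_fix(header, 40))
```

Capturing any quoted value first and validating it afterwards means malformed entries are reported instead of being skipped.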
|
Jun-13-15
 | | chessgames.com: Maybe the person who put -1 there was a Python programmer, and assumed that meant "the first move from the end." ;-) |
|
| Jun-13-15 | | zanzibar: <chessgames> ha!
My question is about <Candidates Match: Grischuk - Malakhov (2007)>[-3:] Candidates Match: Grischuk - Rublevsky (2007)
<DateRounding> will give two R7 games. There should have been two R6 games as well, except one of the games (@g1462369) has the very non-standard round <RR2 TB1> ([Round "R2 TB1"]). <ChessBase> gives the history fairly well, but one could reconstruct it almost as well for this match (since the match can't end on a draw). So I'll fix the rounds by hand to match other matches from the same series. All the other ones that went into tiebreak "mysteriously" started using "correct" round numbers, starting with R7. Strange that! Ah, but the question?
<Should we somehow indicate in the PGN the time controls?> In particular, fixing the Round Numbers loses the info that R7 and beyond are tiebreak rapids (followed by 5-min then Armageddon?). |
|
 |
 |
|
|
|
|