frogbert: <As for the blitz marathon as a designed ploy> i described it as an "intuitive move", so i guess the amount of "design" that could possibly have gone into it must have been minimal. :o)
regarding the semantics - well, you're the native speaker. the most common translation of "superior" into norwegian clearly indicates more distance than the corresponding translation of "a few notches", but something always gets lost in translation... in order to fine-tune my feeling for these small nuances i would probably have to stay abroad (in a natively english-speaking country) for at least a year.
the nerdy fix obviously is to stop using these "natural language terms" and instead define an exponential scale for comparing two players, where the dominance of one over the other is expressed by a number between 0 and 100 where
0 means player A and player B are equal
100 means player A is "infinitely"* better
50 means ...
etc.
*) would imply that A always beats B in head-to-head games, but probably also something about how A would do against their common opponents
of course, if some people start thinking about the rating system now, that wouldn't be strange - but it would also be "wrong". :o) the above "suggestion" would be based on assessments made on purely subjective terms, but the assessment itself would be expressed very precisely (given a completed definition).
in the rating system, the calculations necessary to compare two players (or find their respective ratings) are very precisely defined, but the common interpretation (look up rating diff in table, calculate "expected score" for player A over player B) is nearly always wrong in the specific, head-to-head case (even if it more or less holds true statistically, on average). specifically, the rating system doesn't claim to say anything with much certainty about <specific> player vs player strength, in terms of how a head-to-head match will/would turn out.
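for reference, the "expected score" lookup mentioned above can be sketched in a few lines. this assumes the usual logistic form of the elo formula with a 400-point scale (the post itself only mentions a table, so the exact formula is an assumption), and the function name is made up for illustration:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B.

    Assumption: logistic Elo curve with a 400-point scale, standing in
    for the table lookup; this is an average over many games, not a
    prediction for one specific head-to-head match.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Equal ratings give an expectation of exactly one half.
    print(expected_score(2700, 2700))
    # A 200-point edge gives roughly three points out of four on average.
    print(round(expected_score(2700, 2500), 2))
```

the number that comes out is a statistical average over a large pool of opponents, which is exactly why it says little about any one particular pairing.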
one's rating says something about one's accumulated historical performance against <all> previous opponents, and hence one should basically <expect> to see something different if one hypothetically had played a different selection of players. looking at a "meaningful"** subset of one's results, partitioned by some well-defined parameter (selected players, rating ranges, white games, black games, etc.) "simulates" this hypothetical scenario.
**) "meaningful" here should be taken to mean "based on a big enough sample of games within a reasonably short time period" - there's a trade-off between size of sample and length of period; if the period becomes too long then variations are likely to be more affected by a real change in the player's strength than by other parameters.
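a minimal sketch of such a partitioning, here by opponent rating range, could look like the following. the function name, the input format (pairs of opponent rating and score), the 100-point band width and the minimum sample size are all assumptions chosen for illustration, and the games are assumed to have already been restricted to a reasonably short period:

```python
from collections import defaultdict


def profile_by_opponent_rating(games, band_width=100, min_games=20):
    """Group results into opponent-rating bands.

    games: iterable of (opponent_rating, score), score in {0, 0.5, 1}.
    Returns {band_start: (number_of_games, score_fraction)}, keeping
    only bands with at least min_games games (hypothetical threshold).
    """
    bands = defaultdict(list)
    for opponent_rating, score in games:
        # Round the opponent rating down to the start of its band.
        band_start = int(opponent_rating // band_width) * band_width
        bands[band_start].append(score)
    return {
        start: (len(scores), sum(scores) / len(scores))
        for start, scores in sorted(bands.items())
        if len(scores) >= min_games
    }


if __name__ == "__main__":
    # Tiny made-up sample, with the threshold lowered so it shows output.
    sample = [(2450, 1), (2460, 0.5), (2510, 0), (2590, 0.5)]
    print(profile_by_opponent_rating(sample, min_games=2))
```

raising min_games throws away more bands but makes the remaining ones more trustworthy, which is the sample-size side of the trade-off described in the footnote.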
the above is also nicely illustrated and supported by the varying performances for players depending on opponent "strength" (i.e. rating), visible in my performance profiles, which i really, really should try to "complete", to the point that they could be published (well, it's more about making the functionality available than doing a publication - the system is dynamic, showing how the profile changes for players over time as new data is added).