by denis_berthier » Wed Aug 27, 2008 1:18 am
We can observe that no universal rating has yet been devised and I think this will remain the case for a long time.
I think we should ask a few basic questions. I'll leave most of them open.
1) What general purpose do we assign to a rating?
Should it be able to rate all the puzzles? Why?
Should it be able to rate all the puzzles that can currently be solved by a "normal" human player?
Or do we want only a rating for the most common puzzles, from easy to diabolical - for commercial purposes?
On the contrary, do we want a specific rating for exceptionally hard puzzles - for research purposes?
(In this open list, where do we put SER? Commercial? I mean: its declared validity is universal, but isn't its real validity limited to commercial puzzles?)
2) Shall we be happy if the rating satisfies its general purpose or, in addition, should it be based on well-defined general principles?
This isn't only rhetorical. A rating may be useful even if it is based not on clearly defined principles but on experience with resolution. Isn't this the case for SER? (I'm really asking; I really don't know.)
3) In the second case, what constraints do we impose on the rating? E.g.:
- do we want the rating to be invariant under permutations of rows, columns, floors and towers, and under row/column symmetry (SER doesn't even satisfy this very minimal requirement; see the sketch after this list);
- do we want it to be invariant under supersymmetry (so that, e.g., a fish = a naked set of the same size);
- on the contrary, do we want it to be closer to human perception (which would make it a very hard research topic in our current state of knowledge)?
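For concreteness, here is a minimal Python sketch of how the first invariance requirement could be tested empirically. Everything here is hypothetical: `rate` stands for any rating function taking a 9x9 grid and returning a number; and since the test only samples random isomorphs, it can refute invariance but never prove it.

```python
import random

def random_isomorph(grid):
    """Random validity-preserving transform of a 9x9 grid (a list of
    9 lists of 9 entries): permute the floors, the rows within each
    floor, the towers, the columns within each tower, and optionally
    apply row/column symmetry (transposition)."""
    floors = [grid[3 * f:3 * f + 3] for f in range(3)]
    random.shuffle(floors)                          # permute floors
    rows = [r for fl in floors for r in random.sample(fl, 3)]
    cols = [list(c) for c in zip(*rows)]            # transpose to act on columns
    towers = [cols[3 * t:3 * t + 3] for t in range(3)]
    random.shuffle(towers)                          # permute towers
    cols = [c for tw in towers for c in random.sample(tw, 3)]
    result = [list(r) for r in zip(*cols)]          # back to row-major order
    if random.random() < 0.5:                       # row/column symmetry
        result = [list(r) for r in zip(*result)]
    return result

def looks_invariant(rate, grid, trials=100):
    """Empirical check that `rate` (hypothetical) gives the same value
    on random isomorphs of `grid` - a necessary condition only."""
    r0 = rate(grid)
    return all(rate(random_isomorph(grid)) == r0 for _ in range(trials))
```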
4) Should the rating be based on a well-defined backbone?
I mean: as any rating has to be based on a particular set of rules, can we choose a basic set of rules (e.g. some fixed type of chains of increasing length) that will serve as a backbone and define a scale for the rating, all the other rules then being integrated within this hierarchy?
That's how I proceeded for the NRCZT-rating, and that's how Allan could provide another rating with broader scope if he chose some ordering of his second-order patterns.
Notice that we have no result on a potential rating based on AICs with ALSs. Such AICs could have been the backbone of SER; instead, they are given arbitrary values and are arbitrarily mixed with lots of other unrelated patterns.
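To illustrate what a backbone buys us, here is a sketch in Python, with an assumed solver predicate `solvable_with(puzzle, n)` - purely hypothetical, not any existing API - that answers whether the puzzle can be solved using only the backbone rules up to level n (e.g. nrczt-chains of length at most n):

```python
def backbone_rating(puzzle, solvable_with, max_level=20):
    """Backbone-defined rating: the smallest level n at which the
    backbone rules suffice to solve the puzzle.  Since level n is
    assumed to include all levels below it, the scale is well defined.
    Returns None when the backbone alone is not sufficient."""
    for n in range(1, max_level + 1):
        if solvable_with(puzzle, n):
            return n
    return None
```

The other rules are then integrated by assigning each of them a place on this scale (e.g. the chain length it is considered equivalent to), instead of receiving arbitrary independent values.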
5) Given two ratings as above, with different backbones, how do they compare?
We've seen that, for puzzles that are not exceptionally hard, the most common ratings are reasonably well (but not strongly) correlated.
6) How do we assess the validity of a rating?
As there is no universal rating, this is impossible in an absolute sense. One may compare a rating with one's intuitions, but the result is very likely to differ from one person to another.
However, given several formal ratings based on different approaches, if they are strongly correlated, it is likely that they all capture some important aspects of the puzzles. This amounts to a statistical cross-validation check.
Of course, statistical cross-validation itself has limited validity. But the clearer the basic principles of each rating, the better we can analyse the discrepancies between the ratings on particular puzzles.
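Such a cross-check is cheap to compute. Here is a self-contained Python sketch of the Spearman rank correlation between two ratings evaluated on the same collection of puzzles; the two rating vectors are just assumed inputs, no particular rating is implied:

```python
def ranks(xs):
    """1-based ranks of xs, ties receiving their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                       # extend over a block of ties
        avg = (i + j) / 2 + 1            # average 1-based position of the block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var = (sum((x - ma) ** 2 for x in ra) * sum((y - mb) ** 2 for y in rb)) ** 0.5
    return cov / var if var else 0.0     # constant ratings: correlation undefined
```

Spearman (on ranks) rather than Pearson (on raw values) seems the right choice here, since different ratings use incommensurable scales and only the induced orderings of the puzzles can be compared.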
7) When a new resolution rule appears (e.g. Steve's), how can it be taken into account in the existing ratings?
It can easily be observed that the addition of a really new resolution rule radically modifies any rating - in the sense that it reverses the relative places of many puzzles.
But a single resolution rule with no parameter can hardly be the basis for a new rating unless it is itself included in a broader parameterised set of rules. As of now, what such a set should be for Steve's pattern remains unclear, as the pattern seems specific to the exceptionally symmetric EM family.
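The reversal effect itself is easy to quantify, by the way. A minimal sketch, assuming two index-aligned vectors rating the same puzzles before and after the new rule is added; it counts the pairs of puzzles whose relative order is reversed (the discordant pairs of Kendall's tau):

```python
from itertools import combinations

def reversed_pairs(before, after):
    """Number of puzzle pairs (i, j) ordered one way by `before` and
    the opposite way by `after`.  O(n^2): fine for a few thousand puzzles."""
    return sum(1 for i, j in combinations(range(len(before)), 2)
               if (before[i] - before[j]) * (after[i] - after[j]) < 0)
```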