Type | Wish | Status | reviewed | Date | 5-Apr-2008 12:22 |
---|---|---|---|---|---|
Version | alpha 31 | Category | Native | Submitted by | Jerry |
Platform | All | Severity | minor | Priority | high |
Summary | FIND and the Set Functions ( UNIQUE, UNION.. ) might need /compare refinement |
---|---|
Description | just like SORT does. |
Example code |
Assigned to | n/a | Fixed in | - | Last Update | 20-Feb-2014 22:04 |
---|
Comments | |
---|---|
(0003512)
BrianH 26-Feb-2013 20:29 |
Updated the priority of this ticket based on some discussion in #1963 that, while it's unrelated to the minor problem covered by that ticket, made it clear that the /skip option of these functions makes it really important to have a /compare option, and to design the combined /skip/compare behavior from the ground up with the assumption that both options will be available on all record-oriented series functions that do comparisons. Even SORT should be changed to be consistent with our decision.
The relevant conflict, translated to UNIQUE for simplicity:
; Current behavior, R2-compatible except for #1963, expected by Sunanda
>> unique/skip [1 2 1 3 1 4 1 2] 2
== [1 2] ; only compare on first column
; Behavior expected by Ladislav and me
>> unique/skip [1 2 1 3 1 4 1 2] 2
== [1 2 1 3 1 4] ; compare on whole record
The questions this prompts: How would we change our model if we had /compare? Also, what should /compare do? |
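The two expectations in comment 0003512 can be modeled outside Rebol to make the difference concrete. This is a minimal Python sketch, not Rebol's implementation: the function name `unique_skip` and the `whole_record` flag are hypothetical, introduced only for illustration, and the series is assumed to contain complete records.

```python
def unique_skip(series, width, whole_record):
    """Model of unique/skip: keep the first record seen for each distinct key.

    whole_record=False compares on the first column only (the R2-style behavior);
    whole_record=True compares on every column of the record.
    Both the name and the flag are hypothetical, not part of Rebol.
    """
    seen = set()
    result = []
    for start in range(0, len(series), width):
        record = tuple(series[start:start + width])
        key = record if whole_record else record[:1]
        if key not in seen:
            seen.add(key)
            result.extend(record)
    return result


data = [1, 2, 1, 3, 1, 4, 1, 2]
first_column = unique_skip(data, 2, whole_record=False)
whole = unique_skip(data, 2, whole_record=True)
print(first_column)  # [1, 2]
print(whole)         # [1, 2, 1, 3, 1, 4]
```

The two results match the two console outputs quoted in the comment above.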
(0003513)
BrianH 26-Feb-2013 20:46 |
The series-record functions that do comparisons:
- difference
- exclude
- intersect
- sort
- union
- unique
- maximum-of and minimum-of (need renaming and rethinking, see #1818, #1971, #1972, #1988)
We also have finding functions that do comparisons (FIND and SELECT) but they don't really do comparisons in a record-oriented way so the combination of /skip and /compare would be different. |
(0003514)
BrianH 26-Feb-2013 21:09 |
It appears that the critical issue here is how the /skip and /compare options interact, because it has become clear from the discussion started in #1963 that /compare is going to be necessary to resolve things. Given that the /skip option is used in these cases to treat series as fixed-length records, it seems that we can learn a bit from other database platforms.
Series with /skip seem to follow the table model - we have maps and objects for the key/value model. Any experienced DBA can tell you that in the table model, you shouldn't assume that the first column or any single column is the key column. Unless you have declared it otherwise, you have to assume that the whole set of columns is a composite key, which you can only hope is unique. Rebol series don't have a way in the series themselves to declare a key or even a record width (unlike SQL), so anything beyond the default needs to be declared and implemented outside the series, with the /compare and /skip options.
So, the question ends up being what is the sensible default, and how do you specify when it's otherwise?
R2's model of assuming that the first column was the key only made sense since it had no choice - no comprehensive /compare option - so it had to pick just one behavior. Any one choice would be bad in other cases, so it was just a matter of picking the best of a bad set of choices. SORT had to go with the same choice for consistency, even though it had /compare so it had better choices available. If, on the other hand, you started from the ground up with /compare implemented throughout, you have better options.
If you go with a default of having the entire record compared (as expected by Ladislav and me), then having any particular column used as a key would be as simple as using /compare 1 or whatever is the key column number. Multi-column keys could be in a block of numbers. Special comparisons could be done as functions that are passed two references to the series at different positions. Pretty clean overall.
If you go with a default of comparing based on the first column (R2-compatible) then the case of comparing on a whole record ends up being awkward to specify. Do we go with /compare being passed a keyword? Do we require that it be passed a block with all of the column indexes?
We would need to decide for R3 (R2 just managed by declaring the developer to be out-of-luck). The other cases would be the same as with the other default. I don't think that R2-compatibility is worth making the model awkward, but that's not my call alone. If it helps to consider, we can easily wrap default-compare-all functions in no-/compare-available functions for R3/Backward if we need to run R2 code in R3 (or we can continue to run it in R2). |
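The compare-whole-record default proposed in comment 0003514 can be sketched as a small model. This is an illustrative Python sketch under stated assumptions, not an existing Rebol API: the name `unique_records` and its `compare` parameter are hypothetical, column numbers are 1-based as in the comment's "/compare 1", every record is assumed to be complete, and the comparator-function form of /compare is left out.

```python
def unique_records(series, width, compare=None):
    """Model of a hypothetical unique/skip/compare.

    compare=None      -> compare the whole record (the proposed default)
    compare=1         -> compare on a single 1-based key column
    compare=[1, 2]    -> compare on a multi-column key
    Assumes len(series) is a multiple of width.
    """
    if compare is None:
        columns = list(range(1, width + 1))
    elif isinstance(compare, int):
        columns = [compare]
    else:
        columns = list(compare)

    seen = set()
    result = []
    for start in range(0, len(series), width):
        record = series[start:start + width]
        key = tuple(record[c - 1] for c in columns)  # 1-based column numbers
        if key not in seen:
            seen.add(key)
            result.extend(record)
    return result


data = ["a", 1, "x",  "a", 2, "x",  "b", 1, "y",  "a", 1, "z"]
by_record = unique_records(data, 3)                 # all four records differ
by_first = unique_records(data, 3, compare=1)       # key is column 1
by_pair = unique_records(data, 3, compare=[1, 2])   # key is columns 1 and 2
print(by_first)  # ['a', 1, 'x', 'b', 1, 'y']
print(by_pair)   # ['a', 1, 'x', 'a', 2, 'x', 'b', 1, 'y']
```

Under this default, the R2-style first-column behavior is just the `compare=1` case, whereas the reverse default would need a keyword or a full list of column indexes to request whole-record comparison.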
(0003515)
Sunanda 26-Feb-2013 21:24 |
It's hard to know how widespread a REBOL coding practice is, but from a quick look at the 1100+ scripts on REBOL.org, it looks like the use of /skip with set operations is rare.
I could find only two scripts that used the idiom at all. Both use UNION rather than UNIQUE, DIFFERENCE, or INTERSECT. www.rebol.org/search.r?find=union/skip So re-engineering for a different model may not affect many R2 legacy applications. |
(0003518)
BrianH 26-Feb-2013 21:41 |
That's more telling than you might think, Sunanda. If it is difficult to find code that uses a feature, it's a hint that perhaps the feature wasn't useful enough. A similar search for the use of the return value of ALTER, not finding anything, led us to change the return value to something more useful in R3. Maybe that will be the case here as well. |
(0003553)
BrianH 1-Mar-2013 22:29 |
Another example of the default behavior of these series-of-fixed-record functions tripping people up: #1978. |
(0003556)
BrianH 2-Mar-2013 08:15 |
And here's another example: #726. |
(0003594)
BrianH 7-Mar-2013 18:59 |
Ladislav added another ticket emphasizing the importance of this one for MAXIMUM-OF and MINIMUM-OF in #1988. Also see #1971 and #1972. |
(0004255)
BrianH 20-Feb-2014 22:04 |
See #2110 for a direct implementation of these rules in a single-comparison control function. |
Date | User | Field | Action | Change |
---|---|---|---|---|
20-Feb-2014 22:04 | BrianH | Comment : 0004255 | Added | - |
7-Mar-2013 18:59 | BrianH | Comment : 0003594 | Added | - |
7-Mar-2013 18:57 | BrianH | Comment : 0003513 | Modified | - |
2-Mar-2013 08:15 | BrianH | Comment : 0003556 | Added | - |
1-Mar-2013 22:29 | BrianH | Comment : 0003553 | Added | - |
26-Feb-2013 21:41 | BrianH | Comment : 0003518 | Added | - |
26-Feb-2013 21:24 | sunanda | Comment : 0003515 | Added | - |
26-Feb-2013 21:09 | BrianH | Comment : 0003514 | Added | - |
26-Feb-2013 20:46 | BrianH | Comment : 0003513 | Added | - |
26-Feb-2013 20:29 | BrianH | Comment : 0003512 | Added | - |
26-Feb-2013 20:16 | BrianH | Priority | Modified | low => high |
26-Feb-2013 20:16 | BrianH | Severity | Modified | trivial => minor |
26-Feb-2013 20:16 | BrianH | Category | Modified | => Native |
23-Feb-2009 18:10 | BrianH | Severity | Modified | not a bug => trivial |
20-Jan-2009 05:44 | BrianH | Priority | Modified | none => low |
20-Jan-2009 05:44 | BrianH | Severity | Modified | trivial => not a bug |
20-Jan-2009 05:44 | BrianH | Version | Modified | => alpha 31 |
2-Dec-2008 18:50 | Admin | Ticket | Added | - |