REBOL3 tracker
Ticket #0000428

Type: Wish   Status: reviewed   Date: 5-Apr-2008 12:22
Version: alpha 31   Category: Native   Submitted by: Jerry
Platform: All   Severity: minor   Priority: high

Summary: FIND and the set functions (UNIQUE, UNION, ...) might need a /compare refinement
Description: Just like SORT does.

Assigned to: n/a   Fixed in: -   Last Update: 20-Feb-2014 22:04


Comments
(0003512)
BrianH
26-Feb-2013 20:29

Updated the priority of this ticket based on some discussion in #1963. That discussion is unrelated to the minor problem covered by #1963 itself, but it made clear that the /skip option of these functions makes it really important to have a /compare option. We should design the combined /skip/compare behavior from the ground up, on the assumption that both options will be available on all record-oriented series functions that do comparisons. Even SORT should be changed to be consistent with our decision.

The relevant conflict, translated to UNIQUE for simplicity:

; Current behavior, R2-compatible except for #1963, expected by Sunanda
>> unique/skip [1 2 1 3 1 4 1 2] 2
== [1 2] ; only compare on first column

; Behavior expected by Ladislav and me
>> unique/skip [1 2 1 3 1 4 1 2] 2
== [1 2 1 3 1 4] ; compare on whole record

The questions this prompts: How would we change our model if we had /compare? Also, what should /compare do?
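
For readers who want to experiment with the two readings outside Rebol, here is a minimal Python model of them. It is only a sketch: the function names are hypothetical, a flat list stands in for a Rebol block, and it assumes the series length is a multiple of the record width.

```python
def records(series, skip):
    """Split a flat series into fixed-width records (as tuples)."""
    return [tuple(series[i:i + skip]) for i in range(0, len(series), skip)]

def unique_first_column(series, skip):
    """R2-style reading: the first column of each record is the key."""
    seen, out = set(), []
    for rec in records(series, skip):
        if rec[0] not in seen:
            seen.add(rec[0])
            out.extend(rec)
    return out

def unique_whole_record(series, skip):
    """Alternative reading: the whole record is a composite key."""
    seen, out = set(), []
    for rec in records(series, skip):
        if rec not in seen:
            seen.add(rec)
            out.extend(rec)
    return out

data = [1, 2, 1, 3, 1, 4, 1, 2]
print(unique_first_column(data, 2))  # [1, 2]
print(unique_whole_record(data, 2))  # [1, 2, 1, 3, 1, 4]
```

The two printed results correspond to the two `unique/skip` outcomes quoted above.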
(0003513)
BrianH
26-Feb-2013 20:46

The series-record functions that do comparisons:
- difference
- exclude
- intersect
- sort
- union
- unique
- maximum-of and minimum-of (need renaming and rethinking, see #1818, #1971, #1972, #1988)

We also have finding functions that do comparisons (FIND and SELECT) but they don't really do comparisons in a record-oriented way so the combination of /skip and /compare would be different.
(0003514)
BrianH
26-Feb-2013 21:09

It appears that the critical issue here is how the /skip and /compare options interact, because it has become clear from the discussion started in #1963 that /compare is going to be necessary to resolve things. Given that the /skip option is used in these cases to treat series as fixed-length records, it seems that we can learn a bit from other database platforms.

Series with /skip seem to follow the table model - we have maps and objects for the key/value model. Any experienced DBA can tell you that in the table model, you shouldn't assume that the first column or any single column is the key column. Unless you have declared it otherwise, you have to assume that the whole set of columns is a composite key, which you can only hope is unique. Rebol series don't have a way in the series themselves to declare a key or even a record width (unlike SQL), so anything beyond the default needs to be declared and implemented outside the series, with the /compare and /skip options.

So, the question ends up being what is the sensible default, and how do you specify when it's otherwise?

R2's model of assuming that the first column was the key only made sense since it had no choice - no comprehensive /compare option - so it had to pick just one behavior. Any one choice would be bad in other cases, so it was just a matter of picking the best of a bad set of choices. SORT had to go with the same choice for consistency, even though it had /compare so it had better choices available.

If, on the other hand, you started from the ground up with /compare implemented throughout, you have better options.

If you go with a default of having the entire record compared (as expected by Ladislav and me), then having any particular column used as a key would be as simple as using /compare 1 or whatever is the key column number. Multi-column keys could be in a block of numbers. Special comparisons could be done as functions that are passed two references to the series at different positions. Pretty clean overall.
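
To make the compare-whole-record default concrete, here is a rough Python model of how such a /compare argument could be interpreted. It is a sketch under assumptions, not a proposal for the native: `unique_records` and its `compare` parameter are hypothetical names, column numbers are 1-based as in Rebol, the series length is assumed to be a multiple of the record width, and the comparison function receives two record tuples rather than two series positions.

```python
def unique_records(series, skip, compare=None):
    """Model of UNIQUE/skip/compare with whole-record comparison as default.

    compare=None        -> compare every column (the proposed default)
    compare=<int>       -> compare one 1-based key column
    compare=[ints]      -> compare a multi-column key
    compare=<callable>  -> custom test: takes two records, returns True if equal
    """
    recs = [tuple(series[i:i + skip]) for i in range(0, len(series), skip)]

    if callable(compare):
        same = compare
    else:
        if compare is None:
            cols = list(range(1, skip + 1))
        elif isinstance(compare, int):
            cols = [compare]
        else:
            cols = list(compare)

        def same(a, b):
            return all(a[c - 1] == b[c - 1] for c in cols)

    kept = []
    for rec in recs:
        # Keep a record only if it matches none of the records kept so far.
        if not any(same(rec, k) for k in kept):
            kept.append(rec)
    return [value for rec in kept for value in rec]

data = [1, 2, 1, 3, 1, 4, 1, 2]
print(unique_records(data, 2))                  # [1, 2, 1, 3, 1, 4]
print(unique_records(data, 2, compare=1))       # [1, 2]
print(unique_records(data, 2, compare=[1, 2]))  # [1, 2, 1, 3, 1, 4]
# Custom rule: records are "equal" when their second columns have the same parity.
print(unique_records(data, 2, compare=lambda a, b: a[1] % 2 == b[1] % 2))  # [1, 2, 1, 3]
```

The pairwise scan is quadratic. That keeps the model short and lets an arbitrary comparison function work, but says nothing about how a native would implement it.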

If you go with a default of comparing based on the first column (R2-compatible) then the case of comparing on a whole record ends up being awkward to specify. Do we go with /compare being passed a keyword? Do we require that it be passed a block with all of the column indexes? We would need to decide for R3 (R2 just managed by declaring the developer to be out of luck). The other cases would be the same as with the other default.

I don't think that R2-compatibility is worth making the model awkward, but that's not my call alone. If it helps to consider, we can easily wrap default-compare-all functions in no-/compare-available functions for R3/Backward if we need to run R2 code in R3 (or we can continue to run it in R2).
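
The wrapping idea can be sketched in Python as follows. Both function names are hypothetical, and the model assumes that R2 compatibility amounts to fixing the key at column 1 and exposing no compare option.

```python
def unique_compare_all(series, skip, compare=None):
    """Hypothetical new-style UNIQUE: whole record by default,
    or a single 1-based key column when compare is given."""
    recs = [tuple(series[i:i + skip]) for i in range(0, len(series), skip)]
    seen, out = set(), []
    for rec in recs:
        key = rec if compare is None else rec[compare - 1]
        if key not in seen:
            seen.add(key)
            out.extend(rec)
    return out

def unique_r2(series, skip=1):
    """R2-style wrapper: no compare option, first column is always the key."""
    return unique_compare_all(series, skip, compare=1)

data = [1, 2, 1, 3, 1, 4, 1, 2]
print(unique_compare_all(data, 2))  # [1, 2, 1, 3, 1, 4]
print(unique_r2(data, 2))           # [1, 2]
```

The wrapper is a single line because the first-column behavior is just one special case of the more general function.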
(0003515)
Sunanda
26-Feb-2013 21:24

It's hard to know how widespread a REBOL coding practice is, but from a quick look at the 1100+ scripts on REBOL.org, it looks like the use of /skip with set operations is rare.

I could find only two scripts that used the idiom at all. Both use UNION rather than UNIQUE, DIFFERENCE, or INTERSECT.

www.rebol.org/search.r?find=union/skip

So re-engineering for a different model may not affect many R2 legacy applications.
(0003518)
BrianH
26-Feb-2013 21:41

That's more telling than you might think, Sunanda. If it is difficult to find code that uses a feature, it's a hint that perhaps the feature wasn't useful enough. A similar search for the use of the return value of ALTER, not finding anything, led us to change the return value to something more useful in R3. Maybe that will be the case here as well.
(0003553)
BrianH
1-Mar-2013 22:29

Another example of the default behavior of these series-of-fixed-record functions tripping people up: #1978.
(0003556)
BrianH
2-Mar-2013 08:15

And here's another example: #726.
(0003594)
BrianH
7-Mar-2013 18:59

Ladislav added another ticket emphasizing the importance of this one for MAXIMUM-OF and MINIMUM-OF in #1988. Also see #1971 and #1972.
(0004255)
BrianH
20-Feb-2014 22:04

See #2110 for a direct implementation of these rules in a single-comparison control function.

Date User Field Action Change
20-Feb-2014 22:04 BrianH Comment : 0004255 Added -
7-Mar-2013 18:59 BrianH Comment : 0003594 Added -
7-Mar-2013 18:57 BrianH Comment : 0003513 Modified -
2-Mar-2013 08:15 BrianH Comment : 0003556 Added -
1-Mar-2013 22:29 BrianH Comment : 0003553 Added -
26-Feb-2013 21:41 BrianH Comment : 0003518 Added -
26-Feb-2013 21:24 sunanda Comment : 0003515 Added -
26-Feb-2013 21:09 BrianH Comment : 0003514 Added -
26-Feb-2013 20:46 BrianH Comment : 0003513 Added -
26-Feb-2013 20:29 BrianH Comment : 0003512 Added -
26-Feb-2013 20:16 BrianH Priority Modified low => high
26-Feb-2013 20:16 BrianH Severity Modified trivial => minor
26-Feb-2013 20:16 BrianH Category Modified => Native
23-Feb-2009 18:10 BrianH Severity Modified not a bug => trivial
20-Jan-2009 05:44 BrianH Priority Modified none => low
20-Jan-2009 05:44 BrianH Severity Modified trivial => not a bug
20-Jan-2009 05:44 BrianH Version Modified => alpha 31
2-Dec-2008 18:50 Admin Ticket Added -