REBOL3 tracker
  0.9.12 beta
Ticket #0002142

Short URL: http://issue.cc/r3/2142
Type: Wish   Status: reviewed   Date: 6-Apr-2014 03:03
Version: r3 master   Category: Native   Submitted by: fork
Platform: All   Severity: minor   Priority: normal

Summary: Add COMBINE primitive as alternative to REJOIN (as opposed to changing REJOIN)
Description: Frustrations with the "ugliness" of REJOIN's name and behavior led me to file CC2079 and CC2100 (attempts to claim other names or to handle NONE differently). Others defended JOIN as a useful binary operator and pointed out the disruption to legacy code (a case I now see), so I've dismissed both tickets in favor of pursuing this line of thinking with an entirely new primitive:

>> combine [{abc} if false {def} {ghi}]
== {abcghi} ;-- contrast this with REJOIN's {abcnoneghi}

>> combine [[a b c] if/only false [d e f] [g h i]]
== [a b c g h i] ;-- contrast with REJOIN's [a b c none [g h i]]

The general rule would be that COMBINE throws out nones. Merging unequal types would otherwise likely follow REJOIN's logic. The result type would match the first non-none value. If all the elements were none (or there were no elements), the result would be none.

>> combine []
== none

>> a: b: c: none
>> combine [a b c]
== none

There would be a /WITH refinement that places a delimiter between each pair of combined elements, as if it had been written there in the block. The same none-skipping logic would apply:

>> combine/with [{abc} none {def} {ghi}] {,}
== {abc,def,ghi}

>> combine/with [[a b c] none [d e f] [g h i]] [foo bar]
== [a b c foo bar d e f foo bar g h i]
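
The rules so far (drop nones, take the result type from the first non-none value, return none when nothing is left, and interleave a /WITH delimiter) can be sketched as a small model. This is written in Python purely so it can be run anywhere; it is not the proposed Rebol implementation. The name `combine` and its `delimiter` parameter are stand-ins, Python strings model Rebol strings, and lists model blocks.

```python
def combine(values, delimiter=None):
    """Model of the proposed COMBINE: drop None, join the rest.

    Hypothetical Python stand-in. The result type follows the first
    non-None value; an empty or all-None input gives None.
    """
    kept = [v for v in values if v is not None]
    if not kept:
        return None
    result = kept[0][:0]  # empty series of the same type as the first value
    for i, value in enumerate(kept):
        if i > 0 and delimiter is not None:
            result = result + delimiter  # models /WITH
        result = result + value
    return result


print(combine(["abc", None, "ghi"]))              # abcghi
print(combine([None, None, None]))                # None
print(combine(["abc", None, "def", "ghi"], ","))  # abc,def,ghi
print(combine([["a", "b", "c"], None, ["d", "e", "f"]], ["foo", "bar"]))
# ['a', 'b', 'c', 'foo', 'bar', 'd', 'e', 'f']
```

Note that the none is removed before the delimiter is placed, so a skipped element never leaves a doubled delimiter behind.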

When producing a string series, nested blocks would be combined recursively until everything is flattened (as opposed to FORM-ing them):

>> stuff: [{baz} {bar}]
>> combine [{foo} stuff]
== {foobazbar} ;-- as opposed to {foobaz bar} or {foo[{baz} {bar}]} or whatever
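
As a rough illustration of that flattening rule for string output, here is a Python model (again not Rebol; `combine_deep` is a made-up name and lists stand in for blocks). It assumes the string-producing case only, so it returns an empty string rather than none for empty input.

```python
def combine_deep(values):
    """Flatten nested lists into one string, skipping None.

    Hypothetical model: nested lists are combined recursively instead
    of being FORM-ed with brackets or spaces.
    """
    parts = []
    for value in values:
        if value is None:
            continue
        if isinstance(value, list):
            parts.append(combine_deep(value))  # recurse instead of forming
        else:
            parts.append(str(value))
    return "".join(parts)


stuff = ["baz", "bar"]
print(combine_deep(["foo", stuff]))              # foobazbar
print(combine_deep(["foo", [None, ["x"]], "y"]))  # fooxy
```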

Flattening blocks that way may not be what is desired, so it would be better either to allow suppressing it with a /SHALLOW refinement, or to make it off by default for block series and add a /DEEP refinement.

/ONLY might control whether things like paths would be collapsed in with blocks, if that were the default:

>> combine [a/b/c [d e f] (g h i)]
== a/b/c/d/e/f/g/h/i

>> combine/only [a/b/c [d e f] (g h i)]
== a/b/c/[d e f]/(g h i)

The idea of merging the series by default is in that area of things that are kind of ugly, but consistent.

An /ALL refinement or similar could request that nones not be excluded, which would help people who want COMBINE's behavior but don't want to switch back to REJOIN when they need a none.

>> combine/all [[a b c] none [d e f]]
== [a b c none d e f]

---
Credit for the idea of making a new primitive instead of struggling with REJOIN goes to @earl. The name COMBINE comes from @Gregg. FLATTEN is another possibility, which makes sense if the /DEEP behavior is the default for blocks. (Other suggestions of CONCAT, RECAT, or CATENATE don't seem as nice to me.) /WITH was suggested by @rgchris.

Assigned to: n/a   Fixed in: -   Last Update: 28-Jun-2014 02:11


Comments
(0004415)
maxim
8-May-2014 20:38

I just saw this ticket... there are MANY good ideas here. I also really like the name.

you might want to build a prototype of this function for people to try out... I definitely will give it a trial if you build it.

We all have a few custom functions which look like this, but this proposal would probably be used very often by me, instead of the current REJOIN.

Ignoring none with strings (I am a proponent of none transparency in general), /WITH, and /DEEP (on blocks) are all very useful features I'd use often.
(0004418)
fork
10-May-2014 10:52

@maxim glad you like it, I needed it for Rebmu in puzzle solving so that is where I put it for the moment, in "incubator.reb"

https://github.com/hostilefork/rebmu/blob/14dc62b1e1b72c5362e4b33155e77a2cfefeb447/incubator.reb#L24

It's unfortunate that we don't have an incubator somewhere in the repository proper. :-/

@rebolek offered the beginnings of the implementation, and built it on FLATTEN. So that raises another question: is that a separate and useful operation? The real work here--as people like to keep saying--is to sort out the precise semantics.
(0004423)
Gregg
12-May-2014 19:29

Definitely worth prototyping.
(0004425)
fork
13-May-2014 06:47

Working through some real-world scenarios with COMBINE, I am finding it very difficult to imagine a design and refinements which comfortably meet the needs of both string making and block making.

My own applications want substitutions that nest. It's a convenience that is a very nice property of the parse dialect, when you want to compose rules:

; a-rule: [some "a"]
; b-rule: [some "b"]
; combined-rule: [2 a-rule 3 b-rule]
; parse "aabbb" combined-rule

Outside of the NONE issue, a similar evaluation in COMBINE is one of the big things I find myself needing which REJOIN does not conveniently address. Contrived example:

>> inner: ["b" "c"] outer: ["a" inner "d"] combine [outer outer]
== "abcdabcd"

Using sensitivity to the first non-none data type may be something best left to REJOIN's turf, and the distinct needs of constructing non-strings might be better addressed by something like Ladislav's BUILD dialect:

http://www.fm.vslib.cz/~ladislav/rebol/build.r

If that is the case, then it lightens the load on COMBINE. It can know it's always producing a string, and avoid scenarios like:

; >> rejoin [
; <span> "one" </span>
; if/only 1 < 2 [<span> "1 < 2" </span>]
; if/only 1 > 2 [<span> "1 > 2" </span>]
; ]
; == <spanone</span><span> 1 < 2 </span>none>

Rather than using type-detection to make the return type, perhaps an /INTO refinement could allow insertion of the generated data directly into a string type of your choice.

; >> t: make tag! 10
; >> combine/into [{a href=} url] t

This would narrow the definition of COMBINE to being a dialect that "combines Rebol values to form a string". The recursive nature for variable substitution and special handling of blocks crosses the line into making it a dialect. Certainly it makes a lot of sense to me not to FORM component blocks to produce brackets into the output unless that is explicitly requested. Especially if FORMing has different rules for spaces than the rest of the material being combined.

Currently my feeling is that any evaluation that bottoms out with a word value should be treated as an error condition. For instance:

; >> combine ["foo" 'baz "bar"]
;
; >> combine ["foo" quote baz: 10 "bar"]

It would be technically possible to continue the evaluation, but it seems safer to require that be done by evaluation returning a block. It's not a direct parallel to PARSE, but note that PARSE only evaluates one level deep:

; >> foo: "a"
; == "a"
;
; >> bar: quote 'foo
; == 'foo
;
; >> parse ["a" "a" "a"] [some bar]
; == false
;
; >> parse [foo foo foo] [some bar]
; == true

Hence I'd suggest COMBINE taking a similar approach, descending into block substitutions but throwing errors on attempts to re-evaluate a word.
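
A minimal Python model of that rule, for illustration only: words are looked up exactly once, blocks that result are descended into, and a lookup that itself yields a word is an error. The `Word` class, the `env` dictionary, and the choice of `ValueError` are all assumptions of this sketch, not part of the proposal.

```python
class Word:
    """Hypothetical stand-in for a Rebol word! value."""

    def __init__(self, name):
        self.name = name


def combine(block, env):
    """Combine to a string, evaluating each word only one level deep."""
    out = []
    for item in block:
        # One level of evaluation: a word is looked up exactly once.
        value = env[item.name] if isinstance(item, Word) else item
        if value is None:
            continue
        if isinstance(value, Word):
            # Evaluation bottomed out at a word: treat it as an error.
            raise ValueError("cannot combine word: " + value.name)
        if isinstance(value, list):
            out.append(combine(value, env))  # descend into block substitutions
        else:
            out.append(str(value))
    return "".join(out)


env = {
    "inner": ["b", "c"],
    "outer": ["a", Word("inner"), "d"],
    "bar": Word("foo"),
    "foo": "a",
}
print(combine([Word("outer"), Word("outer")], env))  # abcdabcd
```

With this environment, `combine([Word("bar")], env)` raises instead of continuing on to the value of `foo`.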

This definition complicates /WITH: should the delimiter also be interleaved at the inner levels?

; >> combine/with [[{foo} {baz}] [{bar} {mumble}]] {,}

In the situations I'm seeing I want the control to "zone off" the /WITH using block structures and then if I really want it on an inner level, I'll ask for it explicitly. (Since COMBINE [x y] is equivalent to COMBINE [COMBINE x COMBINE y]). If COMBINE is headed down the dialect path, there may be a word to set aside for the purpose... but there are many ways to approach the problem. (COMPOSE before you COMBINE, etc).

Complexities of trying it out aside: it is certainly something that is coming in very useful and helping clean up a lot. I just think the needs are too different from block building; and the desire to put NONE into blocks seems to have been only the first of many contentions. So if no one disagrees, I'm removing block building from the proposal.
(0004426)
rebolek
13-May-2014 08:21

@fork: Have a look at my [latest implementation](https://github.com/rebolek/dvorek/blob/master/combine.reb) of COMBINE to see if it fits your needs. It has /only refinement added and the implementation is simpler.
(0004427)
fork
13-May-2014 14:58

@rebolek: Regarding updated thinking, see blog post at http://blog.hostilefork.com/combine-alternative-rebol-red-rejoin/

I've tried to sort of put some order and better formatting on the issues. My implementation is probably bad so people shouldn't look at it. But using your implementation I get a result that is very different from what I'm looking for:

Script: "Combine" Version: 0.0.3 Date: 13-May-2014
>> a: function [] [return [b c]]
>> b: "foo"
== "foo"
>> c: ["baz" "bar"]
== ["baz" "bar"]
>> combine [c [a if 1 > 2 ["uh-oh"] a]]
== "bazbaraif1>2uh-oha"

In my thinking, this should return "bazbarfoobazbarfoobazbar"
(0004449)
fork
17-Jun-2014 09:33

See this CodeReview.SE post for a request for comment on the draft implementation:

http://codereview.stackexchange.com/questions/54466/
(0004465)
BrianH
22-Jun-2014 05:56

Idea looks good, name looks good. See #724 for an example of an /as type option, though in this case it would be specifying the container type for the output, not the element type.

I'm not sure you want to limit this to string types. There was a value in having REWORD only be for strings and binaries, but the operating model of COMBINE seems like it would work for blocks and parens as well. The none-skipping in particular seems like it would be useful for building path types, for instance, and not just strings need delimiters (think PARSE rules with alternation).

That WHILE DO/next idiom seems like it might be worth supporting directly as a loop function. I'll give it some thought.
(0004466)
BrianH
22-Jun-2014 07:09

I apparently don't have a CodeReview account, so I'll put my criticism here.

First of all, you didn't quite get /into right. You should have used INSERT instead of APPEND - that's the whole point. Also, rgchris was wrong about allowing none for the out parameter. The output parameter is meant to be modified. We never do none propagation for parameters that are supposed to be modified, that's too much of a clear error, we definitely want those flagged. But otherwise your /into is OK.

If you decide to do an /as type option, you should trigger an incompatible refinements error if it's combined with /into. Even though you could make /as ignored in that case, the error is more valuable since both options change the behavior, so arbitrarily choosing either behavior would surprise people expecting the other. You might even consider requiring one or the other, if you think string! isn't a sensible default for the /as type.

Having /with block just concatenate the contents of the block into a single delimiter is missing an opportunity. If someone needs to combine the block into a single delimiter, it is easy for them to do it themselves. It's better to reserve the block for nested delimiters, such as the record and field delimiters of CSV files.
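
To make the nested-delimiter idea concrete, here is a hedged Python model in which a list of delimiters is indexed by nesting depth. The function name `combine_nested` and the decision to use an empty delimiter below the listed depths are assumptions of this sketch.

```python
def combine_nested(block, delimiters, depth=0):
    """Join nested lists, using delimiters[depth] at each nesting level.

    Hypothetical model: levels deeper than the delimiter list get no
    delimiter, and None values are skipped as elsewhere in COMBINE.
    """
    delimiter = delimiters[depth] if depth < len(delimiters) else ""
    parts = []
    for value in block:
        if value is None:
            continue
        if isinstance(value, list):
            parts.append(combine_nested(value, delimiters, depth + 1))
        else:
            parts.append(str(value))
    return delimiter.join(parts)


rows = [["id", "name"], [1, "fork"], [2, "maxim"]]
print(combine_nested(rows, ["\n", ","]))
# id,name
# 1,fork
# 2,maxim
```

One caveat this model exposes: because nones are skipped, a missing field silently shifts the remaining columns left, so real CSV output would need more care than a pair of delimiters.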

You should consider whether you're going to do a full reduce before value screening, or evaluate one at a time and screen them as you go. It's a matter of whether the side effects of the evaluations should all finish before the error is triggered, or whether only the side effects from before the erroneous value are seen. Note that the all-evaluations-first model is easier to do quickly in mezzanine (as it was in rebolek's code), but has a lot of overhead that is really awkward to replicate in native code (creating the intermediate blocks, for instance).
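
The difference between the two models shows up in which side effects have happened by the time the error fires. A small Python sketch of that (not Rebol; thunks stand in for expressions, and the screening rule that only strings and None are accepted is an assumption made for the example):

```python
log = []


def step(name, value):
    """Return a thunk that records a side effect, then yields value."""
    def thunk():
        log.append(name)
        return value
    return thunk


def screen(value):
    # Assumed rule for this sketch: only strings and None are accepted.
    if value is not None and not isinstance(value, str):
        raise TypeError("cannot combine " + repr(value))
    return value


def combine_reduce_first(thunks):
    values = [t() for t in thunks]  # run every evaluation first
    return "".join(v for v in map(screen, values) if v is not None)


def combine_incremental(thunks):
    out = []
    for t in thunks:
        v = screen(t())  # screen right after each evaluation
        if v is not None:
            out.append(v)
    return "".join(out)


def side_effects(strategy):
    """Run a strategy on a block whose second value is invalid."""
    log.clear()
    try:
        strategy([step("a", "x"), step("b", 42), step("c", "y")])
    except TypeError:
        pass
    return list(log)


print(side_effects(combine_reduce_first))  # ['a', 'b', 'c']
print(side_effects(combine_incremental))   # ['a', 'b']
```

The reduce-first version also builds the intermediate `values` list, which corresponds to the intermediate block overhead mentioned above.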

If this is done dialected, which I don't yet recommend btw (still need to think about that), then you might consider having some kind of position variable for use with nested calls to functions that follow the /into model. This would save you the trouble of detecting those functions at runtime and providing the /into option automatically, and can be used with other functions that use INSERT or other incremental builders that take an output target using a different calling convention. This might be a pie-in-the-sky feature though.

COMBINE is definitely too general a function name to limit to any-string output. You definitely need to think through a behavioral model for outputting any-block types.

There's some code in there that wouldn't be that efficient, even natively. Needs an optimization pass.

I'm not sold on a dialected treatment here. Value type disposition seems more useful for this. But I'm still thinking it through.
(0004474)
fork
24-Jun-2014 01:35

> If someone needs to combine the block into a single delimiter, it is easy for them to do it themselves. It's better to reserve the block for nested delimiters

It's an interesting concept...although I feel like the CSV usage is probably the only time it would come up. There are a lot of issues that are CSV-specific which are better addressed with a CSV library. As another strike to that, I'd say that CSV is such a poor format compared to "Ren" that one doesn't really want to build in or tailor anything in the system to it.

Purpose here is I'm trying to really raise the readability bar, such as with:

print combine/with ["a" "b" "c"] ["," space]

...vs:

print combine/with ["a" "b" "c"] ", "

And I didn't much care for the repetition of:

print combine/with ["a" "b" "c"] combine ["," space]

One thing that does need to be considered is the implication of combining the delimiter block once up front versus re-running and reducing it at each step through the COMBINE.

> You should consider whether you're going to do a full reduce before value screening, or evaluate one at a time and screen them as you go.

I feel like the one-at-a-time and going deep into earlier blocks before doing the later blocks in the topmost layer makes the most sense.

> COMBINE is definitely too general a function name to limit to any-string output.
> You definitely need to think through a behavioral model for outputting any-block types.

I got to feeling like Ladislav's BUILD (for instance) is a better direction for giving a good treatment if the result is to be a block; there seem to be too many nuances as to what you want to do with the blocks-inside-blocks. And before an /AS option there wasn't a way to signify a block was wanted, since I wanted to ditch rejoin's dependence on the first element type.

Guess the thing it could sensibly do is take the error cases for string and just let those things be in the block if they get returned. Then recursive evaluations are only done on blocks:

>> foo: "a"
>> bar: [foo 1020 'foo]
>> combine/as [bar if 1 < 2 [quote 'foo] quote bar] block!
== ["a" 1020 foo 'foo bar]

The existence of a block case suggests /as and /into aren't intrinsically incompatible; they conflict only if the /into target is a string or binary series and the /as type isn't string or binary. The question of what to do with PAREN! and the PATH! types would also come up.

Not sure if it's a solution looking for a problem...though never having had such a thing available, I don't know. Like the string case it's a mash-up between reducing and flattening. A key test would be if what you're trying to construct doesn't have any nested blocks...if not, it might be an interesting tool, considering that I've found the string version to be quite useful.

I'll look over some code I have and see if I can spot interesting use cases for a block COMBINE.
(0004476)
fork
24-Jun-2014 21:23

Discussion in chat yielded some interesting ideas. I think the hierarchical case could be supported in a way that provides arbitrary flexibility by allowing a function parameter that takes the depth:

>> combine/with [{a} [{b} [{c} [{d} {e}] {f}] {g}] {h}] function [depth] [pick {,:?} depth]
== {a,b:c?de?f:g,h}

Since a none is COMBINE'd by disappearing, anything deeper than 3 levels won't get delimiters. If you wanted everything at depth 3 and deeper to have a question mark:

>> combine/with [{a} [{b} [{c} [{d} {e}] {f}] {g}] {h}] function [depth] [either (c: pick {,:} depth) [c] {?}]
== {a,b:c?d?e?f:g,h}

Whatever that function returns would obey the rules for combine. So if you give back a block, it will be COMBINE'd.

>> x: 0
>> combine/with [{a} {b} {c}] func [depth] [x: x + 1 return [x "," space]]
== {a1, b2, c}

I'll maintain this is probably going to be an uncommon case; and certainly people who would need it probably have a variety of ideas about the behavior--a function to do whatever you want seems like a good fit.

The default I'm advocating for passing a block argument is very specific, however.

combine/with [...] blk <=> combine/with [...] function [depth] [if/only (1 = depth) blk]

The majority of COMBINE usages--at least at first--are going to come from cases that one would have used REJOIN for previously. The behavior of blocks embedded in REJOIN was not of much use. So these cases will not be starting out in general with nested blocks. By not assuming you want the delimiters applied anywhere besides the first level by default, it means you can throw in brackets to suppress the delimiter. That ability on the very common "one-level" case to throw in a second level for the sole purpose of grouping is quite useful.
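
The depth-function idea and the first-level-only default can be modeled in Python as follows. This is an illustrative sketch, not the draft implementation: `combine_with`, `pick`, and `first_level_only` are hypothetical names, and for simplicity the delimiter function is called once per nesting level rather than once per gap, so it does not model the counter example above.

```python
def pick(series, index):
    """Rebol-style 1-based PICK: None when the index is out of range."""
    return series[index - 1] if 1 <= index <= len(series) else None


def combine_with(block, delimiter_for, depth=1):
    """Join nested lists, asking delimiter_for(depth) for each level."""
    parts = []
    for value in block:
        if value is None:
            continue
        if isinstance(value, list):
            parts.append(combine_with(value, delimiter_for, depth + 1))
        else:
            parts.append(str(value))
    delimiter = delimiter_for(depth)
    # A None delimiter "disappears", just like a None element.
    return ("" if delimiter is None else delimiter).join(parts)


def first_level_only(delimiter):
    """Model of the proposed default for a plain delimiter argument."""
    return lambda depth: delimiter if depth == 1 else None


nested = ["a", ["b", ["c", ["d", "e"], "f"], "g"], "h"]
print(combine_with(nested, lambda depth: pick(",:?", depth)))
# a,b:c?de?f:g,h
print(combine_with(nested, lambda depth: pick(",:", depth) or "?"))
# a,b:c?d?e?f:g,h
print(combine_with(["a", ["b", "c"], "d"], first_level_only(", ")))
# a, bc, d
```

The last call shows the grouping use: wrapping `"b"` and `"c"` in an extra level suppresses the delimiter between them.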
(0004477)
rgchris
25-Jun-2014 00:43

@BrianH--it's fairly straightforward creating a Code Review account if you're already on StackOverflow. I'm not wrong, just perhaps at odds with a particular guideline for submissions to the Rebol codebase that I'm unfamiliar with anyhow. I can appreciate why you might not want to take that approach, but that doesn't mean there aren't reasons for it.
(0004481)
BrianH
28-Jun-2014 02:11

No worries about the Code Review thing, it was just a matter of not wanting to lose my train of thought.

Fork, that function suggestion sounds like a good approach, with the right arguments required. You should look at the behavior of REPLACE with a function replacement value. In particular, it should have an argument of the series at the position in question, as well as the depth. This will make more complex combinations possible. ARRAY and REWORD also do special things with function arguments, so we might look to them for inspiration as well.

Date User Field Action Change
28-Jun-2014 02:11 BrianH Comment : 0004481 Added -
25-Jun-2014 00:43 rgchris Comment : 0004477 Added -
24-Jun-2014 21:23 Fork Comment : 0004476 Added -
24-Jun-2014 01:35 Fork Comment : 0004474 Added -
22-Jun-2014 07:14 BrianH Comment : 0004465 Modified -
22-Jun-2014 07:13 BrianH Comment : 0004466 Modified -
22-Jun-2014 07:09 BrianH Comment : 0004466 Added -
22-Jun-2014 06:14 BrianH Comment : 0004465 Modified -
22-Jun-2014 05:56 BrianH Status Modified submitted => reviewed
22-Jun-2014 05:56 BrianH Comment : 0004465 Added -
17-Jun-2014 09:33 Fork Comment : 0004449 Added -
14-May-2014 15:48 fork Comment : 0004427 Modified -
13-May-2014 14:58 fork Comment : 0004427 Added -
13-May-2014 08:29 Fork Comment : 0004425 Modified -
13-May-2014 08:21 rebolek Comment : 0004426 Added -
13-May-2014 06:47 Fork Comment : 0004425 Added -
12-May-2014 19:29 Gregg Comment : 0004423 Added -
10-May-2014 10:52 Fork Comment : 0004418 Added -
8-May-2014 20:38 maxim Comment : 0004415 Added -
6-Apr-2014 03:10 Fork Description Modified -
6-Apr-2014 03:03 Fork Ticket Added -