REBOL3 tracker
Ticket #0001886

Short URL: http://issue.cc/r3/1886
Type: Bug  Status: built  Date: 8-Jul-2011 23:17
Version: alpha 111  Category: Mezzanine  Submitted by: GHigley
Platform: All  Severity: minor  Priority: low

Summary: SPLIT adds an empty string to the end of the returned block
Description: SPLIT returns an empty string at the end of the returned block when using a delimiter, whether or not the input ends with that delimiter. Although this may be by design, I'm skeptical. (I have tested this only on OS X.)
Example code:
foo: "a.b.c"
split foo "."
; returns ["a" "b" "c" ""]

foo: "a.b.c."
split foo "."
; returns ["a" "b" "c" ""], same as above

Assigned to: n/a  Fixed in: r3 master  Last Update: 19-Feb-2014 21:27


Comments
(0003190)
Gregg
19-Jul-2011 20:44

The following implementation is a bit ugly, with the special handling added, but it does address all the SPLIT tickets I've found here.
(0003191)
Gregg
19-Jul-2011 20:44

split: func [
    "Split a series into pieces; fixed or variable size, fixed number, or at delimiters"
    series    [series!] "The series to split"
    dlm        [block! integer! char! bitset! any-string!] "Split size, delimiter(s), or rule(s)." 
    /into    "If dlm is an integer, split into n pieces, rather than pieces of length n."
    /local size piece-size count mk1 mk2 res fill-val add-fill-val
][
    either all [block? dlm  parse dlm [some integer!]] [
        map-each len dlm [
            either positive? len [
                copy/part series series: skip series len
            ] [
                series: skip series negate len
                ; return unset so that nothing is added to output
                ()
            ]
        ]
    ][
        size: dlm   ; alias for readability
        res: collect [
            parse/all series case [
                all [integer? size  into] [
                    if size < 1 [cause-error 'Script 'invalid-arg size]
                    count: size - 1
                    piece-size: to integer! round/down divide length? series size
                    if zero? piece-size [piece-size: 1]
                    [
                        count [copy series piece-size skip (keep/only series)]
                        copy series to end (keep/only series)
                    ]
                ]
                integer? dlm [
                    if size < 1 [cause-error 'Script 'invalid-arg size]
                    [any [copy series 1 size skip (keep/only series)]]
                ]
                'else [ ; = any [bitset? dlm  any-string? dlm  char? dlm]
                    [any [mk1: some [mk2: dlm break | skip] (keep/only copy/part mk1 mk2)]]
                ]
            ]
        ]
        ;-- Special processing, to handle cases where the caller spec'd more
        ;   items in /into than the series contains (so we want to append empty
        ;   items), or where the dlm was a char/string/charset and it was the
        ;   last char (so we want to append an empty field that the above rule
        ;   misses).
        fill-val: does [copy either any-block? series [[]] [""]]
        add-fill-val: does [append/only res fill-val]
        case [
            all [integer? size  into] [
                ; If the result is too short, i.e., fewer items than 'size, add
                ; empty items to fill it to 'size.
                ; We loop here, because insert/dup doesn't copy the value inserted.
                if size > length? res [
                    loop (size - length? res) [add-fill-val]
                ]
            ]
            ; integer? dlm [
            ; ]
            'else [ ; = any [bitset? dlm  any-string? dlm  char? dlm]
                ; If the last thing in the series is a delimiter, there is an
                ; implied empty field after it, which we add here.
                case [
                    bitset? dlm [
                        ; ATTEMPT is here because LAST will return NONE for an
                        ; empty series, and finding none in a bitset is not allowed.
                        if attempt [find dlm last series] [add-fill-val]
                    ]
                    char? dlm [
                        if dlm = last series [add-fill-val]
                    ]
                    string? dlm [
                        if all [
                            find series dlm
                            empty? find/last/tail series dlm
                        ] [add-fill-val]
                    ]
                ]
            ]
        ]
                
        res
    ]
]
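
The string-delimiter branch of the special processing above can be looked at in isolation. The following is only a minimal sketch, not part of the proposed function; the name `ends-with-dlm?` is hypothetical and introduced here for illustration.

```rebol
; Hypothetical helper (not in the proposed SPLIT): true when SERIES ends
; with the string delimiter DLM, i.e. when an empty trailing field is implied.
ends-with-dlm?: func [
    series [string!]
    dlm    [string!]
][
    to logic! all [
        find series dlm                      ; the delimiter occurs at all
        empty? find/last/tail series dlm     ; nothing follows its last occurrence
    ]
]

probe ends-with-dlm? "1,2,3," ","    ; true
probe ends-with-dlm? "1,2,3" ","     ; false
```

The leading FIND mirrors the guard in the proposed code: it keeps the FIND/LAST/TAIL result from being tested when the delimiter does not occur in the series at all.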
(0003192)
Gregg
19-Jul-2011 20:46

A quick test func:

test: func [block expected-result /local res] [
    if error? try [
        print [mold/only :block newline tab mold res: do block]
        if res <> expected-result [print [tab 'FAILED! tab 'expected mold expected-result]]
    ][
        print [mold/only :block newline tab "ERROR!"]
    ]
]
(0003193)
Gregg
19-Jul-2011 20:47

And a few tests:

test [split "1234567812345678" 4]  ["1234" "5678" "1234" "5678"]

test [split "1234567812345678" 3]  ["123" "456" "781" "234" "567" "8"]
test [split "1234567812345678" 5]  ["12345" "67812" "34567" "8"]

test [split/into [1 2 3 4 5 6] 2]       [[1 2 3] [4 5 6]]
test [split/into "1234567812345678" 2]  ["12345678" "12345678"]
test [split/into "1234567812345678" 3]  ["12345" "67812" "345678"]
test [split/into "1234567812345678" 5]  ["123" "456" "781" "234" "5678"]

; Dlm longer than series
test [split/into "123" 6]       ["1" "2" "3" "" "" ""] ;or ["1" "2" "3"]
test [split/into [1 2 3] 6]     [[1] [2] [3] [] [] []] ;or [1 2 3]

test [split [1 2 3 4 5 6] [2 1 3]]                  [[1 2] [3] [4 5 6]]
test [split "1234567812345678" [4 4 2 2 1 1 1 1]]   ["1234" "5678" "12" "34" "5" "6" "7" "8"]
test [split first [(1 2 3 4 5 6 7 8 9)] 3]          [(1 2 3) (4 5 6) (7 8 9)]
test [split #{0102030405060708090A} [4 3 1 2]]      [#{01020304} #{050607} #{08} #{090A}]

test [split [1 2 3 4 5 6] [2 1]]                [[1 2] [3]]

test [split [1 2 3 4 5 6] [2 1 3 5]]            [[1 2] [3] [4 5 6] []]

test [split [1 2 3 4 5 6] [2 1 6]]              [[1 2] [3] [4 5 6]]

; Old design for negative skip vals
;test [split [1 2 3 4 5 6] [3 2 2 -2 2 -4 3]]    [[1 2 3] [4 5] [6] [5 6] [3 4 5]]
; New design for negative skip vals
test [split [1 2 3 4 5 6] [2 -2 2]]             [[1 2] [5 6]]

test [split "abc,de,fghi,jk" #","]              ["abc" "de" "fghi" "jk"]
test [split "abc
de
fghi
jk"
] ["abc" "de" "fghi" "jk"]
test [split "a.b.c" "."] ["a" "b" "c"]
test [split "c c" " "] ["c" "c"]
test [split "1,2,3" " "] ["1,2,3"]
test [split "1,2,3" ","] ["1" "2" "3"]
test [split "1,2,3," ","] ["1" "2" "3" ""]
test [split "1,2,3," charset ",."] ["1" "2" "3" ""]
test [split "1.2,3." charset ",."] ["1" "2" "3" ""]
test [split "-a-a" ["a"]] ["-" "-"]
test [split "-a-a'" ["a"]] ["-" "-" "'"]
test [split "abc|de/fghi:jk" charset "|/:"] ["abc" "de" "fghi" "jk"]
test [split "abc^M^Jde^Mfghi^Jjk" [crlf | #"^M" | newline]] ["abc" "de" "fghi" "jk"]
test [split "abc de fghi jk" [some #" "]] ["abc" "de" "fghi" "jk"]
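
The /into results in the tests above follow from the piece-size arithmetic in the proposed function: the first n - 1 pieces get the rounded-down quotient, and the last piece takes whatever remains. A worked sketch for the 16-character, 3-piece case; the names `len`, `n`, and `last-size` are introduced here for illustration only.

```rebol
len: 16                                             ; length? "1234567812345678"
n: 3                                                ; number of pieces requested
piece-size: to integer! round/down divide len n     ; 5
last-size: len - (piece-size * (n - 1))             ; 6
print [piece-size last-size]                        ; 5 6
```

This matches the expected result ["12345" "67812" "345678"]: two pieces of 5 and a final piece of 6.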
(0003429)
abolka
6-Feb-2013 02:41

Gregg, that looks fantastic.

Seems one of the tests got messed up/misformatted by CureCode, but I can confirm that the rest of the tests pass fine with R3 A111.
(0003934)
abolka
18-Aug-2013 10:48

I'd like to see this merged. I think there was a desire (expressed by BrianH, for example, IIRC) to have the /INTO refinement of the function proposed above renamed.

If I remember this correctly, are there any suggestions as to what to rename /INTO to?
(0003938)
johnk
19-Aug-2013 07:47

Merged into mainline https://github.com/rebol/rebol/pull/130
Keep the ticket open to discuss renaming /INTO
(0003940)
onetom
19-Aug-2013 11:24

Why not break it out into its own ticket with a back reference to this one?
It's easier to search for and the discussion can be more focused too.
(Awesome work, Gregg. A gem! And it's been buried here for so long...)

Btw, are we not moving to github issues?
(0003941)
onetom
19-Aug-2013 11:43

Like this: https://github.com/rebol/rebol/issues/131 ?
(0003942)
abolka
19-Aug-2013 12:06

A separate ticket sounds good.

But please keep the discussion here on CureCode for now, until we properly migrate this whole CureCode database to GitHub issues.
(0003943)
abolka
19-Aug-2013 14:45

Created ticket #2051 to discuss the renaming of /INTO.
(0003946)
abolka
21-Aug-2013 00:24

In the core test suite. (Added Gregg's tests from the comment above.)

Date User Field Action Change
19-Feb-2014 21:27 BrianH Code Modified -
19-Feb-2014 21:27 BrianH Fixedin Modified => r3 master
19-Feb-2014 21:27 BrianH Status Modified submitted => built
19-Feb-2014 21:27 BrianH Platform Modified Mac OSX => All
21-Aug-2013 00:24 abolka Comment : 0003946 Added -
19-Aug-2013 14:45 abolka Comment : 0003943 Added -
19-Aug-2013 12:06 abolka Comment : 0003942 Added -
19-Aug-2013 11:43 onetom Comment : 0003941 Added -
19-Aug-2013 11:25 onetom Comment : 0003940 Modified -
19-Aug-2013 11:24 onetom Comment : 0003940 Added -
19-Aug-2013 07:47 johnk Comment : 0003938 Added -
19-Aug-2013 02:38 johnk Comment : 0003937 Removed -
19-Aug-2013 01:53 johnk Comment : 0003937 Modified -
19-Aug-2013 01:53 johnk Comment : 0003937 Modified -
19-Aug-2013 01:52 johnk Comment : 0003937 Modified -
19-Aug-2013 01:52 abolka Comment : 0003934 Modified -
19-Aug-2013 01:52 abolka Comment : 0003934 Modified -
19-Aug-2013 01:44 johnk Comment : 0003937 Added -
18-Aug-2013 10:48 abolka Comment : 0003934 Added -
6-Feb-2013 02:41 abolka Comment : 0003429 Added -
19-Jul-2011 20:47 Gregg Comment : 0003193 Added -
19-Jul-2011 20:46 Gregg Comment : 0003192 Added -
19-Jul-2011 20:45 Gregg Comment : 0003191 Modified -
19-Jul-2011 20:45 Gregg Comment : 0003191 Modified -
19-Jul-2011 20:44 Gregg Comment : 0003191 Modified -
19-Jul-2011 20:44 Gregg Comment : 0003191 Added -
19-Jul-2011 20:44 Gregg Comment : 0003190 Added -
8-Jul-2011 23:17 GHigley Ticket Added -