REBOL3 tracker
  0.9.12 beta
Ticket #0001886

Short URL: http://issue.cc/r3/1886
Type: Bug  Status: submitted  Date: 8-Jul-2011 23:17
Version: alpha 111  Category: Mezzanine  Submitted by: GHigley
Platform: Mac OSX  Severity: minor  Priority: low

Summary: SPLIT adds empty string to end of returned block
Description: SPLIT returns an empty string at the end of the returned block when using a delimiter. Although this may be by design, I'm skeptical. (I have not tested this on any platform but OS X.)
Example code:
foo: "a.b.c"
split foo "."
; returns ["a" "b" "c" ""]

foo: "a.b.c."
split foo "."
; returns ["a" "b" "c" ""], same as above
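
Until SPLIT itself changes, a caller who does not want the trailing empty string could strip it after the fact. The following is only a minimal sketch: `drop-trailing-empty` is a hypothetical helper, not part of R3, and it also discards the empty field that a genuinely trailing delimiter (as in "a.b.c.") would imply.

```rebol
; Hypothetical helper (an assumption for illustration, not an R3 function).
drop-trailing-empty: func [
    "Remove one trailing empty string from a block, if present (modifies the block)."
    blk [block!]
][
    if all [
        not empty? blk      ; guard: nothing to inspect in an empty block
        string? last blk    ; only strip string items
        empty? last blk     ; ...and only if the last item is empty
    ][
        remove back tail blk
    ]
    blk
]

probe drop-trailing-empty copy ["a" "b" "c" ""]
; prints ["a" "b" "c"]
```

In use, it would wrap the call from the report, e.g. `drop-trailing-empty split foo "."`.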

Assigned to: n/a  Fixed in: -  Last Update: 6-Feb-2013 02:41


Comments
(0003190)
Gregg
19-Jul-2011 20:44

The following implementation is a bit ugly, with the special handling added, but it does address all the SPLIT tickets I've found here.
(0003191)
Gregg
19-Jul-2011 20:44

split: func [
    "Split a series into pieces; fixed or variable size, fixed number, or at delimiters"
    series    [series!] "The series to split"
    dlm        [block! integer! char! bitset! any-string!] "Split size, delimiter(s), or rule(s)." 
    /into    "If dlm is an integer, split into n pieces, rather than pieces of length n."
    /local size piece-size count mk1 mk2 res fill-val add-fill-val
][
    either all [block? dlm  parse dlm [some integer!]] [
        map-each len dlm [
            either positive? len [
                copy/part series series: skip series len
            ] [
                series: skip series negate len
                ; return unset so that nothing is added to output
                ()
            ]
        ]
    ][
        size: dlm   ; alias for readability
        res: collect [
            parse/all series case [
                all [integer? size  into] [
                    if size < 1 [cause-error 'Script 'invalid-arg size]
                    count: size - 1
                    piece-size: to integer! round/down divide length? series size
                    if zero? piece-size [piece-size: 1]
                    [
                        count [copy series piece-size skip (keep/only series)]
                        copy series to end (keep/only series)
                    ]
                ]
                integer? dlm [
                    if size < 1 [cause-error 'Script 'invalid-arg size]
                    [any [copy series 1 size skip (keep/only series)]]
                ]
                'else [ ; = any [bitset? dlm  any-string? dlm  char? dlm]
                    [any [mk1: some [mk2: dlm break | skip] (keep/only copy/part mk1 mk2)]]
                ]
            ]
        ]
        ;-- Special processing, to handle cases where the caller spec'd more
        ;   items in /into than the series contains (so we want to append empty
        ;   items), or where the dlm was a char/string/charset and it was the
        ;   last char (so we want to append an empty field that the above rule
        ;   misses).
        fill-val: does [copy either any-block? series [[]] [""]]
        add-fill-val: does [append/only res fill-val]
        case [
            all [integer? size  into] [
                ; If the result is too short, i.e., fewer items than 'size, add
                ; empty items to fill it to 'size.
                ; We loop here, because insert/dup doesn't copy the value inserted.
                if size > length? res [
                    loop (size - length? res) [add-fill-val]
                ]
            ]
            ; integer? dlm [
            ; ]
            'else [ ; = any [bitset? dlm  any-string? dlm  char? dlm]
                ; If the last thing in the series is a delimiter, there is an
                ; implied empty field after it, which we add here.
                case [
                    bitset? dlm [
                        ; ATTEMPT is here because LAST will return NONE for an
                        ; empty series, and finding none in a bitset is not allowed.
                        if attempt [find dlm last series] [add-fill-val]
                    ]
                    char? dlm [
                        if dlm = last series [add-fill-val]
                    ]
                    string? dlm [
                        if all [
                            find series dlm
                            empty? find/last/tail series dlm
                        ] [add-fill-val]
                    ]
                ]
            ]
        ]
                
        res
    ]
]
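
The string-delimiter branch of the special processing above can be tried on its own. This is a small sketch, assuming a string series and a string delimiter only; `ends-with-dlm?` is a hypothetical name introduced for illustration, and the body simply mirrors the `find` / `find/last/tail` check used in the implementation.

```rebol
; Hypothetical helper (illustration only, not part of R3 or of SPLIT).
; Returns TRUE when the series ends with the delimiter, i.e. when an
; empty trailing field is implied; returns NONE otherwise.
ends-with-dlm?: func [series [string!] dlm [string!]] [
    all [
        find series dlm                     ; delimiter occurs at all
        empty? find/last/tail series dlm    ; nothing follows its last occurrence
    ]
]

probe ends-with-dlm? "a.b.c." "."
probe ends-with-dlm? "a.b.c" "."
```

The first `find` acts as a guard, so the second check only runs when the delimiter is actually present in the series.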
(0003192)
Gregg
19-Jul-2011 20:46

A quick test func:

test: func [block expected-result /local res] [
    if error? try [
        print [mold/only :block newline tab mold res: do block]
        if res <> expected-result [print [tab 'FAILED! tab 'expected mold expected-result]]
    ][
        print [mold/only :block newline tab "ERROR!"]
    ]
]
(0003193)
Gregg
19-Jul-2011 20:47

And a few tests:

test [split "1234567812345678" 4]  ["1234" "5678" "1234" "5678"]

test [split "1234567812345678" 3]  ["123" "456" "781" "234" "567" "8"]
test [split "1234567812345678" 5]  ["12345" "67812" "34567" "8"]

test [split/into [1 2 3 4 5 6] 2]       [[1 2 3] [4 5 6]]
test [split/into "1234567812345678" 2]  ["12345678" "12345678"]
test [split/into "1234567812345678" 3]  ["12345" "67812" "345678"]
test [split/into "1234567812345678" 5]  ["123" "456" "781" "234" "5678"]

; Dlm longer than series
test [split/into "123" 6]       ["1" "2" "3" "" "" ""] ;or ["1" "2" "3"]
test [split/into [1 2 3] 6]     [[1] [2] [3] [] [] []] ;or [1 2 3]

test [split [1 2 3 4 5 6] [2 1 3]]                  [[1 2] [3] [4 5 6]]
test [split "1234567812345678" [4 4 2 2 1 1 1 1]]   ["1234" "5678" "12" "34" "5" "6" "7" "8"]
test [split first [(1 2 3 4 5 6 7 8 9)] 3]          [(1 2 3) (4 5 6) (7 8 9)]
test [split #{0102030405060708090A} [4 3 1 2]]      [#{01020304} #{050607} #{08} #{090A}]

test [split [1 2 3 4 5 6] [2 1]]                [[1 2] [3]]

test [split [1 2 3 4 5 6] [2 1 3 5]]            [[1 2] [3] [4 5 6] []]

test [split [1 2 3 4 5 6] [2 1 6]]              [[1 2] [3] [4 5 6]]

; Old design for negative skip vals
;test [split [1 2 3 4 5 6] [3 2 2 -2 2 -4 3]]    [[1 2 3] [4 5] [6] [5 6] [3 4 5]]
; New design for negative skip vals
test [split [1 2 3 4 5 6] [2 -2 2]]             [[1 2] [5 6]]

test [split "abc,de,fghi,jk" #","]              ["abc" "de" "fghi" "jk"]
test [split "abc
de
fghi
jk"
] ["abc" "de" "fghi" "jk"]
test [split "a.b.c" "."] ["a" "b" "c"]
test [split "c c" " "] ["c" "c"]
test [split "1,2,3" " "] ["1,2,3"]
test [split "1,2,3" ","] ["1" "2" "3"]
test [split "1,2,3," ","] ["1" "2" "3" ""]
test [split "1,2,3," charset ",."] ["1" "2" "3" ""]
test [split "1.2,3." charset ",."] ["1" "2" "3" ""]
test [split "-a-a" ["a"]] ["-" "-"]
test [split "-a-a'" ["a"]] ["-" "-" "'"]
test [split "abc|de/fghi:jk" charset "|/:"] ["abc" "de" "fghi" "jk"]
test [split "abc^M^Jde^Mfghi^Jjk" [crlf | #"^M" | newline]] ["abc" "de" "fghi" "jk"]
test [split "abc de fghi jk" [some #" "]] ["abc" "de" "fghi" "jk"]
(0003429)
abolka
6-Feb-2013 02:41

Gregg, that looks fantastic.

Seems one of the tests got messed up/misformatted by CureCode, but I can confirm that the rest of the tests pass fine with R3 A111.

Date User Field Action Change
6-Feb-2013 02:41 abolka Comment : 0003429 Added -
19-Jul-2011 20:47 Gregg Comment : 0003193 Added -
19-Jul-2011 20:46 Gregg Comment : 0003192 Added -
19-Jul-2011 20:45 Gregg Comment : 0003191 Modified -
19-Jul-2011 20:45 Gregg Comment : 0003191 Modified -
19-Jul-2011 20:44 Gregg Comment : 0003191 Modified -
19-Jul-2011 20:44 Gregg Comment : 0003191 Added -
19-Jul-2011 20:44 Gregg Comment : 0003190 Added -
8-Jul-2011 23:17 GHigley Ticket Added -