Type | Bug | Status | reviewed | Date | 19-Jul-2015 18:41 |
---|---|---|---|---|---|
Version | r3 master | Category | Unspecified | Submitted by | fork |
Platform | All | Severity | minor | Priority | normal |
Summary | LENGTH? ANY-WORD! reports byte length of UTF8 encoding, not character count |
---|---|
Description |
The A_LENGTHQ action for words currently returns the length in bytes of the word's UTF-8 encoding rather than the number of characters in the word: https://github.com/rebol/rebol/blob/25033f897b2bd466068d7663563cd3ff64740b94/src/core/t-word.c#L86 It could be changed to decode the UTF-8 and return the Unicode character count, but it is unclear what the intent was. Should `length? quote foo:` include the colon or not, i.e. be 4 or 3? Rather than changing this to a character count, disallowing LENGTH? on words entirely seems the better path. That way people specify what they mean through the string conversion they choose. To exclude the marker, use `spelling-of` (currently in rebol-proposals, to be incorporated soon into Ren/C): `length? spelling-of quote foo:` is 3. To include the marker, use `to-string` (currently conceived in rebol-proposals to behave about like FORM does today): `length? to-string quote foo:` would be 4. |
Example code |
;; This is the concrete bug (semantic problems aside)
>> length? to-word to-string to-char 126
== 1
>> length? to-word to-string to-char 128
== 2 |
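
The difference in the example code above comes down to counting bytes versus counting code points: U+007E encodes to one UTF-8 byte, while U+0080 encodes to two (0xC2 0x80). The following is a minimal C sketch of the two measures, not the code in t-word.c. The names `utf8_byte_count` and `utf8_codepoint_count` are hypothetical, and the input is assumed to be well-formed, NUL-terminated UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length in bytes of the UTF-8 encoding.
 * This is the quantity the ticket says LENGTH? currently reports for words. */
size_t utf8_byte_count(const char *s)
{
    return strlen(s);
}

/* Hypothetical helper: number of Unicode code points.
 * Assumes well-formed UTF-8. Every code point has exactly one lead byte,
 * and continuation bytes always match the bit pattern 10xxxxxx, so
 * counting the bytes that are NOT continuation bytes counts code points. */
size_t utf8_codepoint_count(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the string `"\xC2\x80"` (U+0080), `utf8_byte_count` gives 2 and `utf8_codepoint_count` gives 1. For `"~"` (U+007E) both give 1. Neither measure settles whether a set-word's colon should be counted, which is the semantic question the description raises.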
Assigned to | n/a | Fixed in | - | Last Update | 26-Jul-2015 06:26 |
---|
Date | User | Field | Action | Change |
---|---|---|---|---|
26-Jul-2015 06:26 | abolka | Summary | Modified | ANY-WORD! reports byte length of UTF8 encoding, not character count => LENGTH? ANY-WORD! reports byte length of UTF8 encoding, not character count |
26-Jul-2015 06:25 | abolka | Status | Modified | submitted => reviewed |
20-Jul-2015 17:30 | Fork | Code | Modified | - |
20-Jul-2015 17:29 | Fork | Code | Modified | - |
20-Jul-2015 17:29 | Fork | Description | Modified | - |
20-Jul-2015 17:28 | Fork | Description | Modified | - |
19-Jul-2015 18:42 | Fork | Description | Modified | - |
19-Jul-2015 18:42 | Fork | Code | Modified | - |
19-Jul-2015 18:41 | Fork | Ticket | Added | - |