REBOL3 tracker
Ticket #0002224

Type: Bug   Status: reviewed   Date: 19-Jul-2015 18:41
Version: r3 master   Category: Unspecified   Submitted by: fork
Platform: All   Severity: minor   Priority: normal

Summary: LENGTH? ANY-WORD! reports byte length of UTF8 encoding, not character count
Description: The A_LENGTHQ action for words currently returns the length in bytes of the word's UTF-8 encoding rather than the number of characters in the word.

https://github.com/rebol/rebol/blob/25033f897b2bd466068d7663563cd3ff64740b94/src/core/t-word.c#L86
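
To make the discrepancy concrete, here is a minimal C sketch contrasting the two measurements. It is not the code in t-word.c; the function names are hypothetical, and it assumes the input is valid, NUL-terminated UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Byte length of a NUL-terminated UTF-8 string. This is the quantity the
   ticket says LENGTH? currently reports for words. */
size_t utf8_byte_length(const char *s) {
    return strlen(s);
}

/* Number of code points in the string. Every byte that is not a UTF-8
   continuation byte (bit pattern 10xxxxxx) starts a new code point.
   Assumption: the input is valid UTF-8. */
size_t utf8_codepoint_count(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For "~" (U+007E, one byte) both functions return 1. For "\xC2\x80" (U+0080, two bytes) the byte length is 2 while the code point count is 1, which matches the pair of results in the example code.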

While it could be changed to decode the UTF-8 and return the length in Unicode code points, it is unclear what the intent was. Should `length? quote foo:` count the colon or not, i.e. should it be 4 or 3?

So rather than changing this to a character count, disallowing LENGTH? on words entirely seems the better path. That way callers state what they mean through the kind of string conversion they choose:

To exclude the marker, use `spelling-of` (currently in rebol-proposals, to be incorporated soon into Ren/C). `length? spelling-of quote foo:` is 3.

To include the marker, use `to-string` (which rebol-proposals currently envisions behaving roughly as FORM does today). So `length? to-string quote foo:` would be 4.
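
The two meanings can be sketched in C as follows. This is only an illustration of the distinction, not Rebol or Ren/C source: the enum, the function names, and the restriction to four word kinds are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical word kinds, loosely named after members of ANY-WORD!. */
typedef enum {
    KIND_WORD,      /* foo  */
    KIND_SET_WORD,  /* foo: */
    KIND_GET_WORD,  /* :foo */
    KIND_LIT_WORD   /* 'foo */
} word_kind;

/* Count code points in valid, NUL-terminated UTF-8 by skipping
   continuation bytes (bit pattern 10xxxxxx). */
static size_t count_codepoints(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Length without the marker: the reading that `length? spelling-of`
   would select under the proposal. */
size_t spelling_length(const char *spelling) {
    return count_codepoints(spelling);
}

/* Length with the marker: the reading that `length? to-string` would
   select. Each decorated kind here adds exactly one character. */
size_t formed_length(const char *spelling, word_kind kind) {
    return count_codepoints(spelling) + (kind == KIND_WORD ? 0 : 1);
}
```

With the spelling "foo" and KIND_SET_WORD, `spelling_length` gives 3 and `formed_length` gives 4, the two candidate answers for `length? quote foo:`.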
Example code
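;; This is the concrete bug (semantic problems aside)

;; U+007E encodes as one UTF-8 byte, so the result happens to be correct
>> length? to-word to-string to-char 126
== 1

;; U+0080 encodes as two UTF-8 bytes, so a one-character word reports 2
>> length? to-word to-string to-char 128
== 2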

Assigned to: n/a   Fixed in: -   Last Update: 26-Jul-2015 06:26


Comments

Date User Field Action Change
26-Jul-2015 06:26 abolka Summary Modified ANY-WORD! reports byte length of UTF8 encoding, not character count => LENGTH? ANY-WORD! reports byte length of UTF8 encoding, not character count
26-Jul-2015 06:25 abolka Status Modified submitted => reviewed
20-Jul-2015 17:30 Fork Code Modified -
20-Jul-2015 17:29 Fork Code Modified -
20-Jul-2015 17:29 Fork Description Modified -
20-Jul-2015 17:28 Fork Description Modified -
19-Jul-2015 18:42 Fork Description Modified -
19-Jul-2015 18:42 Fork Code Modified -
19-Jul-2015 18:41 Fork Ticket Added -