REBOL3 tracker 0.9.12 beta
Ticket #0002189

Type: Wish    Status: submitted    Date: 4-Dec-2014 16:55
Version: r3 master    Category: Parse    Submitted by: Ladislav
Platform: All    Severity: minor    Priority: normal

Summary: Define a WHITESPACE charset
Description: I think it would be useful to have this charset predefined; it seems to be used frequently enough to justify it.
Example code:
whitespace: charset [#"^A" - #" " #"^(7F)" #"^(A0)"]
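A quick illustration of how the proposed charset would typically be used with PARSE. This is a minimal sketch, assuming REBOL 3 semantics (PARSE returns true only when the whole input is matched, and COMPLEMENT of a bitset can be used as a PARSE rule); `non-ws`, `words` and `w` are illustrative names, not part of the proposal:

```rebol
; The charset proposed above: ^A through space, plus DEL and NBSP
whitespace: charset [#"^A" - #" " #"^(7F)" #"^(A0)"]

; Illustrative helper: every character that is NOT whitespace
non-ws: complement whitespace

; Split a string into whitespace-separated words
words: copy []
parse "  alpha^-beta^/gamma " [
    any whitespace
    any [copy w some non-ws (append words w) any whitespace]
]
probe words    ; ["alpha" "beta" "gamma"]
```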

Assigned to: n/a    Fixed in: -    Last update: 4-Dec-2014 23:21


Comments
(0004544)
fork
4-Dec-2014 23:21

(Hi Ladislav, nice to hear from you! Do check in on chat sometime if you have a moment...)

Predefining character sets is a crucial idea, especially when advocating for the ease of use of PARSE. There has been significant discussion about how to do it. The Unicode standard actually defines character categories, and it would be desirable to be able to offer sets for them:

http://www.fileformat.info/info/unicode/category/index.htm

Defining it as a function is a nice idea: it would, for instance, allow both `whitespace` and `whitespace/ascii` to be meaningful. It also allows the sets to be generated and cached on demand. You could then use the result in FIND, PARSE, or elsewhere...
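The function idea might look something like the minimal sketch below. It is hypothetical: neither this `whitespace` function nor its `/ascii` refinement exists in REBOL 3, the cache words are illustrative, and the ASCII set chosen here (tab through carriage return, plus space) is an assumption. Each set is built on first use and then cached. Because PARSE will not call the function itself, the sketch fetches the set into a word (`ws`, also illustrative) before parsing:

```rebol
; Hypothetical WHITESPACE function: builds each charset on first call, then caches it
whitespace: use [cache-full cache-ascii] [
    cache-full: cache-ascii: none
    func [
        "Return a whitespace charset (hypothetical sketch)"
        /ascii "Only ASCII whitespace: tab, LF, VT, FF, CR and space"
    ][
        either ascii [
            ; ^- is tab (09h), ^M is carriage return (0Dh)
            any [cache-ascii  cache-ascii: charset [#"^-" - #"^M" #" "]]
        ][
            ; the set proposed in this ticket
            any [cache-full  cache-full: charset [#"^A" - #" " #"^(7F)" #"^(A0)"]]
        ]
    ]
]

; PARSE does not evaluate functions, so get the set first
ws: whitespace/ascii
parse "^- x" [some ws "x"]
```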

...however, it will not work with PARSE unless PARSE allows function evaluation. I added that in a PR, so it's certainly possible. At one point I thought arbitrary evaluation of functions taking parameters would be okay, with the arguments appearing inline in the parse dialect code. I now agree with Carl's feeling (and that of others) that only zero-arity functions should be allowed inline in parse code. Under that premise, this would be legal (a refinement like `/b` takes no argument here, so the call stays zero-arity):

some-rule: function [/b] [
    either b [[some "b"]] [[some "a"]]
]

parse "aaaabbbb" [some-rule some-rule/b]

Whereas this would be rejected, raising an error on the first attempt at a non-zero-arity call:

some-rule: function [value [char!]] [
    compose [some (value)]
]

parse "aaaabbbb" [some-rule #"a" some-rule #"b"]

I've written up a deeper rationale for why this is not a loss of meaningful generality, with the benefit of not making PARSE rules any more nuts than they can already get. :-)
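To illustrate why little generality is lost, here is a minimal sketch (the names `a-rule` and `b-rule` are illustrative): the one-argument rule generator from the rejected example is simply called before PARSE runs, so PARSE only ever sees plain words referring to blocks:

```rebol
; Rule generator taking one argument (fine outside of PARSE)
some-rule: function [value [char!]] [
    compose [some (value)]
]

; Build the rules up front
a-rule: some-rule #"a"    ; == [some #"a"]
b-rule: some-rule #"b"    ; == [some #"b"]

parse "aaaabbbb" [a-rule b-rule]    ; == true
```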

Surveys of our proposals for these classes can be found via chat search, so if you stop by we can dig them up. Offhand, I believe we were going with `digit`, `letter`, `whitespace`, `symbol`... with refinements on each for narrowing. So `letter/latin8/uppercase` would be more specific, while `letter` would be very general and match anything the Unicode standard classifies as a letter.

Date              User      Field             Action    Change
4-Dec-2014 23:23  Fork      Comment: 0004544  Modified  -
4-Dec-2014 23:22  Fork      Comment: 0004544  Modified  -
4-Dec-2014 23:21  Fork      Comment: 0004544  Added     -
4-Dec-2014 16:55  Ladislav  Ticket            Added     -