REBOL3 tracker
  0.9.12 beta
Ticket #0002209

Type: Issue  Status: submitted  Date: 24-Mar-2015 12:23
Version: r3 master  Category: Syntax  Submitted by: fork
Platform: All  Severity: major  Priority: normal

Summary: Character Escaping Plan "Epic"
Description: This is an "Epic" in training, intended to look holistically at the escaping picture. If any individual proposal becomes controversial enough to break out into its own issue (and hasn't been already), its description will be deleted here and replaced with a link to that new issue.

----

For escaping within braced strings, ^{ and ^} are available. They work in any escaped context: inside any string type, or any word. Hence ^{X} is not available to describe any form of escaping.

For escaping within tags, ^< and ^> are available. Again, they can be used anywhere that escaping is available. Hence ^<X> is not available to describe any form of escaping for X.

For escaping within quoted strings, ^" is available. It too works in any escaped context, which rules out ^"X" as a composite escape sequence.

^(DD) and ^(DDDD) have already been defined for codepoints, where each D is a hexadecimal digit. ^(DDDD..) is reserved for future use to access higher codepoints when supported. (Red supports Unicode codepoints beyond the 16-bit range, though it exchanges all Unicode data via UTF-8.)
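
To make the codepoint form concrete, here is a minimal sketch of a decoder, written in Python rather than Rebol so it can be run standalone. The function name and the regular expression are illustrative assumptions, not part of any Rebol implementation; it accepts only the two-digit and four-digit forms named above and leaves everything else untouched.

```python
import re

# Assumption: only the ^(DD) and ^(DDDD) forms are recognized;
# the reserved longer form ^(DDDD..) is deliberately not handled.
CODEPOINT_ESCAPE = re.compile(r"\^\(([0-9A-Fa-f]{2}|[0-9A-Fa-f]{4})\)")


def decode_codepoint_escapes(text: str) -> str:
    """Replace each ^(DD) or ^(DDDD) with the character at that hex codepoint."""
    return CODEPOINT_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
```

Anything in parentheses that is not exactly two or four hex digits passes through unchanged, which leaves room for other ^(...) forms.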

#2208 proposes ^[entity] as a way of encoding entities from the Unicode named character entities in the HTML5 specification, and explains why it does not extend the historical Rebol-named escapes. This also ensures that the names defer to the HTML5 spec and do not attempt to extend it, making it a good test of valid HTML5 escaping.

^A through ^Z are used to indicate control sequences. There are no known situations in which a letter would need to be escaped, nor are any predicted to happen (counter-examples?)
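
As an illustration of the letter-to-control mapping, here is a small Python sketch. It assumes the usual ASCII convention (^A is codepoint 1, ^Z is codepoint 26) and that the letter is case-insensitive; the function name is hypothetical.

```python
import string


def decode_control_escape(letter: str) -> str:
    """Map the letter following the caret (A-Z) to its control character.

    Assumption: ^A maps to codepoint 1 and ^Z to codepoint 26, and
    lowercase letters are treated the same as uppercase.
    """
    if len(letter) != 1 or letter not in string.ascii_letters:
        raise ValueError(f"not a control-escape letter: {letter!r}")
    return chr(ord(letter.upper()) - ord("A") + 1)
```

Under this mapping ^I is a tab, ^J a newline and ^M a carriage return.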

NEW: ^0 through ^9 would escape to the named digit, which lets a word begin with a digit. Hence:

^3com-driver: %network-3com.drv
print ^3com-driver

^_ would be the notation for escaping spaces, as proposed in #2196.

^| would be added as a version of escaping a newline that could be used either inside or outside of strings, as remarked in the comments of #2203. Inside of strings only, ^/ would likely be retained for backwards compatibility. (I'd personally prefer the consistency of not having it, but it's not a big deal as those who want to avoid it can always use ^|)

^- would continue escaping tabs, as it does today.
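
The single-character escapes proposed so far (^0 through ^9, ^_, ^|, ^/ and ^-) can be summarized as a lookup table. The Python sketch below is only a model of the proposal, not of current Rebol behavior; the table and function names are hypothetical, and any caret sequence not in the table is passed through unchanged.

```python
# Proposed single-character escapes from this ticket (a model, not current behavior).
PROPOSED_ESCAPES = {
    "_": " ",   # ^_ escapes a space (#2196)
    "|": "\n",  # ^| escapes a newline, inside or outside strings (#2203)
    "/": "\n",  # ^/ retained inside strings for backwards compatibility
    "-": "\t",  # ^- escapes a tab, as it does today
}
# ^0 through ^9 escape to the digit itself.
PROPOSED_ESCAPES.update({digit: digit for digit in "0123456789"})


def decode_simple_escapes(text: str) -> str:
    """Decode the caret escapes listed in PROPOSED_ESCAPES; leave the rest as-is."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "^" and i + 1 < len(text) and text[i + 1] in PROPOSED_ESCAPES:
            out.append(PROPOSED_ESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

For instance, decoding ^3com-driver yields the spelling 3com-driver, which is the word the earlier example is meant to name.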

^ followed by a space or newline could be a special form of escape marking the visible end of a run of spacing when no newline is actually intended. This could be a visual aid in multi-line string literals containing spaces. Hence:

lots-of-spaces: {^
     ^
     ^
     ^
}

Given that spaces are invisible, it would be difficult to notice the quantity of spacing otherwise based on inspection. (This is also a strong argument for demanding that tabs in strings be escaped, such that it be possible to know for sure what you were looking at without worrying about "invisibles")
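
One possible reading of the caret-then-newline form is that the pair contributes nothing to the string and only marks where the spacing ends. The Python sketch below models that reading alone; it is an assumption about the intended semantics, the function name is hypothetical, and the caret-then-space variant is not covered.

```python
def strip_end_markers(text: str) -> str:
    """Drop every caret that is immediately followed by a newline, along with the newline.

    Assumption: the pair is purely a visual marker and yields no characters.
    """
    return text.replace("^\n", "")


# The body of the lots-of-spaces literal above, between the braces.
body = "^\n     ^\n     ^\n     ^\n"
spaces = strip_end_markers(body)
```

Under this reading the literal holds three runs of five spaces and no newlines.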

^(X) would be used to escape any symbol that is not legal at the start of a WORD!. The reason is to keep things clear: if ^@foo were allowed as a valid escape producing an @ character, then people might assume that ^_foo is a literal underscore and not a space. ^(@)foo helps systematize this, and gives freedom for ^@ to mean something else if necessary.

This brings us to the big "Minus Four": how to escape [ ] ( )

^([) and ^(]) are pretty reasonable. But ^(() and ^()) look a little dodgy. Still, it's an escape pattern. ^^ is confusing too. These shouldn't come up too often, hopefully...and you'd always have the option of using the hex codepoints (as you would for any of these).
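
To check that the dodgy-looking forms are at least unambiguous to a scanner, here is a Python sketch of the ^(X) pattern. It assumes X is a single character that is neither alphanumeric nor whitespace, which is a stand-in for "not legal at the start of a word" and not the real Rebol rule; the names are hypothetical.

```python
import re

# Assumption: X is exactly one non-alphanumeric, non-whitespace character.
SYMBOL_ESCAPE = re.compile(r"\^\(([^0-9A-Za-z\s])\)")


def decode_symbol_escapes(text: str) -> str:
    """Replace each ^(X) with the bare symbol X; leave hex forms like ^(41) alone."""
    return SYMBOL_ESCAPE.sub(lambda m: m.group(1), text)
```

Because the escaped symbol is always exactly one character, ^(() and ^()) scan without backtracking trouble even though they are hard on the eyes.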

Assigned to: n/a  Fixed in: -  Last Update: 24-Mar-2015 12:24


Comments

Date User Field Action Change
24-Mar-2015 12:24 fork Description Modified -
24-Mar-2015 12:24 fork Description Modified -
24-Mar-2015 12:23 fork Ticket Added -