REBOL3 tracker
  0.9.12 beta
Ticket #0002208

Type: Wish | Status: submitted | Date: 24-Mar-2015 09:45
Version: r3 master | Category: Unspecified | Submitted by: fork
Platform: All | Severity: minor | Priority: normal

Summary: Support named HTML5 entities table in escaping using ^[entity]

Description: HTML5 has formalized named entities for Unicode codepoints. About 1500 or so:

http://dev.w3.org/html5/html-author/charref

If you want to get `Foo ⊗ Bar`, it's certainly nicer to be able to type `{Foo ^[otimes] Bar}` than to have to dig up a table and find that it is `{Foo ^(2297) Bar}`.

(It's also rather prettier than HTML's version of `Foo &otimes; Bar`, I think.)
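
To make the proposed behavior concrete, here is a minimal sketch of what expanding `^[name]` could look like. It is written in Python only because the escape does not exist in Rebol yet and Python's standard library ships the HTML5 table as `html.entities.html5`. The helper name `expand_named_escapes` and the choice to raise an error on unknown names are assumptions for illustration, not part of the proposal.

```python
import re
from html.entities import html5  # stdlib table, e.g. "otimes;" -> "\u2297"

# Matches the proposed ^[name] escape; the name is captured in group 1.
_NAMED_ESCAPE = re.compile(r"\^\[([A-Za-z0-9]+)\]")


def expand_named_escapes(text: str) -> str:
    """Hypothetical helper: replace each ^[name] with its HTML5 character."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        try:
            # Keys in the stdlib table carry the trailing semicolon.
            return html5[name + ";"]
        except KeyError:
            # Assumption: an unknown name is an error rather than passed through.
            raise ValueError(f"unknown entity name: {name!r}") from None

    return _NAMED_ESCAPE.sub(replace, text)


print(expand_named_escapes("Foo ^[otimes] Bar"))
```

A real implementation would live in the Rebol scanner rather than in a post-processing pass. The sketch only shows the name-to-codepoint lookup.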

@MarkI pointed out that the original proposal to use ^(entity) would make the entity &ac; collide with the existing ^(ac), which is already a valid string escape: it is the hex escape for U+00AC, not the ^(223E) that &ac; stands for. Since ^{ is necessary for escaping braces in strings and ^< will be necessary for escaping in tags, the only remaining choice is ^[entity].
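
The collision is easy to check mechanically. The sketch below (Python again, using the stdlib `html.entities.html5` table as a stand-in for whatever table Rebol would embed) lists entity names that are also valid hex digit strings, and would therefore be ambiguous inside `^(...)`. The function name `hex_colliding_names` is made up for this example.

```python
import re
from html.entities import html5  # "ac;" -> "\u223e", etc.


def hex_colliding_names():
    """Return entity names that would also parse as a hex codepoint in ^(...)."""
    # Only consider the canonical semicolon-terminated keys; strip the ";".
    names = {key[:-1] for key in html5 if key.endswith(";")}
    return sorted(n for n in names if re.fullmatch(r"[0-9A-Fa-f]+", n))


for name in hex_colliding_names():
    hex_meaning = "U+%04X" % int(name, 16)
    entity_meaning = " ".join("U+%04X" % ord(c) for c in html5[name + ";"])
    print(f"^({name}): hex escape gives {hex_meaning}, entity &{name}; is {entity_meaning}")
```

Any name this prints has two different meanings under the ^(entity) spelling, which is the argument for a separate delimiter.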

A back-of-the-envelope calculation: with about 1500 entities, an estimated average of 6 characters per entity name, and 2 bytes for the UTF-16 codepoint, the data comes to roughly 1500 × (6 + 2) = 12K, uncompressed. The data would likely compress well with the existing "paid for" compression routines already in the code.
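
The estimate can be sanity-checked against a real table. The sketch below reproduces the arithmetic, then measures Python's stdlib copy of the HTML5 table (`html.entities.html5`) and compresses a naive serialization with `zlib`. The serialization format is an assumption made purely for measurement, and Rebol's own compression routine may give different numbers.

```python
import zlib
from html.entities import html5

ENTITY_COUNT = 1500    # figure quoted in the ticket
AVG_NAME_BYTES = 6     # estimated average name length, one byte per ASCII char
CODEPOINT_BYTES = 2    # one UTF-16 code unit per entity


def estimated_table_bytes(count=ENTITY_COUNT,
                          name_bytes=AVG_NAME_BYTES,
                          cp_bytes=CODEPOINT_BYTES):
    """The ticket's estimate: count * (name bytes + codepoint bytes)."""
    return count * (name_bytes + cp_bytes)


def measured_table_bytes():
    """Name bytes plus UTF-16 value bytes for the semicolon-terminated keys."""
    total = 0
    for key, value in html5.items():
        if key.endswith(";"):
            total += len(key) - 1 + len(value.encode("utf-16-le"))
    return total


def compressed_sizes():
    """Return (raw, compressed) byte counts for a naive name=value listing."""
    lines = sorted(f"{k[:-1]}={v}" for k, v in html5.items() if k.endswith(";"))
    raw = "\n".join(lines).encode("utf-8")
    return len(raw), len(zlib.compress(raw, 9))


print("estimate:", estimated_table_bytes())
print("measured:", measured_table_bytes())
print("raw / compressed:", compressed_sizes())
```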

To claim "Unicode support", a feature like this would be very desirable, helping not only the writer but everyone down the line who has to read that code. Having the table built in would also help anyone trying to process HTML, because they could parse out the entity name and then convert it to a character:

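    ch: none
    entity-name: none  ; set up front so the failure branch can test it
    parse "&ocirc;" [
        "&" copy entity-name to ";"
        ; build the text #"^[ocirc]" and LOAD it as a char (proposed syntax);
        ; ^^ is a literal caret inside a string
        (ch: attempt [load combine [{#"^^[} entity-name {]"}]])
    ]
    either ch [
        print [{Entity} entity-name {is equivalent to} ch]
    ] [
        print [
            if/only entity-name [{Entity} entity-name {is}]
            {not a valid HTML5 entity}
        ]
    ]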
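
For comparison, the same lookup can be sketched in a language that already has the table built in. This Python version uses the stdlib `html.entities.html5` mapping. The function name `entity_to_char` and its `(name, char)` return shape are choices made for this example only.

```python
import re
from html.entities import html5


def entity_to_char(text):
    """Parse "&name;" and return (name, char); either part may be None."""
    match = re.fullmatch(r"&([A-Za-z0-9]+);", text)
    if match is None:
        return None, None          # not shaped like an entity at all
    name = match.group(1)
    return name, html5.get(name + ";")  # char is None for unknown names


name, ch = entity_to_char("&ocirc;")
if ch is not None:
    print("Entity", name, "is equivalent to", ch)
elif name is not None:
    print("Entity", name, "is not a valid HTML5 entity")
else:
    print("not a valid HTML5 entity")
```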

So there's an extra-super cool reason to be compatible and include the table. It may also be desirable to offer an API so that the reverse can be done: turning a character into an entity name (if one is available).
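
A reverse lookup mostly needs a policy for characters that have more than one name. The Python sketch below builds the reverse map from the stdlib `html.entities.html5` table and, as an arbitrary assumption, prefers the shortest name and breaks ties alphabetically. The names `build_reverse_table` and `char_to_entity_name` are hypothetical.

```python
from html.entities import html5


def build_reverse_table():
    """Map each single character to one preferred entity name."""
    reverse = {}
    for key, value in html5.items():
        # Skip legacy keys without ";" and entities that expand to two codepoints.
        if not key.endswith(";") or len(value) != 1:
            continue
        name = key[:-1]
        current = reverse.get(value)
        # Assumed policy: shortest name wins, ties broken alphabetically.
        if current is None or (len(name), name) < (len(current), current):
            reverse[value] = name
    return reverse


_REVERSE = build_reverse_table()


def char_to_entity_name(ch):
    """Return an entity name for ch, or None if the table has none."""
    return _REVERSE.get(ch)


print(char_to_entity_name("\u2297"))
print(char_to_entity_name("\U0010FFFF"))
```

Whatever policy is chosen, the name returned should map back to the same character through the forward table.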

Assigned to: n/a | Fixed in: - | Last Update: 24-Mar-2015 11:46


Comments
(0004615)
MarkI
24-Mar-2015 10:23

Aww ... so *close*. Sadly, this'll break compatibility: ^(ac) is already a valid string escape that is not ^(223E).

Date | User | Field | Action | Change
24-Mar-2015 11:46 | fork | Summary | Modified | Support named HTML5 entities table in escaping => Support named HTML5 entities table in escaping using ^[entity]
24-Mar-2015 11:46 | fork | Description | Modified | -
24-Mar-2015 10:23 | MarkI | Comment : 0004615 | Added | -
24-Mar-2015 10:12 | Fork | Description | Modified | -
24-Mar-2015 09:45 | Fork | Ticket | Added | -