REBOL3 tracker
Ticket #0002234

Short URL: http://issue.cc/r3/2234
Type: Wish | Status: submitted | Date: 2-Aug-2015 05:11
Version: r3 master | Category: Unspecified | Submitted by: fork
Platform: All | Severity: major | Priority: normal

Summary: SuperTAG!: upon a " { ( [ or < in tag content, validate the substring via the Rebol lexer
Description: A natural tag starts with < followed by any character that is not a space, <, ~, or =. It ends with > preceded by any character that is not a space, >, ~, or =. (There may be further restrictions on what the second and penultimate characters can be, and <> may be reclaimed for an "empty tag".)
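
As a rough illustration of the boundary rule only, here is a minimal Python sketch. It is not Rebol code and not the proposed implementation; the function name and constants are hypothetical. It looks only at the second and penultimate characters of a candidate string, and it rejects <> because the ticket leaves that case undecided.

```python
# Characters that may not follow "<" or precede ">" in a natural tag,
# per the rule stated in the description. The ticket notes this set may grow.
FORBIDDEN_AFTER_OPEN = set(" <~=")
FORBIDDEN_BEFORE_CLOSE = set(" >~=")


def has_natural_boundaries(text: str) -> bool:
    """Check only the second and penultimate characters of a candidate tag.

    Hypothetical helper: it does not look at the content in between.
    """
    # Require "<", at least one content character, and ">".
    # "<>" is rejected here as an assumption; the ticket leaves it open.
    if len(text) < 3 or text[0] != "<" or text[-1] != ">":
        return False
    return (text[1] not in FORBIDDEN_AFTER_OPEN
            and text[-2] not in FORBIDDEN_BEFORE_CLOSE)


print(has_natural_boundaries('<foo bar="abc">'))      # candidate from the examples
print(has_natural_boundaries('<foo bar="abc"    >'))  # space before the final >
print(has_natural_boundaries('<   foo bar="abc">'))   # space after the opening <
```

This checks a whole candidate string; deciding where a tag ends inside a longer input also depends on the content rules that follow.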

Content is processed by the general string rules, including string escaping, possibly with ^> and ^< added as escapes.

The exception is when the lexer sees a " { ( [ or < inside the tag. It then calls out to the main Rebol lexical rules and requires a valid Rebol string, paren, block, or tag to be formed. Notably, slashes are *not* processed according to path rules unless the lexer has been switched on by one of these recognitions. Legal and illegal natural tags are listed in the example code below, since I believe CureCode chokes on HTML tags in the description.
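
To make the scanning idea concrete, here is a simplified Python model. It is a sketch under stated assumptions, not the Rebol lexer: every name in it is hypothetical, and it only checks that strings, braces, parens, blocks, and nested tags are properly delimited. It does not validate the tokens inside parens and blocks, so it accepts input such as <the site is [4chan]> that the real lexer would reject under Rebol word rules.

```python
NOT_AFTER_OPEN = " <~="    # may not follow the opening "<"
NOT_BEFORE_CLOSE = " >~="  # may not precede the closing ">"
CLOSER = {"(": ")", "[": "]"}


class TagError(ValueError):
    """Raised when the input is not a well-formed natural tag in this model."""


def skip_quoted(src, i):
    """src[i] is '"'. Return the index just past the closing quote."""
    i += 1
    while i < len(src):
        if src[i] == "^":      # caret escape: skip the escaped character
            i += 2
        elif src[i] == '"':
            return i + 1
        else:
            i += 1
    raise TagError("unterminated string")


def skip_braced(src, i):
    """src[i] is '{'. Braces nest. Return the index just past the match."""
    depth = 0
    while i < len(src):
        c = src[i]
        if c == "^":           # caret escape: skip the escaped character
            i += 2
            continue
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise TagError("unterminated braced string")


def scan_tag(src, start=0):
    """src[start] is '<'. Return the index just past the '>' ending the tag."""
    if (start + 1 >= len(src) or src[start] != "<"
            or src[start + 1] in NOT_AFTER_OPEN):
        raise TagError("not a natural tag start")
    stack = []                 # expected closers for open parens and blocks
    i = start + 1
    while i < len(src):
        c = src[i]
        if c == "^":
            i += 2
        elif c == '"':
            i = skip_quoted(src, i)
        elif c == "{":
            i = skip_braced(src, i)
        elif c in CLOSER:
            stack.append(CLOSER[c])
            i += 1
        elif c in ")]":
            if not stack or stack.pop() != c:
                raise TagError(f"unmatched {c!r}")
            i += 1
        elif c == "<" and i + 1 < len(src) and src[i + 1] not in NOT_AFTER_OPEN:
            i = scan_tag(src, i)   # a nested tag must close itself
        elif (c == ">" and not stack and i > start + 1
                and src[i - 1] not in NOT_BEFORE_CLOSE):
            return i + 1
        else:
            i += 1
    raise TagError("unterminated tag")
```

For example, scan_tag('<([])>') returns 6, the full length of the input, while scan_tag('<([)]>') and scan_tag('<foo <bar />') raise TagError. Single quotes get no special handling, so scanning <foo bar='1 bar> 2' /> stops at the first >.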

This embraces, as "a feature", the difference between how the current Rebol TAG! implementation treats apostrophes and how it treats double quotes. It means you can write contractions like "it's" in HTML comments or elsewhere without special treatment. Apostrophes may therefore still be used around attribute values, but they will not protect an unpaired X> from signaling a tag ending or <X from signaling a tag beginning.

Given that TAG! frequently has structural intent, this opens the door to stronger structural validation of tag content while keeping it a string. It requires no new specification and little new code, as it leverages the sunk cost of the existing Rebol lexer.

**THIS IS SPECIFICALLY TO AVOID THE SLIPPERY SLOPE OF CATERING TO OTHER LANGUAGES' LEXICAL NEEDS**, while still accommodating a wide swath of TAG! naturals.

Consider today's feature that allows:

<foo attr="a > b">

It matches quotes, and hence avoids ending the tag early. But once you reach inside and try to match quotes on behalf of another language, you must consider what escaping features that language has. How do you know that &"& isn't how the embedded language writes a lone quote literal? You wind up with <% ch = &"& %> not closing the tag.
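
The failure mode can be shown with a small Python model of quote matching. This is an assumption about the general approach, not the actual Rebol scanner, and the function name is hypothetical. It ends the tag at the first > that is not inside double quotes.

```python
def naive_tag_end(src):
    """Return the index just past the first '>' outside double quotes.

    Returns None if no such '>' exists. Hypothetical model of simple
    quote matching, with no knowledge of the embedded language's escapes.
    """
    in_quotes = False
    for i, c in enumerate(src):
        if c == '"':
            in_quotes = not in_quotes   # every quote toggles the state
        elif c == ">" and not in_quotes:
            return i + 1
    return None


# The quoted > is protected, so the tag ends at the final >.
print(naive_tag_end('<foo attr="a > b">'))

# The lone quote in &"& opens a "string" that never closes,
# so the closing %> is never recognized.
print(naive_tag_end('<% ch = &"& %>'))
```

The second call finds no tag ending, which is the unclosed-tag outcome described above.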

Of course, *often* the escape will be a backslash or something similar. And it would be technically possible to study and work out the exact lexical rules of PHP and of HTML comments, so as to permit the (apparently legal) <!-- you're ------> kidding me --> tag. With SuperTAG!'s bias, the feature set converges on Rebol while still allowing a *lot* of tag naturals for other languages and comments.

In the "just say no" department: this is about saying no to searching for <% and %> and having logic inside rebol.exe that does something different, such as looking for backslashes, or that handles <!-- and --> specially. At the same time, it does not make it impossible to embed most of most languages. And it makes it easier to embed one language in particular: Rebol.

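It is perhaps a little tougher to write <!-- Y'know what site is ridiculous? (Hint: it's 4chan.) --> because the ( switches on the Rebol lexer, so the parenthesized hint must lex as valid Rebol. That problem is quickly solved with <!--{ Y'know what site is ridiculous? (Hint: it's 4chan.) }-->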
Example code
;--
;-- Legal "natural" tags
;--

<foo bar="abc">

;-- space before > means we know it doesn't end tag
<% if 1 > 2 [print "Uh oh"] %>

;-- bar> not seen as a tag ending because it's in quotes
<foo attr="1 bar> 2" /> 

;-- <bar not seen as a tag beginning because it's in quotes
<foo attr="1 <bar 2" /> 

;-- Matched pairings
<([])>

<!-- Don't need to escape <tags> if they are naturals -->

<!-- Don't need to escape < non tags > if they aren't tags -->

;--
;-- Illegal "natural" tags...construction syntax required
;--

<foo bar="abc"    > ;-- space then > does not terminate

<   foo bar="abc"> ;-- < then space does not open a tag

<foo bar='1 bar> 2' /> ;-- single quotes aren't "super"

<foo <bar /> ;-- unclosed foo tag: the /> closes bar

<)][(> ;-- open/closes don't match

<([)]> ;-- improper nest

<the site is [4chan]> ;-- Rebol word rules apply in blocks

<foo bar=(if 4chan > stackoverflow [print {Uh oh.}]) baz={string}> ;-- same rules for parens

Assigned to: n/a | Fixed in: - | Last Update: 3-Aug-2015 17:42


Comments

Date              User  Field        Action    Change
3-Aug-2015 17:42  Fork  Description  Modified  -
3-Aug-2015 17:41  Fork  Description  Modified  -
3-Aug-2015 17:27  Fork  Code         Modified  -
3-Aug-2015 17:26  Fork  Description  Modified  -
3-Aug-2015 17:26  Fork  Code         Modified  -
2-Aug-2015 05:29  Fork  Description  Modified  -
2-Aug-2015 05:27  Fork  Code         Modified  -
2-Aug-2015 05:27  Fork  Code         Modified  -
2-Aug-2015 05:27  Fork  Description  Modified  -
2-Aug-2015 05:27  Fork  Code         Modified  -
2-Aug-2015 05:11  Fork  Ticket       Added     -