Ticket #0001976

Type	Issue	Status	reviewed	Date	28-Feb-2013 01:00
Version	r3 master	Category	Datatype	Submitted by	adrians
Platform	All	Severity	minor	Priority	normal

Summary	Revisit, refactor or rename tuple!
Description	In looking over the tuple! datatype documentation, it seems that it is quite limited given its very general-sounding name. From the R2 documentation: "It is common to represent version numbers, Internet addresses, and RGB color values as a sequence of three or four integers." Some examples are given there (my comments added):

    1.3.0 2.1.120 1.0.2.32        ; versions - quite a few versions you find these days have components greater than 255, the current max value (e.g. Chromium version 27.0.1424.0)
    199.4.80.250 255.255.255.0    ; net addresses/masks - IPv6 addresses are of the form 2001:0db8:85a3:0000:0000:8a2e:0370:7334, where leading zeros in a group may be dropped and one run of all-zero groups may be collapsed to ::, giving 2001:db8:85a3::8a2e:370:7334, for example
    0.80.255 200.200.60           ; RGB colors

As is evident, only the color tuple definition is still fully applicable. To a degree, the current state just reflects the passage of time with new specifications/standards coming into use, so it might make sense to rethink what a tuple should be useful for. One possibility would be to rename this datatype to colortuple! after removing mention of the other uses suggested for it. The other currently suggested uses might make sense as independent new datatypes. Exactly how tuples could be made more useful is something that I hope can be debated. The intent here is to get the conversation started.
Example code

Assigned to	n/a	Fixed in	-	Last Update	24-Apr-2013 23:24

Comments
(0003534) BrianH 28-Feb-2013 02:12	OK, here is where we run into the legacy naming rules. If a named something exists in R2 in a substantially compatible way, we keep the name in R3. The only exception we make is for something that is both extremely badly named and almost completely unused, like #1971 for example and maybe #1973. The tuple! type is both very widely used and somewhat accurately named - not being general enough in its semantics to match the generality of the name is not sufficient reason to rename it, and it is a tuple even if it is a somewhat limited one. The converse is also true: If we are introducing a new datatype in R3, we don't use the name of an old datatype in R2 unless they are sufficiently similar in their semantic model (which is why map! isn't called hash!) or can fake it well enough with some adjusted actions (issue!, though its actions still need adjusting). For that reason, if we were to reintroduce a hash! datatype in R3 it would necessarily have to be substantially compatible with the R2 type of that name (see #1494), or else it would need a new name (see #1774). And if we reintroduced list! it would need to be compatible too, right down to its weird INSERT return position. We could probably change rebcode!, though, since it was never in a non-experimental release. For cases like this, even if we add a new name for the thing, we would need to keep the old name and add the new name as a synonym. But for datatypes this is a problem, because in at least 3 built-in dialects datatypes are keywords, not names that can be assigned. And then there is the TYPE? function, which can only return a single datatype or name - for backwards compatibility the name it would have to return is the legacy name, not the new name. Given these considerations, there is no point in ever renaming any datatype that has any significant use in existing code.
As for your semantic concerns, you are missing the most important Rebol-specific characteristic: It is an "immediate" value, which means that it fits into a value slot. A value slot in R3 is 128 bits, with 32 bits allocated for flags and metadata, which leaves 96 bits for payload. The difference between immediate and non-immediate values in Rebol is really significant. Immediate values are copied completely with every assignment, and are arguably immutable (let's not rehash that argument here, please), and you can't alias them without aliasing their entire value slot. The non-immediate portion of a non-immediate value is effectively accessed by reference, and those references can be aliased, which means that changes to its data can affect multiple references (this is what I meant by mutable earlier). In order for tuple to be considered OK to change, it would need to continue to be treatable as an immediate pseudo-series of unsigned numbers. That means we would not be able to use it for IPv6 addresses (assuming that some year from now we get any IPv6 support at all), because 128-bit IPv6 addresses won't fit into the 96 bits available in a value slot. So it probably doesn't matter that the syntax is also different, because these addresses would need to be stored in a real series or vector type. As for your other arguments, you make a good point, but it's worse than you think: RGB colors aren't necessarily limited to 8 bits per channel anymore either. However, if we increase the size of the tuple elements to 16 bits, that would increase the range of possible numbers. The cost would be that we couldn't store as many of them - that 96 bits is basically split into 80 bits for 10 elements, and 16 bits for the length of the tuple. That 16 bits is more than we need, so we could in theory use 8 bits for the length and 8 bits for the element size (1 or 2 bytes).
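The immediate/non-immediate distinction can be sketched in C, the language the R3 core is written in. This is a minimal illustration, not the actual sys-value.h layout: the type tags, the `value_slot` struct and the toy series pool are hypothetical, and a 32-bit handle stands in for the series pointer so that the slot stays at 128 bits on any platform. Copying a tuple slot duplicates the bytes themselves; copying a series slot duplicates only the reference, so a write through one copy is visible through the other.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type tags, not the real R3 enumeration. */
enum { VT_TUPLE = 1, VT_SERIES = 2 };

/* Toy storage for non-immediate data: a fixed pool of small series. */
#define POOL_SIZE 8
struct series { uint8_t data[16]; };
static struct series series_pool[POOL_SIZE];

/* One value slot: 32-bit header + 96-bit payload = 128 bits. */
struct value_slot {
    uint32_t header;                 /* type tag; flags omitted */
    union {
        uint8_t tuple[12];           /* immediate: the bytes ARE the value */
        struct {
            uint32_t handle;         /* which series (stands in for a pointer) */
            uint32_t index;          /* current position in that series */
        } ser;
    } payload;
};
_Static_assert(sizeof(struct value_slot) == 16, "value slot must be 128 bits");

/* Writes a byte i positions past the slot's index. Every slot that holds
   the same handle sees the change, because the data lives outside the slot. */
static void series_poke(const struct value_slot *v, uint32_t i, uint8_t b) {
    assert(v->header == VT_SERIES && v->payload.ser.handle < POOL_SIZE);
    assert(v->payload.ser.index + i < sizeof series_pool[0].data);
    series_pool[v->payload.ser.handle].data[v->payload.ser.index + i] = b;
}

/* Reads a byte the same way. */
static uint8_t series_peek(const struct value_slot *v, uint32_t i) {
    assert(v->header == VT_SERIES && v->payload.ser.handle < POOL_SIZE);
    assert(v->payload.ser.index + i < sizeof series_pool[0].data);
    return series_pool[v->payload.ser.handle].data[v->payload.ser.index + i];
}
```

Assigning one tuple slot to another and then changing the copy leaves the original untouched, while the same assignment for a series slot leaves both slots pointing at one shared buffer.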
If we had variable-width tuple elements, then we could have up to either 10 8-bit or 5 16-bit numbers (or I suppose 2 32-bit numbers, but we can't represent a 2-element tuple in its syntax). That change would increase our flexibility, at the possible cost of extra overhead in operations whose code has to branch on the element size.
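A minimal C sketch of the variable-width idea, assuming the 10-byte element area plus one length byte and one element-size byte described above. The names (`tuple_payload`, `tuple_get`, `tuple_set`) are hypothetical, and storing 16-bit elements big-endian is an assumption made here so the byte layout does not depend on the host. The `width` branch in the accessors is the per-operation overhead in question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 96-bit tuple payload: 80 bits of elements, then one byte
   for the element count and one byte for the element width. */
struct tuple_payload {
    uint8_t data[10];   /* element storage */
    uint8_t len;        /* number of elements */
    uint8_t width;      /* bytes per element: 1 or 2 */
};
_Static_assert(sizeof(struct tuple_payload) == 12, "payload must be 96 bits");

/* How many elements fit: 10 x 8-bit or 5 x 16-bit. */
static int tuple_capacity(int width) {
    assert(width == 1 || width == 2);
    return 10 / width;
}

/* Reads element i; the width branch is the extra per-operation cost. */
static unsigned tuple_get(const struct tuple_payload *t, int i) {
    assert(t->len <= tuple_capacity(t->width) && i >= 0 && i < t->len);
    if (t->width == 1)
        return t->data[i];
    /* 16-bit elements are stored big-endian (an assumption of this sketch) */
    return ((unsigned)t->data[2 * i] << 8) | t->data[2 * i + 1];
}

/* Writes element i; returns 0 if v does not fit in the element width. */
static int tuple_set(struct tuple_payload *t, int i, unsigned v) {
    assert(t->len <= tuple_capacity(t->width) && i >= 0 && i < t->len);
    if (t->width == 1) {
        if (v > 0xFFu) return 0;
        t->data[i] = (uint8_t)v;
    } else {
        if (v > 0xFFFFu) return 0;
        t->data[2 * i] = (uint8_t)(v >> 8);
        t->data[2 * i + 1] = (uint8_t)(v & 0xFFu);
    }
    return 1;
}
```

With 16-bit elements, a version like 27.0.1424.0 fits in four elements, whereas an 8-bit tuple has to reject the 1424.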
(0003535) adrians 28-Feb-2013 02:59	Hmm, with all of these considerations in mind, it's difficult to see how to reasonably evolve datatypes. If an initial design is this locked down once it's out "in the wild", maybe we should think of some kind of better approach to datatype evolution. It seems pretty clear that there should be a mechanism for doing so. Btw, I'm not sure what you mean by IPv6 not being implemented. Have you looked at the output of "ipconfig" in Windows? All my network devices are shown with IPv6 addresses (along with IPv4) and are reachable using them. My feeling is that with the current implementation, the name claims a wide set of possible uses, but delivers relatively little value. Since we make a big deal of all the datatypes supported by Rebol, you would hope that we could improve them over time if warranted. If not, then at the very least we should rename them to reflect a more limited scope. If we can't even do this, then let's make sure the docs reflect the limited scope so that people don't have inflated expectations.
(0003536) BrianH 28-Feb-2013 04:12	We can improve datatypes in Rebol, but in a backward-compatible way for the most part. When we break backwards compatibility we only do so for good reasons, and we try to limit the scope of the break. If we're really going to break things, it's much better to break them altogether by removing the datatype completely. Changing the datatype in subtle ways leads to subtle bugs - loud bugs are better. If we really need a different datatype we can give it a new name. Backwards compatibility isn't of supreme importance for R3 (see #666), but we can't ignore it completely. We are hoping to make it at least possible to port code over from R2, and maybe even make a backwards-compatibility layer/module, because we lose a lot of the justification for having a Rebol-like language at all if we completely alienate all of our existing users. That does mean that we are trying to manage the change, rather than just throwing away all prior Rebol knowledge and code. This isn't a 1.0; we're building a 3.0 language. However, you are overestimating what needs datatype support. Every new datatype we add reduces the potential flexibility of the language, especially if that type has a literal syntax. And if it doesn't have a new literal syntax, there is even less justification for adding a datatype. You only have to add a new datatype if you need semantics (including action behavior) that are sufficiently different from another datatype with similar syntax that you can't get away with just extending the syntax/behavior of that other type (see #1962 for an example where we're probably better off just extending another datatype). For datatypes that don't need any literal syntax at all beyond construction specs (#1955), we might even be able to get away with doing them as user-defined datatypes once we have those. For those cases, all we really need is to be able to add a new semantic model that would be acted on by the action functions.
Some of these cases can even be implemented as a port scheme, so you don't have to make a datatype at all. User-defined datatypes and port schemes are great because they're optional. If you don't need to be able to use the standard action functions, and don't need to add new literal syntax, you don't need to make a datatype at all. You're really better off adding a module that wraps around some arrangement of the datatypes we already have. That approach has the advantage of not allocating one of our limited set of datatypes, not constraining the data syntax in a way that would preclude some other datatype from existing, and not adding something to the core that can't easily be removed - that's what I meant by decreasing potential flexibility.
(0003537) BrianH 28-Feb-2013 04:12	I mean that there is no support whatsoever in any version of Rebol for IPv6. Whether it is implemented on the host platform doesn't affect whether it is implemented in Rebol (though if it's not, that might block us a little). It's even a different set of APIs on some platforms. That doesn't mean it's not a good idea to do though. Given all that I said above about datatypes, I think that there is good justification for adding an IPv6 address datatype. It needs new literal syntax that is different enough from our existing syntax that we are more likely to be able to add it without conflicts. Of course the main reason it is unlikely to conflict is because of how ugly the syntax is, but that's not our fault. It's really critical that it be included because it is of core networking importance (or will be). The syntax and data model are based on a rigorous standardization process that has been pretty widely agreed upon (if not yet as widely implemented), at least as far as Rebol datatype semantics are concerned, and is likely to stick around for generations (based on IPv4's persistence). It needs to be operated on by the actions, and probably in a way that is specific enough to this type that we won't be able to get away with using a binary or vector to store it. For that matter, given that it's 128-bit we are probably going to want to allow the vector type to store these in bulk, because we won't be able to fit them into block value slots - that's unlikely to work with a user-defined datatype, barring a rethink of the vector model. So I'm all for making an IPv6 address datatype and extending R3 to support IPv6 otherwise. Someone should make a bunch of related tickets.
(0003538) BrianH 28-Feb-2013 04:13	Yes, docs. Coming into a new language everyone's expectations are going to be off because they aren't based on experience with this language. They'll be even more off if they come from a background where they would have heard a term like "tuple" before (Chrome's spell-checker hasn't even heard of "tuple"), because Rebol is decidedly quite different from anything such a background would have led them to experience. For instance, what most languages that have tuples call "tuples", Rebol calls "blocks" (which is what some other languages call anonymous functions, so we can't even win there). So, docs are critical.
(0003542) abolka 28-Feb-2013 22:26	Don't know how this got sidetracked into IPv6, but: IPv6 address literals won't come so easily; they are in latent conflict with the current url! and get-word! syntax: a::b is a valid IPv6 address, but it is already a valid url!; ::a is a valid IPv6 address, but it is already a valid get-word!. As IPv6 addresses have other intricacies, I would suggest to not attempt to model them as literal datatypes, but rather just use the existing url! type to express IPv6 addresses (ip6://). I can extract that suggestion into a separate "note" ticket or similar, if that is deemed helpful.
(0003543) BrianH 28-Feb-2013 22:48	It's an issue ticket, those tend to go down interesting paths. In this case Adrian started it in the ticket description. :) Good catch on the syntax conflict though, and good suggestion to just use url! instead. As a bonus that can be implemented as a port scheme instead of as a datatype. A native scheme can even be integrated into the tcp:// and udp:// schemes when needed. For that matter, we might adjust the URL syntax parsers of those schemes to support IPv6 addresses directly, as long as the syntax doesn't require [ and ] around the address. Although without the [ and ] I'm not sure how we'd resolve the syntax conflict with port numbers in tcp and other network url specs, so that needs more thought. It might also be possible to do the same for email syntax without creating a new datatype for that either. If there's a conflict there, we won't, and it seems likely that there will be a conflict if the square brackets described here are required: http://en.wikipedia.org/wiki/Email_address#Valid_email_addresses You have my support if you create the ticket, as a Wish if it involves changing existing datatypes or url! schemes. We can discuss any syntax issues there.
(0003839) Ladislav 22-Apr-2013 16:05	"The converse is also true: If we are introducing a new datatype in R3, we don't use the name of an old datatype in R2 unless they are sufficiently similar in their semantic model" - that is actually not true. For example, the money! datatype in R3 is semantically different from the R2 one (using 96 bits vs. 64 bits, decimal base vs. binary base, no denomination vs. denomination, ...). Also, the integer! datatype is semantically different, having a much greater range of values in R3.
(0003840) Ladislav 22-Apr-2013 16:09	"96 bits is basically split into 80 bits for 10 elements, and 16 bits for the length of the tuple" - not true: at the time of this writing line 250 in sys-value.h defines the tuple length to use just one byte.
(0003841) Ladislav 22-Apr-2013 17:11	"I would suggest to not attempt to model them as literal datatypes, but rather just use the existing url! type to express IPv6 addresses (ip6://)" - I do not think it is wise to use a prefix like ip6://. At present http://1.2.3.4 is a valid url containing an IPv4 address. I assume it would be possible to insert IPv6 address in a url somehow either? Also, by having the tuple! datatype for IPv4 addresses, we distinguish between a url like http://1.2.3.4 containing an IPv4 address and a bare IPv4 address as a value of its own type ...
(0003843) BrianH 22-Apr-2013 22:23	Ladislav, you missed the "sufficiently" part of "sufficiently similar". The money! and integer! types are different, but they are still numbers, you can still do math with them, and they are still immediate values. For that matter there is still a proposal to restore the denomination, for better or worse. They are close enough to be considered sufficiently similar. "at the time of this writing line 250 in sys-value.h defines the tuple length to use just one byte" - Good, that means we already have an unused byte available. Technically, we could even do this if we didn't have a spare byte available, because one byte is more than we need to describe all possible tuple! lengths, so we could use up to 4 bits for flags, or have a separate set of length values to refer to two-byte-element tuple lengths. "I assume it would be possible to insert IPv6 address in a url somehow either?" - yup, HTTP and the other network schemes that could have IPv6 support would have to have their syntaxes extended as well. Then the raw address could be detected based on the scheme and stored internally as a 16-byte binary or something similar. It's doable.
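Splitting the length byte in this way could look like the following minimal C sketch. The low nibble holds the element count (the maximum of 10 fits in 4 bits), and the high nibble holds up to four flag bits. The mask names and the use of bit 4 as a "16-bit elements" flag are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split of the tuple length byte:
   low nibble  = element count (0..15; tuples need at most 10)
   high nibble = up to 4 flag bits */
#define TUPLE_LEN_MASK  0x0Fu
#define TUPLE_FLAG_WIDE 0x10u   /* hypothetical: elements are 16-bit */

/* Combines a length and flags into one byte. */
static uint8_t pack_len_byte(unsigned len, unsigned flags) {
    assert(len <= 10);                  /* longest tuple! */
    assert((flags & ~0xF0u) == 0);      /* flags must stay in the high nibble */
    return (uint8_t)(flags | len);
}

/* Extracts the element count. */
static unsigned unpack_len(uint8_t b) { return b & TUPLE_LEN_MASK; }

/* Tests the hypothetical 16-bit-elements flag. */
static int is_wide(uint8_t b) { return (b & TUPLE_FLAG_WIDE) != 0; }
```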
(0003844) Ladislav 23-Apr-2013 01:29	Eventually, when moving to the 64-bit "space", we may consider what changes that should bring to Rebol values. As I see it, "Rebol value size" should be derived from the need to represent Rebol series. Rebol series values need at least datatype info, one pointer and one index. In the 64-bit "space", both index and pointer should naturally be 64-bit, which, together with the datatype info and 64-bit alignment, yields 192 bits per Rebol value (the next value size after 128 bits when values are required to be 64-bit aligned). (Using 192 bits per series value may, however, leave 32 bits of unused alignment space per Rebol series value.) If we prefer to keep the Rebol value size at 128 bits even in the 64-bit "space", we would have to use index values smaller than 64 bits. However, it would not be necessary to use 32-bit indices; it might prove possible to use indices as big as 56 bits... Some values would not benefit from enlarging the Rebol value size: chars, dates, decimals, integers, logic values, nones, pairs (not sure about this), percents, times (not sure about this), unsets. On the other hand, datatypes like money! would be able to add denominations, and we would even be able to use IPv6 addresses as tuples.
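The size arithmetic can be checked with a C sketch on a 64-bit target. The two structs are hypothetical illustrations of the layouts discussed, not proposed R3 definitions. With a 32-bit header, a 64-bit pointer and a 64-bit index, 64-bit alignment pads the header and the value comes to 192 bits. Narrowing the index to 32 bits lets it fill that padding, which brings the value back to 128 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* This sketch assumes a 64-bit target. */
_Static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");

/* Full-width series value: header, 64-bit pointer, 64-bit index. */
struct series_value_wide {
    uint32_t header;    /* datatype info and flags */
                        /* 32 bits of alignment padding land here */
    void    *series;    /* 64-bit pointer to the series data */
    uint64_t index;     /* 64-bit index */
};

/* Keeping 128 bits means narrowing the index (here to 32 bits). */
struct series_value_narrow {
    uint32_t header;
    uint32_t index;     /* 32-bit index occupies what was padding */
    void    *series;
};

/* Bits of padding between the header and the pointer in the wide layout. */
static size_t wide_padding_bits(void) {
    return (offsetof(struct series_value_wide, series) - sizeof(uint32_t)) * 8;
}
```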
(0003845) Ladislav 23-Apr-2013 01:34	This is how IPv6 addresses will be used in URLs: https://[2001:0db8:85a3:08d3:1319:8a2e:0370:7344]:443/ (443 is the port number). This format suggests that #"[" should preferably be preceded by a space when used as a block start, while #"]" should be followed by a space when used as a block end (#"[" and #"]" should be "less delimiting").
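RFC 3986 (section 3.2.2) writes an IPv6 host as an IP-literal enclosed in [ and ]. The brackets are what let the port follow after the closing bracket without ambiguity. The following is a minimal C sketch of pulling the address and port out of such a URL. The function name is hypothetical, and the sketch deliberately ignores userinfo and does not validate the address itself.

```c
#include <assert.h>
#include <string.h>

/* Extracts the bracketed IPv6 host and optional port from a URL such as
   "https://[2001:db8::1]:443/". Copies the address (without brackets) into
   host and sets *port to the port, or -1 if none is given.
   Returns 1 on success, 0 if the authority is not a well-formed IP-literal. */
static int url_ipv6_host_port(const char *url, char *host, size_t hostcap, long *port) {
    const char *a, *close, *p;
    size_t n;

    assert(url && host && port && hostcap > 0);
    a = strstr(url, "://");
    if (!a) return 0;
    a += 3;                               /* start of the authority */
    if (*a != '[') return 0;              /* not an IP-literal host */
    close = strchr(a, ']');
    if (!close) return 0;
    n = (size_t)(close - a - 1);
    if (n == 0 || n >= hostcap) return 0;
    memcpy(host, a + 1, n);
    host[n] = '\0';

    p = close + 1;
    *port = -1;                           /* no explicit port */
    if (*p == ':') {
        long v = 0;
        int digits = 0;
        for (++p; *p >= '0' && *p <= '9'; ++p, ++digits) {
            v = v * 10 + (*p - '0');
            if (v > 65535) return 0;      /* out of TCP/UDP port range */
        }
        if (digits == 0) return 0;
        *port = v;
    }
    /* the authority must end here */
    return *p == '\0' || *p == '/' || *p == '?' || *p == '#';
}
```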
(0003846) abolka 23-Apr-2013 02:19	If I'm not mistaken, how to embed IPv6 addresses within URLs is scheme-specific. It looks like HTTP's [...] method will gain traction, but I'm not sure whether other methods are already in use. In any case, I see no particular conflict between the use of IPv6 addresses in the hostname portion of URLs and the use of one synthetic URL-scheme in R3 (ip6://) to represent IPv6 addresses themselves. We'd still have a difference between a URL like http://[2001:db8::1]/ and just the IPv6 address ip6://2001:db8::1, just like we have with IPv4. It's only that we wouldn't use the tuple! datatype to represent IPv6 addresses, but the url! datatype instead. That doesn't conflict with other uses of the url! datatype, or even with one class of url! values (like HTTP urls) logically embedding another class of url! values (IPv6 addresses). The resolution of this embedding would have to be handled by the URL parser, just as it is now.
(0003847) BrianH 23-Apr-2013 02:22	Maybe we would only need to make [ and ] less delimiting when we are in the process of loading url! values, not in general.
(0003849) Ladislav 23-Apr-2013 09:15	"If I'm not mistaken, how to embed IPv6 addresses within URLs is scheme-specific. " - does not look like that: http://tools.ietf.org/html/rfc3986
(0003850) Ladislav 23-Apr-2013 09:20	"I see no particular conflict between the use of IPv6 addresses in the hostname portion of URLs and the use of one synthetic URL-scheme in R3 (ip6://) to represent IPv6 addresses" - hmm, probably the worst conflict is that we cannot enforce different syntactic rules at the Load time for the specific case. Nor can we enforce any rules for editing.
(0003851) BrianH 23-Apr-2013 19:16	"probably the worst conflict is that we cannot enforce different syntactic rules at the Load time for the specific case" - We won't have to at load time, because LOAD can already handle that ip6://2001:db8::1 syntax without the brackets, and there shouldn't be any problem with also allowing it to be specified as ip6://[2001:db8::1] as long as the decoding function can handle it. We would only need to handle the conflict in DECODE-URL, which should be easy. The only trick is that you wouldn't be able to specify a port number without using the brackets around the address. In any case, opening an ip6:// address should just fail with an error similar to that of an unimplemented scheme. Can you specify an IPv6 address with only one : character? If not, we should be able to auto-detect IPv6 addresses outside of the brackets in url! values of any scheme (if we want to relax things) and just not specify a port number in that case.
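On the single-colon question: no. Every textual IPv6 address has at least two colons, and even the shortest forms such as `::`, `::1` or `a::b` contain two. That means colon counting can tell the cases apart. The following is a minimal C sketch of that heuristic. The enum and function names are hypothetical, and the sketch ignores userinfo (`user:pass@host`), which would have to be stripped first.

```c
#include <assert.h>
#include <string.h>

enum authority_kind { AUTH_HOST, AUTH_HOST_PORT, AUTH_BARE_IPV6, AUTH_BRACKETED };

/* Classifies the authority at the start of s (everything before the first
   '/', '?' or '#') by counting colons. */
static enum authority_kind classify_authority(const char *s) {
    size_t len, colons = 0, i;

    assert(s != NULL);
    len = strcspn(s, "/?#");
    if (len > 0 && s[0] == '[')
        return AUTH_BRACKETED;          /* "[addr]" or "[addr]:port" */
    for (i = 0; i < len; i++)
        if (s[i] == ':')
            colons++;
    if (colons == 0)
        return AUTH_HOST;               /* plain host name or IPv4 */
    if (colons == 1)
        return AUTH_HOST_PORT;          /* "host:port" */
    return AUTH_BARE_IPV6;              /* 2+ colons: IPv6, so no port */
}
```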
(0003852) Ladislav 24-Apr-2013 21:35	"We won't have to at load time, because LOAD can already handle that ip6://2001:db8::1 syntax without the brackets, and there shouldn't be any problem with also allowing it to be specified as ip6://[2001:db8::1] as long as the decoding function can handle it." - well, I see that as underestimating the problem, though. The fact is that e.g. ip6://2001::ax is a valid url! at present, and in my opinion we cannot make it invalid. Therefore, when we see an ip6://... value, we cannot be sure it actually holds an IPv6 address unless we check that every time it is used.
(0003853) BrianH 24-Apr-2013 22:02	It's worse than that: We can't assume that any url! is valid until we validate it, and since it might have changed since we last validated it, we will need to validate it again before it is used. At the moment, the function that does this is DECODE-URL. Also, we can't use IPv6 addresses at all without converting them to a 16-byte binary value first. That means that ip6:// form addresses will need to be parsed by a (hopefully built-in and native) function that generates such a binary value, or possibly triggers an error if the ip6:// syntax was bad. DECODE-URL would probably need to generate a similar binary value to be the real address if IPv6 addresses were in url! values of other schemes, likely calling the same function internally. Then, code that actually needs to use the IPv6 addresses would work with 16-byte binary values with no syntax whatsoever. As with all url! values, you get a tradeoff: You can build the value incrementally, in return for not being able to trust or use the value until you verify and/or decode it.
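The conversion step maps directly onto POSIX `inet_pton`, which turns IPv6 text into the 16-byte binary in network byte order. The following is a minimal C sketch of a decoder for the proposed ip6:// convention that accepts the address with or without brackets. `decode_ip6_url` is a hypothetical name, and a failed parse is where a Rebol-level error would be raised.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical decoder for ip6:// values. Accepts "ip6://ADDR" or
   "ip6://[ADDR]" and fills out[16] with the address in network byte order.
   Returns 1 on success, 0 if the value is not a well-formed ip6:// address. */
static int decode_ip6_url(const char *url, unsigned char out[16]) {
    static const char prefix[] = "ip6://";
    char buf[INET6_ADDRSTRLEN];
    const char *a;
    size_t n;

    assert(url != NULL && out != NULL);
    if (strncmp(url, prefix, sizeof prefix - 1) != 0) return 0;
    a = url + sizeof prefix - 1;
    n = strlen(a);
    if (n >= 2 && a[0] == '[' && a[n - 1] == ']') {  /* optional brackets */
        a++;
        n -= 2;
    }
    if (n == 0 || n >= sizeof buf) return 0;
    memcpy(buf, a, n);
    buf[n] = '\0';
    /* inet_pton returns 1 only for a syntactically valid IPv6 address */
    return inet_pton(AF_INET6, buf, out) == 1;
}
```

A value like ip6://2001::ax still LOADs as a url! but fails here, so the check happens when the value is used rather than when it is loaded.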
(0003854) Ladislav 24-Apr-2013 22:23	This whole discussion looks quite strange to me. As I said, and nobody questioned, it would be necessary to check whether it is IPv6 every time it is used, which would not be a trivial operation. If we want to use an existing datatype, it would be much easier to adjust LOAD to yield a binary! when it encounters valid IPv6 syntax. A binary! can be trivially checked for IPv6 validity, e.g. as follows: ipv6?: func [value [binary!]] [16 = length? value]
(0003855) BrianH 24-Apr-2013 22:48	(For the moment let's ignore that this has nothing to do with tuples.) If you can come up with an IPv6 literal syntax that could be made to not conflict with other literal syntax, and which generates a 16-byte binary value (which may or may not get the binary! type assigned to it), then that would do. One way to do this would be to just allow the raw address syntax (without the [ and ]), as long as you can properly distinguish it from a time! literal (can we?). If we use a datatype other than binary!, we would be able to MOLD that syntax back too, and wouldn't have to length-check the value, but that probably doesn't matter that much. Nonetheless, we would also benefit from tweaking the literal syntax of url! values so it will allow IPv6 addresses to be specified in them (the [ and ] bracket thing above). We would also benefit from a function that took a url! with an IPv6 address in it and generated one of those IPv6 binary values, regardless of the scheme. Then the ip6:// scheme would just be a writing convention, not the real syntax.
(0003856) abolka 24-Apr-2013 23:11	As mentioned in my first comment above, IPv6 literal syntax is in latent conflict with url! and get-word! literal syntax: a::b is a valid IPv6 address, but it is already a valid url!; ::a is a valid IPv6 address, but it is already a valid get-word!. Maybe there are other conflicts as well. Also note that IPv6 addresses may need to be qualified with a "zone index" to be routable. The suggested notation for a zone index is to append it to the address, separated with a %. The syntax for a valid zone index itself is OS-specific. Examples: fe80::1%1, fe80::1%wlan0. (Obviously, this notation conflicts with percent encoding in url!s.) You'll have to store the zone index along with the address. For some usage scenarios you can get by with storing the zone index pre-decoded into a 32-bit integer; for other scenarios, you may have to keep the original, literal zone index specification around.
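The following minimal C sketch stores the zone alongside the address and covers both scenarios. The original zone text is always kept, and a purely numeric zone is also pre-decoded into a 32-bit integer. The struct and function names are hypothetical. Resolving a named zone such as `wlan0` to an interface index is left to the OS (e.g. POSIX `if_nametoindex`) and is not done here.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical scoped-address record: 16 address bytes plus the zone, kept
   as the original text and, when purely numeric, as a 32-bit index. */
struct ip6_scoped {
    unsigned char addr[16];
    char     zone_text[32];   /* "" when no zone was given */
    uint32_t zone_index;      /* 0 when absent or not numeric */
};

/* Parses "ADDR" or "ADDR%ZONE". Returns 1 on success, 0 on bad syntax. */
static int parse_ip6_scoped(const char *s, struct ip6_scoped *out) {
    char buf[INET6_ADDRSTRLEN];
    const char *pct;
    size_t n;

    assert(s != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    pct = strchr(s, '%');
    n = pct ? (size_t)(pct - s) : strlen(s);
    if (n == 0 || n >= sizeof buf) return 0;
    memcpy(buf, s, n);
    buf[n] = '\0';
    if (inet_pton(AF_INET6, buf, out->addr) != 1) return 0;

    if (pct) {
        const char *zone = pct + 1;
        size_t zlen = strlen(zone);
        if (zlen == 0 || zlen >= sizeof out->zone_text) return 0;
        memcpy(out->zone_text, zone, zlen + 1);    /* keep the literal zone */
        if (strspn(zone, "0123456789") == zlen) {  /* numeric: pre-decode */
            unsigned long v = strtoul(zone, NULL, 10);
            if (v > UINT32_MAX) return 0;
            out->zone_index = (uint32_t)v;
        }
    }
    return 1;
}
```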
(0003857) Ladislav 24-Apr-2013 23:24	"... allow the raw address syntax (without the [ and ]), as long as you can properly distinguish it from a time! literal (can we?)" - Yes. An IPv6 address must either contain 7 #":" characters, which a time literal can't, or contain two consecutive #":" characters, which a time literal can't either.
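The rule above can serve as a cheap lexer test, sketched here in C with hypothetical names. One refinement is needed. RFC 4291 also allows the mixed form x:x:x:x:x:x:d.d.d.d, which has only six colons. The sketch therefore checks for `::` or at least six colons (a time! literal has at most two and never `::`), and then confirms the candidate with POSIX `inet_pton`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Counts ':' characters in a token. */
static int count_colons(const char *s) {
    int n = 0;
    for (; *s; s++)
        n += (*s == ':');
    return n;
}

/* Cheap test: could this token be IPv6 rather than a time! literal?
   A time literal has at most two colons and never contains "::". */
static int looks_like_ipv6(const char *tok) {
    assert(tok != NULL);
    return strstr(tok, "::") != NULL || count_colons(tok) >= 6;
}

/* Full check, used once the cheap test has picked the IPv6 branch. */
static int is_ipv6(const char *tok) {
    unsigned char buf[16];
    return looks_like_ipv6(tok) && inet_pton(AF_INET6, tok, buf) == 1;
}
```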

Date	User	Field	Action	Change
24-Apr-2013 23:24	Ladislav	Comment : 0003857	Added	-
24-Apr-2013 23:12	abolka	Comment : 0003856	Modified	-
24-Apr-2013 23:11	abolka	Comment : 0003856	Added	-
24-Apr-2013 22:48	BrianH	Comment : 0003855	Added	-
24-Apr-2013 22:23	Ladislav	Comment : 0003854	Added	-
24-Apr-2013 22:04	BrianH	Comment : 0003851	Modified	-
24-Apr-2013 22:03	BrianH	Comment : 0003853	Modified	-
24-Apr-2013 22:02	BrianH	Comment : 0003853	Added	-
24-Apr-2013 21:36	Ladislav	Comment : 0003852	Modified	-
24-Apr-2013 21:35	Ladislav	Comment : 0003852	Modified	-
24-Apr-2013 21:35	Ladislav	Comment : 0003852	Added	-
23-Apr-2013 19:25	BrianH	Comment : 0003851	Modified	-
23-Apr-2013 19:16	BrianH	Comment : 0003851	Added	-
23-Apr-2013 13:12	Ladislav	Comment : 0003850	Modified	-
23-Apr-2013 09:20	Ladislav	Comment : 0003850	Added	-
23-Apr-2013 09:15	Ladislav	Comment : 0003849	Added	-
23-Apr-2013 02:22	BrianH	Comment : 0003847	Added	-
23-Apr-2013 02:19	abolka	Comment : 0003846	Added	-
23-Apr-2013 01:34	Ladislav	Comment : 0003845	Added	-
23-Apr-2013 01:29	Ladislav	Comment : 0003844	Added	-
22-Apr-2013 22:23	BrianH	Comment : 0003843	Added	-
22-Apr-2013 18:47	Ladislav	Comment : 0003840	Modified	-
22-Apr-2013 17:11	Ladislav	Comment : 0003841	Added	-
22-Apr-2013 16:17	Ladislav	Comment : 0003839	Modified	-
22-Apr-2013 16:09	Ladislav	Comment : 0003840	Added	-
22-Apr-2013 16:05	Ladislav	Comment : 0003839	Added	-
1-Mar-2013 23:08	BrianH	Comment : 0003543	Modified	-
1-Mar-2013 00:40	BrianH	Comment : 0003543	Modified	-
1-Mar-2013 00:30	BrianH	Comment : 0003543	Modified	-
1-Mar-2013 00:26	BrianH	Comment : 0003543	Modified	-
1-Mar-2013 00:21	BrianH	Comment : 0003543	Modified	-
1-Mar-2013 00:21	BrianH	Comment : 0003543	Modified	-
28-Feb-2013 23:23	BrianH	Comment : 0003543	Modified	-
28-Feb-2013 23:17	BrianH	Comment : 0003543	Modified	-
28-Feb-2013 23:08	BrianH	Comment : 0003543	Modified	-
28-Feb-2013 22:50	BrianH	Comment : 0003543	Modified	-
28-Feb-2013 22:48	BrianH	Comment : 0003543	Added	-
28-Feb-2013 22:26	abolka	Comment : 0003542	Added	-
28-Feb-2013 04:29	BrianH	Comment : 0003537	Modified	-
28-Feb-2013 04:28	BrianH	Comment : 0003537	Modified	-
28-Feb-2013 04:19	BrianH	Comment : 0003536	Modified	-
28-Feb-2013 04:16	BrianH	Comment : 0003536	Modified	-
28-Feb-2013 04:14	BrianH	Comment : 0003537	Modified	-
28-Feb-2013 04:13	BrianH	Comment : 0003538	Added	-
28-Feb-2013 04:12	BrianH	Comment : 0003537	Added	-
28-Feb-2013 04:12	BrianH	Comment : 0003536	Added	-
28-Feb-2013 03:12	adrians	Comment : 0003535	Modified	-
28-Feb-2013 02:59	adrians	Comment : 0003535	Added	-
28-Feb-2013 02:45	adrians	Description	Modified	-
28-Feb-2013 02:31	BrianH	Comment : 0003534	Modified	-
28-Feb-2013 02:30	BrianH	Comment : 0003534	Modified	-
28-Feb-2013 02:16	BrianH	Status	Modified	submitted => reviewed
28-Feb-2013 02:16	BrianH	Category	Modified	Unspecified => Datatype
28-Feb-2013 02:16	BrianH	Description	Modified	-
28-Feb-2013 02:12	BrianH	Comment : 0003534	Added	-
28-Feb-2013 01:00	adrians	Ticket	Added	-