REBOL3 tracker
  0.9.12 beta
Ticket #0002064 User: anonymous

Project:



rss
TypeWish Statusdismissed Date20-Sep-2013 18:15
Versionr3 master CategoryUnspecified Submitted byfork
PlatformAll Severityminor Prioritynormal

Summary Futureproof routines for binary/integer conversion for arbitrary integer bit widths
Description In Rebol2:

>> to binary! 1
== #{31}

>> to binary! -1
== #{2D31}

>> to integer! #{2D31}
== 11569

>> to integer! #{FFFFFFFF}
== -1

In Rebol3:

>> to binary! 1
== #{0000000000000001}

>> to binary! -1
== #{FFFFFFFFFFFFFFFF}

>> to integer! #{FFFFFFFF}
== 4294967295

>> to binary! 4294967296
== #{0000000100000000}

>> to integer! #{FFFFFFFFFFFFFFFF}
== -1

For TO INTEGER! conversions, Rebol2 is converting the integer to a string, then converting the string to a binary, and that's pretty wacky... and thankfully fixed. I think it's better for the TO BINARY! conversion of an integer to produce the integer as bits, as Rebol3 does.

However, underlying this we see the issue of an assumed bit width. I actually thought that Rebol would do the programmer a service by having some kind of finessed arbitrary precision integer type, which could cleverly fit the integer into a register sized piece of memory for smaller numbers... and then grow to a dynamic representation once the number got bigger. Rebol frees programmers from worrying about the sizes of "string buffers", shouldn't it free them from the size of "integer buffers"? But as @BrianH would say: "make a new ticket".

Still...it was thinking about the idea of how Rebol would do TO BINARY! on an integer of unknown size...and dealing with the massive amount of pain that this 32-bit to 64-bit change had in the Red port...that led me to think that the primitives Rebol provides should be more "futureproof" so that the style of code users write for these conversions would continue to work regardless of increases in the underlying bit size; even to arbitrary bignum sizes. What I came up with was this:

"a TO conversion of a BINARY! to an INTEGER! will examine the first byte of the series. If it is #{80} or greater, then it will assume the number is in two's complement representation. If less then it will simply interpret it as bytes representing the integer."

>> to integer! #{01}
== 1

>> to integer! #{FF}
== -1

>> to integer! #{FFFFFFFF}
== -1

>> to integer! #{FFFFFFFFFFFFFFFF}
== -1

>> to integer! #{0100000000}
== 4294967296

(Notice that with this approach, if the user wants to *guarantee* an unsigned number in a TO conversion, they can simply prepend a #{00} to their input.)

On the other side:

"a TO conversion of a non-negative INTEGER! to a BINARY! will produce a minimal byte string for representing the integer such that the first byte is less than #{80}, thus inserting a single leading #{00} if necessary. If the number is negative, the the integer's minimal representation will be presented as bytes in two's complement...such that the first byte is greater than or equal to #{80}, thus inserting a single leading #{FF} if necessary."

This approach is deterministic and future-proof. Handling the results requires a bit more intent and purposefulness on the caller's part, but it puts them in control. When working with a known target bit width there may be too few or too many bytes, but the overflows should be easy to work with as any cases that don't represent an error should always be #{00} or #{FF}. Underflows could be helped with an optimized PAD native which let you fill in leading bytes to get the series to a certain width would be nice. (Could be useful in general. Again, "separate ticket", now CC #2065)

Note that this retains reversibility for integers. If N is an integer, then:

N = TO INTEGER! TO BINARY! N

Due to the tolerance of arbitrary counts of leading #{FF} or #{00} bytes, binaries will not necessarily get back the exact same bytes in round-trip convesion...although the result will match the tail of the input. It will also stabilize after one reconversion (assuming the first conversion succeeded). So if B is a binary!, and if Y = TO INTEGER! B succeeds, then Z = TO BINARY! Y will also succeed and from then on:

Z = TO BINARY! TO INTEGER! Z

With arbitrary precision arithmetic, this will work for *any* binary B.
Example code

			

Assigned ton/a Fixed in- Last Update21-Sep-2013 23:07


Comments
(0003997)
BrianH
20-Sep-2013 18:45

This will need a native companion function that can pad a binary to a specified length with #{00} or #{FF} bytes, depending on which is appropriate. That's the bare minimum to avoid the loss of functionality, even though existing code will need to change. And we will still need the more general binary conversion functions we planned to add, since the TO conversions were never intended to be general enough for all binary conversions. You need either options or a dialect to be able to handle more stuff.
(0003998)
fork
20-Sep-2013 19:13

Yes... other constructions (be they from MAKE or new functions) will be able to offer more tailored approaches. But I feel that excising the bit-width from the parameterless TO, and giving a PAD primitive, is at least a predictable toolbox that people will be able to count on.

Given that Rebol3 pretty much broke every one of the integer/binary conversions from Rebol2 anyway, might as well transition to something that won't just create the same problem again next time!
(0003999)
BrianH
20-Sep-2013 21:51

A general PAD function won't do for this case, though it would be nice to have. The companion function we would need would only take the binary and the length as a parameter, and figure out for itself whether it needs to pad with #{00} or #{FF} based on testing the binary value itself. It needs to be simple enough to be able to be added to nearly every expression that converts an integer to a binary, since it would be needed every time we did that conversion. Something like this, with a smaller name, and native:

signed-pad: function [value [binary!] length [integer!]] [
    insert/dup value either value/1 < 128 [#{00}] [#{FF}] length - length? value
    value
]

The reason that this function would need to be so simple, so specialized, with a small name, and native, is that it would need to be used every time TO binary! integer! is used. The main advantage to this proposal is that the result of TO binary! integer! would be the same regardless of what integer size a particular version of Rebol is using. The disadvantage would be that this value would be useless almost all of the time, and would usually have to be padded to a specific width with either #{00} or #{FF} before the value could be used.

Now, for results other than the current integer width, we have to do something similar today in reverse with TO's current behavior, so this isn't really an argument against this proposal. However, it does mean that this operation will need to be fast if we want TO binary! integer! to be useful. And if we want it to be used, it has to be simple and not too intrusive into the expressions where it will be used. So: simple, small name, and native.
(0004001)
Ladislav
20-Sep-2013 23:37

Aha, so you want this code to cease to work?

add-ulp: func [value [decimal!] ulp [integer!]] [to decimal! to binary! add to integer! to binary! value ulp]

In particular, for example,

to integer! to binary! to decimal! to binary! my-integer

always yields my-integer at present.
(0004002)
Ladislav
20-Sep-2013 23:39

For example, the result of

to binary! $1

does not look different from

to binary! 1

except for the bit width.

Also

to integer! #{ff}

is equal to

first #{ff}

at present, which looks less unexpected than a different result. We can never satisfy everyone, and in this case it is this proposal I do not intend to satisfy.
(0004004)
fork
21-Sep-2013 04:29

@Ladislav, I still do not understand your example or its motivation.

Instead of being against something, could you describe exactly what it is you are for? This proposal is not made for whimsical reasons. It is the outcome of dealing with a painful transition between the 32-bit integer convention of Rebol 2, to the 64-bit convention of Rebol 3. I believe that if we must rewrite code, the new code shouldn't embed the exact same category of problems after the rewrite.

If there is to be BIGNUM! support in Rebol, then some sort of way of producing binaries from them will presumably be desired... and the idea of making it possible to seamlessly switch from INTEGER! to BIGNUM! is appealing; and transparently switching from INTEGER! implementation to BIGNUM! implementation under the hood in the case of overflow would put Rebol in league with its modern peers who do so.

If you say "I believe Rebol should always make 64-bit encoded binaries from integers, because..." that is a position. "I don't like it because if you convert an integer to a binary and then to a decimal you might get a different result" doesn't make for a very good argument.
(0004005)
Ladislav
21-Sep-2013 06:41

I find the current behaviour useful. This proposal does not significantly improve transition to a bigger integer type anyway. Actually, in my opinion it does not improve transition to a bigger integer type at all.
(0004006)
abolka
21-Sep-2013 16:09

"I believe that if we must rewrite code, the new code shouldn't embed the exact same category of problems after the rewrite."

Agreed, and also agreed with the _title_ of this proposal: "Futureproof routines for binary/integer conversion for arbitrary integer bit widths". But bolting this onto TO BINARY! is the tail wagging the dog. The TO BINARY! conversions for integers are fine as is. They show implementation detail and therefore should be used carefully.

However, Rebol's notoriously weak binary conversion functions are the root cause of this issue. Provide better support for binary conversion _with another set of functions_ and leave TO BINARY! consistently showing implementation details for numeric types. Rebol 2's problem is the lack of strong binary conversion functionality, which lead to all sort of ugly workarounds, involving TO BINARY! and STRUCT!s, and many other dirty hacks. It is this root cause we should tackle, not the ephemeral effect of some corners where implementation details are necessarily accessible.

We need a set of "future-proof routines for binary/integer conversion"; TO BINARY! is not the place to put them.
(0004007)
fork
21-Sep-2013 23:07

I maintain that it is useful to have a variable-sized, little-endian, two's-complement representation of an integer--which can be padded as desired by the user--and that using such a standard for TO instead of a fixed bit size is forward-looking idea.

@earl has pointed out that this only excises the bit width dependency, and still relies upon the representation of negativeness and endianness. He also claims that the operation produces a result that is not useful without some sort of padding.

I argue that being able to easily pad to what you like is more useful, as PAD/SIGN 8 TO BINARY! VALUE or PAD/SIGN 4 TO BINARY! VALUE lets you choose what you want. Also... if Rebolers thought more in arbitrary sizes, it can be useful in minimal serialization of an arbitrary length integer... because you also have the binary's length to combine with the data.

I'm also okay that endianness and negative encoding is fixed, just as I'm okay that TO BINARY! of a STRING! produces UTF-8 (despite the existence of other encodings). Of the three aspects, it's only the bit width that limits the quantity of information represented and can cause loss or errors. The other two aspects can be processed in a later conversion if absolutely needed.

Regardless of me thinking mine is the better proposal, it seems others are going to rabidly defend TO BINARY! of an integer being strictly defined as giving back a 64-bit two's complement little endian byte array. Honestly, I'm frustrated. But here's a ticket showing posterity that I said that was lame, and had another idea. Dismissing the ticket and moving on.

This future-proofing is going to have to come from somewhere else...for functionality we agree is needed (e.g. in the Red port). Yet I don't have any better ideas up my sleeve--because this is the best one I came up with.

Date User Field Action Change
21-Sep-2013 23:09 fork Comment : 0004007 Modified -
21-Sep-2013 23:08 fork Comment : 0004007 Modified -
21-Sep-2013 23:07 fork Status Modified submitted => dismissed
21-Sep-2013 23:07 fork Comment : 0004007 Added -
21-Sep-2013 16:09 abolka Comment : 0004006 Added -
21-Sep-2013 06:59 Ladislav Comment : 0004005 Modified -
21-Sep-2013 06:50 Ladislav Comment : 0004001 Modified -
21-Sep-2013 06:48 Ladislav Comment : 0004002 Modified -
21-Sep-2013 06:47 Ladislav Comment : 0004003 Removed -
21-Sep-2013 06:47 Ladislav Comment : 0004002 Modified -
21-Sep-2013 06:46 Ladislav Comment : 0004002 Modified -
21-Sep-2013 06:42 Ladislav Comment : 0004005 Modified -
21-Sep-2013 06:41 Ladislav Comment : 0004005 Added -
21-Sep-2013 05:57 fork Comment : 0004004 Modified -
21-Sep-2013 05:55 fork Comment : 0004004 Modified -
21-Sep-2013 04:29 fork Comment : 0004004 Added -
21-Sep-2013 01:10 Ladislav Comment : 0004003 Modified -
20-Sep-2013 23:53 Ladislav Comment : 0004003 Added -
20-Sep-2013 23:40 Ladislav Comment : 0004002 Modified -
20-Sep-2013 23:39 Ladislav Comment : 0004002 Added -
20-Sep-2013 23:37 Ladislav Comment : 0004001 Modified -
20-Sep-2013 23:37 Ladislav Comment : 0004001 Added -
20-Sep-2013 21:51 BrianH Comment : 0003999 Added -
20-Sep-2013 20:49 fork Description Modified -
20-Sep-2013 19:13 fork Comment : 0003998 Added -
20-Sep-2013 18:46 BrianH Comment : 0003997 Modified -
20-Sep-2013 18:45 BrianH Comment : 0003997 Added -
20-Sep-2013 18:29 fork Description Modified -
20-Sep-2013 18:29 fork Description Modified -
20-Sep-2013 18:17 fork Description Modified -
20-Sep-2013 18:15 fork Ticket Added -