REBOL3 tracker
  0.9.12 beta
Ticket #0000328

Type: Wish  Status: problem  Date: 24-Sep-2007 16:34
Version: alpha 107  Category: Native  Submitted by: oldes
Platform: All  Severity: trivial  Priority: normal

Summary: DECOMPRESS and zlib data
Description: Would it be possible to improve the DECOMPRESS function so that it can decompress zlib data? At the moment it sometimes works, but sometimes it allocates all system memory, which is then not recycled, and sometimes it fails outright with a not-enough-memory error.

I'm using a DLL to compress/decompress zlib data, but that requires Rebol/Pro. Since Rebol's current COMPRESS already produces correct zlib data, it seems DECOMPRESS should also be able to handle such data without problems.

If needed, I can provide zlib data that causes a problem with Rebol's DECOMPRESS.
>> length? read/binary %/F/test/decompress-data.bin
== 623798
>> data: decompress read/binary %/F/test/decompress-data.bin
** Script Error: Not enough memory
** Near: data: decompress read/binary %/F/test/decompress-data.bin
>> data: decompress read/binary %/F/test/decompress-data.bin
== {€^@^C^M@^@^AíÀ^@^Y^A^@D^Q^@^@^@^@C...
>> length? data
== 712421
>> stats
== 1330879868

Assigned to: n/a  Fixed in: -  Last Update: 26-Jan-2011 10:33

Attached Files

Comments
(0000178)
admin
12-Jan-2008 11:12

I think you just need to append the length of the original data to the end of the compressed data for DECOMPRESS to work correctly. It needs to know how much memory to allocate, so an estimate also works as long as it is not smaller than the actual decompressed size. -Gabriele
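
A minimal sketch of that workaround, written in Python only because its standard zlib module makes the byte layout easy to check. The helper name `to_rebol_compressed` is hypothetical, and the 4-byte little-endian trailer is an assumption based on the COMPRESS output quoted elsewhere in this ticket.

```python
import struct

def to_rebol_compressed(zlib_data: bytes, original_length: int) -> bytes:
    """Append the uncompressed length as a 4-byte little-endian integer.

    Hypothetical helper: it models the trailer that DECOMPRESS is said to
    read in order to size its output buffer.
    """
    return zlib_data + struct.pack("<I", original_length)

# zlib stream for "foo", taken from this ticket.
zlib_foo = bytes.fromhex("789C4BCBCF070002820145")
framed = to_rebol_compressed(zlib_foo, 3)
print(framed.hex().upper())  # 789C4BCBCF07000282014503000000
```

The printed value is the same binary that `compress "foo"` returns in the transcripts in this ticket.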
(0001996)
BrianH
7-Feb-2010 21:18

Please provide some zlib-compressed data to test with. Otherwise we can't mark this ticket as tested.
(0002566)
abolka
1-Oct-2010 16:44

zlib-deflate "foo" == #{789C4BCBCF070002820145}

Which seems to indeed differ from the COMPRESS output only by 4 trailing length bytes:

>> compress "foo"
== #{789C4BCBCF07000282014503000000}

Decompressing without the trailing length works with A107 for me, but is ridiculously slow:

>> print stats print dt [decompress #{789C4BCBCF070002820145}] print stats
872352
0:00:01.07669
1158594384
(0002600)
oldes
19-Oct-2010 15:57

Decompression takes too long, and it fails on the second run:

>> decompress #{789C4BCBCF070002820145}
== #{666F6F}

>> decompress #{789C4BCBCF070002820145}
** Internal error: not enough memory
** Where: decompress
** Near: decompress #{789C4BCBCF070002820145}
(0002601)
oldes
19-Oct-2010 16:02

If nothing else, the script should not end up using so much memory.
(0002626)
BrianH
20-Oct-2010 04:47

>> compress #{666F6F}
== #{789C4BCBCF07000282014503000000}

Note the difference from the value above. Those last 4 bytes are the length that is allocated to store the result, as a little-endian integer. All REBOL compressed values have this. Otherwise, the data is *exactly* the same as zlib data, including the magic number at the beginning. There is no way for DECOMPRESS to autodetect the difference between zlib data and REBOL compressed data. This means that DECOMPRESS checks the last 4 bytes to determine the amount to allocate, even for zlib data that isn't supposed to have those 4 bytes.

>> to-integer reverse #{02820145}
== 1157726722

That is the last 4 bytes of the zlib data, converted to an integer the same way DECOMPRESS does. That is the amount of data that DECOMPRESS allocated for the result. This is why you run out of memory.
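
The same arithmetic can be reproduced outside REBOL. The following Python sketch is only an illustration of the byte interpretation, not of DECOMPRESS internals. It also shows what those 4 bytes are in a plain zlib stream: the big-endian Adler-32 checksum of the uncompressed data.

```python
import zlib

# zlib stream for "foo", taken from this ticket.
zlib_data = bytes.fromhex("789C4BCBCF070002820145")

# Read the last 4 bytes as a little-endian length, as described above.
bogus_length = int.from_bytes(zlib_data[-4:], "little")
print(bogus_length)  # 1157726722

# In a plain zlib stream the same bytes are the Adler-32 checksum,
# stored big-endian, of the uncompressed data.
checksum = int.from_bytes(zlib_data[-4:], "big")
print(checksum == zlib.adler32(b"foo"))  # True
```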

We can add a /zlib refinement to DECOMPRESS, similar to the /gzip refinement added in alpha 108. But what we can't do is autodetect zlib data.
(0002628)
abolka
20-Oct-2010 04:59

We _can_ autodetect zlib/deflate/RFC1951 data, the only question is whether we want to.

A zlib stream consists of a series of blocks, each block with a defined length, and the final block marked explicitly. So to detect whether a given stream is zlib encoded, we'd need a quick pre-pass through the compressed binary, decoding only the block lengths and skipping ahead to the final block. If 4 length bytes follow the final block, it is REBOL-style COMPRESS data; if not, it is zlib data.

The catch is: the above process duplicates most of what the decompressor needs to do anyway, except for memory allocation and copying. So roughly speaking, performance would be cut in half.
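
As a rough illustration of what such detection amounts to, here is a Python sketch. It does not implement the length-only pre-pass described above; it simply inflates the whole stream and looks at what remains after the final block, which is exactly the duplicated work this comment warns about. The function name `classify` and its return labels are hypothetical.

```python
import zlib

def classify(data: bytes) -> str:
    """Inflate the stream and inspect the bytes left after the final block."""
    inflater = zlib.decompressobj()
    payload = inflater.decompress(data)
    if not inflater.eof:
        return "truncated"
    trailer = inflater.unused_data
    if trailer == b"":
        return "zlib"
    # Assumption: a REBOL trailer is 4 bytes, little-endian, and at least
    # as large as the real decompressed size (an estimate is allowed).
    if len(trailer) == 4 and int.from_bytes(trailer, "little") >= len(payload):
        return "rebol"
    return "unknown"

print(classify(bytes.fromhex("789C4BCBCF070002820145")))          # zlib
print(classify(bytes.fromhex("789C4BCBCF07000282014503000000")))  # rebol
```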

I see three options:

- Take the performance hit, but still use the length bytes to at least keep the nice memory pre-allocation.

- Always ignore the length bytes, and make decompress a plain zlib inflate. This will not suffer the double-decompress penalty, but possibly incur a performance penalty from memory management overhead (as the buffer for the decompressed result may need to be resized dynamically).

- Keep things as they are, and add a /zlib refinement to COMPRESS and DECOMPRESS instead (which is suggested in #1667).
(0002630)
BrianH
20-Oct-2010 05:45

Of those three choices, adding the /zlib refinement seems the best bet. DECOMPRESS without /zlib would use the current method, and with /zlib it would do the plain zlib inflate and take the performance hit.
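
A sketch of how the two modes would differ, modeled in Python rather than in the REBOL native. The function `decompress` and its `zlib_format` flag are hypothetical stand-ins for DECOMPRESS and the proposed /zlib refinement.

```python
import zlib

def decompress(data: bytes, zlib_format: bool = False) -> bytes:
    """Hypothetical model of DECOMPRESS with and without /zlib."""
    if zlib_format:
        # Plain zlib stream: no length trailer, output grows as needed.
        return zlib.decompress(data)
    if len(data) < 4:
        raise ValueError("missing 4-byte length trailer")
    # REBOL format: the last 4 bytes give the buffer size, little-endian.
    stream, trailer = data[:-4], data[-4:]
    allocated = int.from_bytes(trailer, "little")
    result = zlib.decompress(stream)
    if len(result) > allocated:
        raise ValueError("result exceeds the declared length")
    return result

print(decompress(bytes.fromhex("789C4BCBCF070002820145"), zlib_format=True))
print(decompress(bytes.fromhex("789C4BCBCF07000282014503000000")))
```

Both calls return the three bytes of "foo". The caller picks the format, so no autodetection is needed.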
(0002631)
abolka
20-Oct-2010 05:49

Adding a separate /zlib refinement is my preference as well.
(0002634)
Carl
20-Oct-2010 07:06

When we added zlib more than ten years ago, it took the programmer a long time, because it's one very nasty piece of code. We made a few optimizations to get the performance we desired.

However, we made a mistake: we left it raw. We should have wrapped it to make it possible to check fields like the length. In other words, there's no validation prior to usage of the values. Yes, there's the "magic" at the start, but in fact, that's state data (dual nibbles) *not* magic data for pattern comparison.
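
To illustrate the point about the first two bytes, here is a small Python sketch that decodes them according to the zlib header layout in RFC 1950. The function name `describe_zlib_header` is hypothetical. The bytes are bit fields plus a check value, which is why the second byte varies between streams.

```python
def describe_zlib_header(cmf: int, flg: int) -> dict:
    """Split the two zlib header bytes into their fields (RFC 1950)."""
    return {
        "method": cmf & 0x0F,            # low nibble: 8 means deflate
        "window_bits": (cmf >> 4) + 8,   # high nibble: log2(window) - 8
        "fcheck_ok": (cmf * 256 + flg) % 31 == 0,
        "fdict": bool(flg & 0x20),       # preset dictionary flag
        "flevel": flg >> 6,              # compression level hint, 0..3
    }

print(describe_zlib_header(0x78, 0x9C))
print(describe_zlib_header(0x78, 0xDA))
```

Both headers seen in this ticket, 78 9C and 78 DA, pass the check and differ only in the level hint and the check bits.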
(0002644)
oldes
21-Oct-2010 10:26

I've uploaded an R2 script with zlib.dll, which I use for zlib compression/decompression under R2.

With this script I have these results:
>> zlib/compress "foo"
== #{78DA4BCBCF070002820145}
>> as-string zlib/decompress zlib/compress "foo"
== "foo"

Apart from the second header byte (DA here, 9C in Python), this matches what Python's zlib module produces at its default compression level (6):
>>> import zlib
>>> zlib.compress(b"foo")
b'x\x9cK\xcb\xcf\x07\x00\x02\x82\x01E'

You can also see that REBOL's COMPRESS function produces zlib data and just appends the length:
>> zlib/compress/level "foo" 6
== #{789C4BCBCF070002820145}
>> compress "foo"
== #{789C4BCBCF07000282014503000000}

By the way, zlib sources with an even newer precompiled DLL are available here: http://zlib.net/ (but you probably know that already).
(0002645)
oldes
21-Oct-2010 10:42

One more note: the best solution would be a de/compression port, so that one could do streamed de/compression.
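
For comparison, this is roughly what streamed decompression looks like with Python's zlib module. It is only an analogy for the kind of interface a port could offer, and the generator name `inflate_stream` is hypothetical.

```python
import zlib
from typing import Iterable, Iterator

def inflate_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Inflate a zlib stream that arrives in arbitrary-sized pieces."""
    inflater = zlib.decompressobj()
    for chunk in chunks:
        out = inflater.decompress(chunk)
        if out:
            yield out
    tail = inflater.flush()  # emit anything still buffered
    if tail:
        yield tail

# Feed the "foo" stream from this ticket three bytes at a time.
data = bytes.fromhex("789C4BCBCF070002820145")
pieces = [data[i:i + 3] for i in range(0, len(data), 3)]
print(b"".join(inflate_stream(pieces)))  # b'foo'
```

Because the output is produced incrementally, the total size does not need to be known up front.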
(0003050)
oldes
26-Jan-2011 10:33

A simple ZLIB extension, Windows version only at the moment: https://github.com/Oldes/R3A110/tree/master/extensions/zlib

Date User Field Action Change
3-Oct-2015 03:02 abolka Comment : 0002628 Modified -
26-Jan-2011 10:33 oldes Comment : 0003050 Added -
21-Oct-2010 10:45 oldes Comment : 0002644 Modified -
21-Oct-2010 10:42 oldes Comment : 0002645 Added -
21-Oct-2010 10:26 oldes Comment : 0002644 Added -
20-Oct-2010 07:06 carl Comment : 0002634 Added -
20-Oct-2010 05:56 abolka Comment : 0002631 Modified -
20-Oct-2010 05:49 abolka Comment : 0002631 Added -
20-Oct-2010 05:45 BrianH Comment : 0002630 Added -
20-Oct-2010 05:39 abolka Comment : 0002628 Modified -
20-Oct-2010 05:13 abolka Comment : 0002628 Modified -
20-Oct-2010 05:12 abolka Comment : 0002628 Modified -
20-Oct-2010 05:12 abolka Comment : 0002628 Modified -
20-Oct-2010 04:59 abolka Comment : 0002628 Added -
20-Oct-2010 04:49 BrianH Status Modified reviewed => problem
20-Oct-2010 04:47 BrianH Comment : 0002626 Added -
20-Oct-2010 04:37 BrianH Type Modified Bug => Wish
20-Oct-2010 04:37 BrianH Priority Modified none => normal
20-Oct-2010 04:37 BrianH Status Modified submitted => reviewed
19-Oct-2010 16:02 oldes Comment : 0002601 Added -
19-Oct-2010 15:58 oldes Type Modified Wish => Bug
19-Oct-2010 15:58 oldes Status Modified built => submitted
19-Oct-2010 15:58 oldes Version Modified alpha 97 => alpha 107
19-Oct-2010 15:57 oldes Comment : 0002600 Added -
1-Oct-2010 16:47 abolka Comment : 0002566 Modified -
1-Oct-2010 16:44 abolka Comment : 0002566 Added -
7-Feb-2010 21:18 BrianH Comment : 0001996 Added -
7-Feb-2010 21:17 BrianH Category Modified => Native
7-Feb-2010 21:17 BrianH Version Modified => alpha 97
2-Dec-2008 18:50 Admin Ticket Added -