REBOL3 tracker
  0.9.12 beta
Ticket #0000950 User: anonymous

Project:



rss
TypeBug Statustested Date21-Jun-2009 18:13
Versionalpha 58 CategoryNative Submitted bySunanda
PlatformAll Severityminor Prioritynormal

Summary 'difference appears to have arbitrary limitations
Description a: copy [] loop 250'000 [append a random/secure 1000]
b: copy [] loop 250'000 [append b random/secure 1000]
difference a b
** Script error: maximum limit reached: 500000
** Where: difference
** Near: difference a b

R2 happily responds:
R2>> difference a b
R2>> == []
Example code
Further problem -- actually a crash:
    a: copy [] loop 250'000 [append a random/secure 1000]
    b: copy [] loop 250'000 [append b random/secure 1000]
    join a b

Windows crash:
Problem signature:
  Problem Event Name:	APPCRASH
  Application Name:	r3-a57.exe
  Application Version:	0.0.0.0
  Application Timestamp:	4a395fa7
  Fault Module Name:	r3-a57.exe
  Fault Module Version:	0.0.0.0
  Fault Module Timestamp:	4a395fa7
  Exception Code:	c0000005
  Exception Offset:	00002e11
  OS Version:	6.0.6002.2.2.0.768.3
  Locale ID:	2057
  Additional Information 1:	8914
  Additional Information 2:	f56d981f250055a57ac399c8818ff647
  Additional Information 3:	c14f
  Additional Information 4:	6d48cc2273a114c10b9f7d00076a76b0

Assigned ton/a Fixed inalpha 95 Last Update12-Mar-2010 19:42


Comments
(0001007)
Carl
21-Jun-2009 20:36

There's an interesting tradeoff here. In order to avoid order N-squared performance, DIFFERENCE uses hashing - and we place a limit to the size of the hash table. We can make it larger, if so desired, but keep in mind the memory required.
(0001008)
Sunanda
21-Jun-2009 22:55

Thanks for the quick analysis.

I'll escalate the crash to a separate curecode report as it is something separate.

Two comments on the DIFFERENCE issue:

1. 'exclude and 'intersect work on the large set, but 'union and 'difference do not....That does look inconsistent.

2. difference is very slow with large data sets-- try this code for example; it'll work, but may take minutes:
a: mold system
b: copy a
difference a b

On the other hand, this is very fast -- just seconds:
a: mold system
b: copy a
difference unique a unique b

Two thoughts on that:
-- some uses of 'difference are speeded up by an order of magnitude if preceeded by unique. That
observation could be added to the documentation as an optimisation tip;
-- could something similar be attempted internally if the data set exceeds a give size? It may
help with the original bug condition reported.
(0002048)
Sunanda
27-Feb-2010 11:17

The arbitrary 500'000 limit seems to have been removed in later versions.

Both R3-A97 and R2 can now handle the example.

And, on my test machine, both fail with memory shortage when the values are upped even further:

a: copy [] loop 9'250'000 [append a random/secure 1000]
b: copy [] loop 9'250'000 [append b random/secure 1000]
difference a b

R2 says:
** Script Error: Not enough memory
** Near: difference a b

R3 says:
** Internal error: not enough memory
** Where: difference
** Near: difference a b


So mark this one fixed!?

Date User Field Action Change
12-Mar-2010 19:42 BrianH Fixedin Modified => alpha 95
12-Mar-2010 19:42 BrianH Category Modified => Native
12-Mar-2010 19:42 BrianH Status Modified reviewed => tested
27-Feb-2010 11:17 sunanda Comment : 0002048 Added -
21-Jun-2009 22:55 sunanda Comment : 0001008 Added -
21-Jun-2009 20:36 carl Comment : 0001007 Added -
21-Jun-2009 20:24 carl Status Modified submitted => reviewed
21-Jun-2009 18:33 sunanda Description Modified -
21-Jun-2009 18:33 sunanda Code Modified -
21-Jun-2009 18:13 sunanda Ticket Added -