Type | Bug | Status | tested | Date | 21-Jun-2009 18:13 |
---|---|---|---|---|---|
Version | alpha 58 | Category | Native | Submitted by | Sunanda |
Platform | All | Severity | minor | Priority | normal |
Summary | 'difference appears to have arbitrary limitations |
---|---|
Description |
a: copy [] loop 250'000 [append a random/secure 1000] b: copy [] loop 250'000 [append b random/secure 1000] difference a b ** Script error: maximum limit reached: 500000 ** Where: difference ** Near: difference a b R2 happily responds: R2>> difference a b R2>> == [] |
Example code |
Further problem -- actually a crash: a: copy [] loop 250'000 [append a random/secure 1000] b: copy [] loop 250'000 [append b random/secure 1000] join a b Windows crash: Problem signature: Problem Event Name: APPCRASH Application Name: r3-a57.exe Application Version: 0.0.0.0 Application Timestamp: 4a395fa7 Fault Module Name: r3-a57.exe Fault Module Version: 0.0.0.0 Fault Module Timestamp: 4a395fa7 Exception Code: c0000005 Exception Offset: 00002e11 OS Version: 6.0.6002.2.2.0.768.3 Locale ID: 2057 Additional Information 1: 8914 Additional Information 2: f56d981f250055a57ac399c8818ff647 Additional Information 3: c14f Additional Information 4: 6d48cc2273a114c10b9f7d00076a76b0 |
Assigned to | n/a | Fixed in | alpha 95 | Last Update | 12-Mar-2010 19:42 |
---|
Comments | |
---|---|
(0001007)
Carl 21-Jun-2009 20:36 |
There's an interesting tradeoff here. In order to avoid order N-squared performance, DIFFERENCE uses hashing - and we place a limit to the size of the hash table. We can make it larger, if so desired, but keep in mind the memory required. |
(0001008)
Sunanda 21-Jun-2009 22:55 |
Thanks for the quick analysis.
I'll escalate the crash to a separate curecode report as it is something separate. Two comments on the DIFFERENCE issue: 1. 'exclude and 'intersect work on the large set, but 'union and 'difference do not....That does look inconsistent. 2. difference is very slow with large data sets-- try this code for example; it'll work, but may take minutes: a: mold system b: copy a difference a b On the other hand, this is very fast -- just seconds: a: mold system b: copy a difference unique a unique b Two thoughts on that: -- some uses of 'difference are speeded up by an order of magnitude if preceeded by unique. That observation could be added to the documentation as an optimisation tip; -- could something similar be attempted internally if the data set exceeds a give size? It may help with the original bug condition reported. |
(0002048)
Sunanda 27-Feb-2010 11:17 |
The arbitrary 500'000 limit seems to have been removed in later versions.
Both R3-A97 and R2 can now handle the example. And, on my test machine, both fail with memory shortage when the values are upped even further: a: copy [] loop 9'250'000 [append a random/secure 1000] b: copy [] loop 9'250'000 [append b random/secure 1000] difference a b R2 says: ** Script Error: Not enough memory ** Near: difference a b R3 says: ** Internal error: not enough memory ** Where: difference ** Near: difference a b So mark this one fixed!? |
Date | User | Field | Action | Change |
---|---|---|---|---|
12-Mar-2010 19:42 | BrianH | Fixedin | Modified | => alpha 95 |
12-Mar-2010 19:42 | BrianH | Category | Modified | => Native |
12-Mar-2010 19:42 | BrianH | Status | Modified | reviewed => tested |
27-Feb-2010 11:17 | sunanda | Comment : 0002048 | Added | - |
21-Jun-2009 22:55 | sunanda | Comment : 0001008 | Added | - |
21-Jun-2009 20:36 | carl | Comment : 0001007 | Added | - |
21-Jun-2009 20:24 | carl | Status | Modified | submitted => reviewed |
21-Jun-2009 18:33 | sunanda | Description | Modified | - |
21-Jun-2009 18:33 | sunanda | Code | Modified | - |
21-Jun-2009 18:13 | sunanda | Ticket | Added | - |