
[MCOMB-2] OutOfMemory Error with combinatorics/subsets Created: 29/Apr/13  Updated: 08/Apr/14  Resolved: 08/Apr/14

Status: Resolved
Project: math.combinatorics
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Mark Engel Assignee: Mark Engelberg
Resolution: Completed Votes: 1
Labels: None

Mac OS X 10.8.3
Clojure 1.5.1 (installed with homebrew)
Leiningen 2.1.3 on Java 1.6.0_45 Java HotSpot(TM) 64-Bit Server VM
math.combinatorics 0.0.4

% java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06-451-11M4406)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01-451, mixed mode)

Attachments: Text File mcomb-2-allow-subsets-to-use-less-memory-v1.txt    


Hello guys,

I am running into an OutOfMemoryError when calling the subsets function on bigger sets.

I have sets of 1000+ elements and want to lazily generate all possible subsets of such a set, so that I can filter for the ones that are interesting to me.

I was very happy to run into your library as it handles the heavy lifting for me and I only have to do the filtering. Nice!

But if I run this sample code

lein repl
=> (last (clojure.math.combinatorics/subsets (range 1000)))
OutOfMemoryError Java heap space  clojure.core/map (core.clj:2469)

it fails with an OutOfMemoryError.
I expected the memory usage of subsets to be constant, since the function should return a lazy sequence.

It would be really nice if this library could calculate subsets of bigger lists without running into memory problems.

I posted this question on StackOverflow with more information:

And people replied that they don't have this issue. Is this an issue with my platform or is this an issue with the memory usage of subsets?
It would be great if you could show me a way to get around this issue.

If I can be of any help, just let me know.

Cheers, Mark

Comment by Stefan du Fresne [ 29/Apr/13 5:29 AM ]

I get this as well, 2009 MBP, with any (range x) higher than 18.

Comment by Mark Engel [ 29/Apr/13 8:32 AM ]

Great to have the problem reproduced on another computer.

I just found an article describing an OutOfMemory Error with mapcat, which is used in the subsets function

Comment by Andy Fingerhut [ 02/May/13 2:30 PM ]

Patch mcomb-2-allow-subsets-to-use-less-memory-v1.txt dated May 2 2013 seems to fix this problem.

range returns a chunked sequence, and when map processes a chunked sequence it preserves the chunks. Chunks are little Java arrays of Object references, pointing at the results, and none of them will be GCed until the entire chunk is finished being processed.

It might be that a few other calls to unchunk wrapped around range calls might be useful in the math.combinatorics library, but certainly not all of them.
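To illustrate the chunking behaviour described above, here is a minimal sketch. The patch itself is not reproduced in this ticket, so the `unchunk` helper below is an assumption based on a common Clojure idiom, not necessarily the exact code that was committed; `realized-count` is a hypothetical helper added only for the demonstration.

```clojure
;; Assumed helper: wraps any seqable so that it is consumed one element
;; at a time instead of one chunk at a time.
(defn unchunk [s]
  (lazy-seq
    (when-let [xs (seq s)]
      (cons (first xs) (unchunk (rest xs))))))

;; Hypothetical helper: counts how many elements the mapping function
;; is applied to when only the first result of the map is requested.
(defn realized-count [coll]
  (let [calls (atom 0)]
    (first (map (fn [x] (swap! calls inc) x) coll))
    @calls))

;; Chunked input: map processes a whole chunk up front
;; (typically 32 elements for range).
(realized-count (range 100))

;; Unchunked input: only the one requested element is processed.
(realized-count (unchunk (range 100)))
```

With a chunked input, every result in the chunk stays reachable until the chunk has been fully consumed, which matters when each result is itself a large lazy structure.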

Comment by David James [ 23/Jun/13 6:04 PM ]

Thanks Andy, that worked for me!

Andy's patch, applied to my fork:

Comment by Andy Fingerhut [ 08/Apr/14 8:24 AM ]

Mark, any thoughts on this patch? The reason for the memory exhaustion without the patch is a bit subtle, and I can try explaining it differently if you are interested.

Comment by Andy Fingerhut [ 08/Apr/14 4:21 PM ]

Mark Engelberg committed this fix: https://github.com/clojure/math.combinatorics/commit/4b5312218344264c3227f7f814db6c688d0ab2fc

I will close this ticket.

[MCOMB-1] math.combinatorics README should be updated to conform to contrib standard Created: 17/Sep/12  Updated: 17/Sep/12  Resolved: 17/Sep/12

Status: Resolved
Project: math.combinatorics
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Trivial
Reporter: Christian Romney Assignee: Mark Engelberg
Resolution: Completed Votes: 0
Labels: documentation, patch

Attachments: Text File 0001-Updated-README-to-conform-to-new-contrib-standard.patch    
Patch: Code


As per Sean Corfield's suggestion here: https://groups.google.com/forum/?fromgroups=#!searchin/clojure-dev/math/clojure-dev/p5oz42gR_sk/cesMHO9cDWEJ
the math.combinatorics README.md should be updated to be more useful, especially to newbies.

Comment by Sean Corfield [ 17/Sep/12 11:35 AM ]

Patch applied.

[MCOMB-4] Performance enhancement for sorted-numbers? Created: 12/Jul/14  Updated: 19/Jul/14  Resolved: 19/Jul/14

Status: Resolved
Project: math.combinatorics
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Trivial
Reporter: Daniel Marjenburgh Assignee: Mark Engelberg
Resolution: Completed Votes: 0
Labels: enhancement, patch, performance

Attachments: File sorted-numbers-perf-enh.diff    
Patch: Code
Approval: Accepted



Came upon this as I was trying to improve performance. The implementation of sorted-numbers? (only used by permutations) is:
(defn- sorted-numbers?
  "Returns true iff s is a sequence of numbers in non-decreasing order"
  [s]
  (and (every? number? s)
       (every? (partial apply <=) (partition 2 1 s))))

<= and similar operators are variadic, so partitioning into pairs is unneeded.
(apply <= s) is a lot faster, but breaks for empty sequences, so an additional check is needed.

The implementation can be changed to:

  (and (every? number? s)
       (or (empty? s) (apply <= s)))

I benchmarked this as 10 to 15 times faster. A regression test with test.check was done to verify that the behaviour is unchanged for any input.
A patch is also included.
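The need for the empty-sequence guard can be seen in a short sketch. The function is defined with `defn` rather than `defn-` and named `sorted-numbers-sketch?` purely for illustration; both choices are assumptions, not part of the patch.

```clojure
;; <= is variadic, so it can check a whole sequence in one call.
(<= 1 2 2 3)     ; true
(apply <= [1])   ; a single argument is trivially ordered

;; With no arguments at all, <= has no matching arity, so
;; (apply <= []) throws clojure.lang.ArityException.
;; The (empty? s) check below guards against that case.
(defn sorted-numbers-sketch?
  "Returns true iff s is a sequence of numbers in non-decreasing order."
  [s]
  (and (every? number? s)
       (or (empty? s) (apply <= s))))

(sorted-numbers-sketch? [])        ; empty input is handled by the guard
(sorted-numbers-sketch? [1 1 2])   ; non-decreasing numbers
(sorted-numbers-sketch? [2 1])     ; out of order
(sorted-numbers-sketch? [1 :a])    ; rejected by number? before <= runs
```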



Comment by Andy Fingerhut [ 13/Jul/14 7:39 PM ]

Thanks for the submission. I don't know whether Mark would be interested in taking the test.check regression test you developed, too, but if you would be willing to share it as a patch, it might be considered for inclusion as well.

Comment by Daniel Marjenburgh [ 18/Jul/14 5:43 AM ]

Hm, I don't know how I could include the regression test as a patch. Because:
1) The current project does not use external libraries or Leiningen (yet). How do I add dependencies when not using Leiningen?
2) I tested the old implementation vs the new one with generated inputs. If the old implementation is gone, there's no test to add...
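One possible way around both points, sketched here as an assumption rather than as what the project did: keep a copy of the old implementation inside the test code as a reference, and compare it against the new one on random inputs using only `clojure.test`, which ships with Clojure. All names below (`old-sorted-numbers?`, `new-sorted-numbers?`, `random-input`, `new-matches-old`) are hypothetical.

```clojure
(require '[clojure.test :refer [deftest is]])

;; Reference copy of the old implementation, kept only for testing.
(defn old-sorted-numbers? [s]
  (and (every? number? s)
       (every? (partial apply <=) (partition 2 1 s))))

;; The proposed replacement.
(defn new-sorted-numbers? [s]
  (and (every? number? s)
       (or (empty? s) (apply <= s))))

;; Hypothetical generator: a vector of 0 to 5 small integers that
;; occasionally has a non-number appended.
(defn random-input []
  (let [v (vec (repeatedly (rand-int 6) #(rand-int 4)))]
    (if (zero? (rand-int 5))
      (conj v :not-a-number)
      v)))

;; Both implementations should agree on every generated input.
(deftest new-matches-old
  (dotimes [_ 1000]
    (let [s (random-input)]
      (is (= (old-sorted-numbers? s) (new-sorted-numbers? s))
          (str "input: " (pr-str s))))))
```

This trades the shrinking and richer generators of test.check for zero extra dependencies.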

Comment by Mark Engelberg [ 19/Jul/14 5:06 PM ]

Since this function is only called once at the beginning of the permutations process, on a sequence that is usually 10 or fewer items, I can't imagine this improvement will have any meaningful impact on the overall running time of the permutations algorithm. Nevertheless, it's an improvement, so I've gone ahead and added it.

Generated at Thu Sep 18 04:54:56 CDT 2014 using JIRA 4.4#649-r158309.