<< Back to previous view

[CLJ-1229] count silently overflows for sequences with more than Integer/MAX_VALUE elements Created: 09/Jul/13  Updated: 03/Sep/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Andy Fingerhut Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: math

Attachments: Text File clj-1229-count-overflow-patch-v1.txt    
Patch: Code

 Description   

Found by John Jacobsen: https://mail.google.com/mail/?shva=1#label/clojure/13fbba0c3e4ba6b7

user> (time (count (range (*' 1000 1000 1000 3))))
"Elapsed time: 375225.663 msecs"
-1294967296



 Comments   
Comment by Andy Fingerhut [ 09/Jul/13 1:41 AM ]

Patch clj-1229-count-overflow-patch-v1.txt dated Jul 8 2013 modifies count to throw an ArithmeticException for sequences with more than Integer/MAX_VALUE elements.

Performance experiments with this expression show no significant time differences before and after the change. I verified with -XX:+PrintCompilation, and only paid attention to timing results after no further JIT compilation was performed when executing the expression:

(defn foo [n m] (doall (for [i (range n)] (count (range m)))))
(time (foo 100 1000000))

No tests are added. A test verifying that an exception is thrown for such a long sequence would be prohibitively slow.

Comment by Mike Anderson [ 09/Jul/13 3:08 AM ]

I think a better strategy might be to convert all count-like functions from int to long. int overflow exceptions aren't a particularly nice result, especially since they probably won't get picked up in tests for the reasons Andy mentioned.

That buys us a quite a few years more future proofing...

Comment by Andy Fingerhut [ 09/Jul/13 10:15 AM ]

Mike, unless I am missing something, that would require changing the method count() in the Counted interface to return a long, and that in turn requires little changes throughout the code base wherever Counted is used. It could be done, of course, but it is not a small change.

Comment by John Jacobsen [ 09/Jul/13 12:35 PM ]

I agree that Mike's approach is nicer overall, but think Andy's patch is an immediate improvement over what we have now, and could be implemented until someone takes the time to correctly make all the detailed changes Mike is suggesting.

Comment by John Jacobsen [ 09/Jul/13 12:52 PM ]

FWIW I did apply the patch, build, and test manually:

user=> (count (range (* 1000 1000 1000 3)))
ArithmeticException integer overflow clojure.lang.RT.countFrom (RT.java:549)
user=>

Comment by Alex Miller [ 11/Jul/13 3:26 PM ]

Perhaps of interest, Java's Collection.size() returns Integer.MAX_VALUE if the size of the collection > MAX_VALUE. I can't say that either that behavior or overflow is particularly helpful in practice of course.

Generated at Fri Aug 01 21:53:08 CDT 2014 using JIRA 4.4#649-r158309.