Clojure

count silently overflows for sequences with more than Integer/MAX_VALUE elements

Details

  • Type: Defect Defect
  • Status: Open Open
  • Priority: Minor Minor
  • Resolution: Unresolved
  • Affects Version/s: Release 1.5
  • Fix Version/s: None
  • Component/s: None
  • Labels:
  • Patch:
    Code

Description

Found by John Jacobsen: https://mail.google.com/mail/?shva=1#label/clojure/13fbba0c3e4ba6b7

user> (time (count (range (*' 1000 1000 1000 3))))
"Elapsed time: 375225.663 msecs"
-1294967296

Activity

Hide
Alex Miller added a comment -

Perhaps of interest, Java's Collection.size() returns Integer.MAX_VALUE if the size of the collection > MAX_VALUE. I can't say that either that behavior or overflow is particularly helpful in practice of course.

Show
Alex Miller added a comment - Perhaps of interest, Java's Collection.size() returns Integer.MAX_VALUE if the size of the collection > MAX_VALUE. I can't say that either that behavior or overflow is particularly helpful in practice of course.
Hide
John Jacobsen added a comment -

FWIW I did apply the patch, build, and test manually:

user=> (count (range (* 1000 1000 1000 3)))
ArithmeticException integer overflow clojure.lang.RT.countFrom (RT.java:549)
user=>

Show
John Jacobsen added a comment - FWIW I did apply the patch, build, and test manually: user=> (count (range (* 1000 1000 1000 3))) ArithmeticException integer overflow clojure.lang.RT.countFrom (RT.java:549) user=>
Hide
John Jacobsen added a comment -

I agree that Mike's approach is nicer overall, but think Andy's patch is an immediate improvement over what we have now, and could be implemented until someone takes the time to correctly make all the detailed changes Mike is suggesting.

Show
John Jacobsen added a comment - I agree that Mike's approach is nicer overall, but think Andy's patch is an immediate improvement over what we have now, and could be implemented until someone takes the time to correctly make all the detailed changes Mike is suggesting.
Hide
Andy Fingerhut added a comment -

Mike, unless I am missing something, that would require changing the method count() in the Counted interface to return a long, and that in turn requires little changes throughout the code base wherever Counted is used. It could be done, of course, but it is not a small change.

Show
Andy Fingerhut added a comment - Mike, unless I am missing something, that would require changing the method count() in the Counted interface to return a long, and that in turn requires little changes throughout the code base wherever Counted is used. It could be done, of course, but it is not a small change.
Hide
Mike Anderson added a comment -

I think a better strategy might be to convert all count-like functions from int to long. int overflow exceptions aren't a particularly nice result, especially since they probably won't get picked up in tests for the reasons Andy mentioned.

That buys us a quite a few years more future proofing...

Show
Mike Anderson added a comment - I think a better strategy might be to convert all count-like functions from int to long. int overflow exceptions aren't a particularly nice result, especially since they probably won't get picked up in tests for the reasons Andy mentioned. That buys us a quite a few years more future proofing...
Hide
Andy Fingerhut added a comment -

Patch clj-1229-count-overflow-patch-v1.txt dated Jul 8 2013 modifies count to throw an ArithmeticException for sequences with more than Integer/MAX_VALUE elements.

Performance experiments with this expression show no significant time differences before and after the change. I verified with -XX:+PrintCompilation, and only paid attention to timing results after no further JIT compilation was performed when executing the expression:

(defn foo [n m] (doall (for [i (range n)] (count (range m)))))
(time (foo 100 1000000))

No tests are added. A test verifying that an exception is thrown for such a long sequence would be prohibitively slow.

Show
Andy Fingerhut added a comment - Patch clj-1229-count-overflow-patch-v1.txt dated Jul 8 2013 modifies count to throw an ArithmeticException for sequences with more than Integer/MAX_VALUE elements. Performance experiments with this expression show no significant time differences before and after the change. I verified with -XX:+PrintCompilation, and only paid attention to timing results after no further JIT compilation was performed when executing the expression: (defn foo [n m] (doall (for [i (range n)] (count (range m))))) (time (foo 100 1000000)) No tests are added. A test verifying that an exception is thrown for such a long sequence would be prohibitively slow.

People

Vote (1)
Watch (2)

Dates

  • Created:
    Updated: