Clojure

empty? is broken for transient collections

Details

  • Type: Defect Defect
  • Status: Open Open
  • Priority: Critical Critical
  • Resolution: Unresolved
  • Affects Version/s: Release 1.7
  • Fix Version/s: Release 1.11
  • Component/s: None
  • Labels:
  • Patch:
    Code and Test
  • Approval:
    Vetted

Description

Couldn't find whether it was brought up earlier, but it seems that empty? predicate is broken for transient collections

user=> (empty? (transient []))
IllegalArgumentException Don't know how to create ISeq from: clojure.lang.PersistentVector$TransientVector  clojure.lang.RT.seqFrom (RT.java:528)

user=> (empty? (transient {}))
IllegalArgumentException Don't know how to create ISeq from: clojure.lang.PersistentArrayMap$TransientArrayMap  clojure.lang.RT.seqFrom (RT.java:528)

user=> (empty? (transient #{}))
IllegalArgumentException Don't know how to create ISeq from: clojure.lang.PersistentHashSet$TransientHashSet  clojure.lang.RT.seqFrom (RT.java:528)

The workaround is to use (zero? (count (transient ...))) check instead.

Cause: empty? is based on seqability, which transients don't implement.

Proposed Add a branch to empty? for counted? colls. Transients implement Counted so gain support via this branch. Other colls that are counted are faster. Seq branch continues to work for seqs.

Perf test:

(def p [])
(def p1 [1])
(def t (transient []))
(def t1 (transient [1]))

;; take last time of all these
(dotimes [i 20] (time (dotimes [_ 10000] (empty? p))))
(dotimes [i 20] (time (dotimes [_ 10000] (empty? p1))))
(dotimes [i 20] (time (dotimes [_ 10000] (empty? t))))
(dotimes [i 20] (time (dotimes [_ 10000] (empty? t1))))

Results:

coll before after result
p 0.72 ms 0.08 ms much faster when empty
p1 0.15 ms 0.13 ms slightly faster when not empty
t error 0.19 ms no longer errors
t1 error 0.20 ms no longer errors

Not sure if doc string should be tweaked to be more generic, particularly the "same as (not (seq coll))" which is now only true for Seqable but not Counted. I think the advise to use (seq coll) for seq checks is still good there.

I did a skim for other types that are Counted but not seqs/Seqable and didn't find much other than internal things like ChunkBuffer. Many are both and would thus use the counted path instead (all the persistent colls for example and any kind of IndexedSeq).

I guess another option would be just to fully switch empty? to be about (zero? (bounded-count 1 coll)) and lean on count's polymorphism completely.

Patch: clj-1872.patch

Activity

Hide
Alex Miller added a comment -

Probably similar to CLJ-700.

Show
Alex Miller added a comment - Probably similar to CLJ-700.
Alex Miller made changes -
Field Original Value New Value
Approval Triaged [ 10120 ]
Alex Miller made changes -
Priority Major [ 3 ] Critical [ 2 ]
Alex Miller made changes -
Labels collections
Rich Hickey made changes -
Approval Triaged [ 10120 ] Vetted [ 10003 ]
Alex Miller made changes -
Fix Version/s Release 1.9 [ 10750 ]
Alex Miller made changes -
Fix Version/s Release 1.9 [ 10750 ]
Fix Version/s Release Next [ 11451 ]
Alex Miller made changes -
Fix Version/s Release 1.10 [ 11451 ]
Fix Version/s Release 1.11 [ 11750 ]
Hide
Devin Walters added a comment -

As mentioned in CLJ-700, this is a different issue.

Show
Devin Walters added a comment - As mentioned in CLJ-700, this is a different issue.
Hide
Devin Walters added a comment -

First things first, the original description brings up `(empty? (transient ()))`. Per the documentation at https://clojure.org/reference/transients, there is no benefit to be had for supporting transients on lists.

Current behavior for java collections:

(empty? (java.util.HashMap. {}))
=> true

(empty? (java.util.HashMap. {1 2}))
=> false

(seq (java.util.HashMap. {1 2}))
=> (#object[java.util.HashMap$Node 0x4335c9c3 "1=2"])

(seq (java.util.HashMap. {}))
=> nil

The same behavior is true of java arrays.

Over in CLJS-2802, the current patch's approach is to `cond` around the problem in `empty?` by explicitly checking whether it's a TransientCollection, and if so, using `(zero? (count coll))` as the original description mentions as a workaround.

Currently, transient collections do not implement Iterable as the persistent ones do. If Iterable were implemented, I believe RT.seqFrom would work, and by extension, `empty?`.

Show
Devin Walters added a comment - First things first, the original description brings up `(empty? (transient ()))`. Per the documentation at https://clojure.org/reference/transients, there is no benefit to be had for supporting transients on lists. Current behavior for java collections:
(empty? (java.util.HashMap. {}))
=> true

(empty? (java.util.HashMap. {1 2}))
=> false

(seq (java.util.HashMap. {1 2}))
=> (#object[java.util.HashMap$Node 0x4335c9c3 "1=2"])

(seq (java.util.HashMap. {}))
=> nil
The same behavior is true of java arrays. Over in CLJS-2802, the current patch's approach is to `cond` around the problem in `empty?` by explicitly checking whether it's a TransientCollection, and if so, using `(zero? (count coll))` as the original description mentions as a workaround. Currently, transient collections do not implement Iterable as the persistent ones do. If Iterable were implemented, I believe RT.seqFrom would work, and by extension, `empty?`.
Hide
Alex Miller added a comment -

I think there are good reasons for transient collections not to be Seqable - seqs imply caching, caching hurts perf, and the whole reason to be using transients is for batch load perf. So that seems counter-productive. Iterators are stateful and again, I suspect that is probably a bad thing to add just for the sake of checking empty?.

An explicit check for emptiness of counted? colls would cover all the transient colls and anything else counted without making a seq. That might be faster for all those cases, and doesn't require anything new of anybody in the impl.

Another option would be to have an IEmptyable interface and/or protocol to indicate explicit empty? check support. Probably overkill.

Show
Alex Miller added a comment - I think there are good reasons for transient collections not to be Seqable - seqs imply caching, caching hurts perf, and the whole reason to be using transients is for batch load perf. So that seems counter-productive. Iterators are stateful and again, I suspect that is probably a bad thing to add just for the sake of checking empty?. An explicit check for emptiness of counted? colls would cover all the transient colls and anything else counted without making a seq. That might be faster for all those cases, and doesn't require anything new of anybody in the impl. Another option would be to have an IEmptyable interface and/or protocol to indicate explicit empty? check support. Probably overkill.
Alex Miller made changes -
Description Couldn't find whether it was brought up earlier, but it seems that {{empty?}} predicate is broken for transient collections

{code}
user=> (empty? (transient []))
IllegalArgumentException Don't know how to create ISeq from: clojure.lang.PersistentVector$TransientVector clojure.lang.RT.seqFrom (RT.java:528)

user=> (empty? (transient ()))
ClassCastException clojure.lang.PersistentList$EmptyList cannot be cast to clojure.lang.IEditableCollection clojure.core/transient (core.clj:3209)

user=> (empty? (transient {}))
IllegalArgumentException Don't know how to create ISeq from: clojure.lang.PersistentArrayMap$TransientArrayMap clojure.lang.RT.seqFrom (RT.java:528)

user=> (empty? (transient #{}))
IllegalArgumentException Don't know how to create ISeq from: clojure.lang.PersistentHashSet$TransientHashSet clojure.lang.RT.seqFrom (RT.java:528)
{code}

The workaround is to use {{(zero? (count (transient ...)))}} check instead.
Couldn't find whether it was brought up earlier, but it seems that {{empty?}} predicate is broken for transient collections

{code}
user=> (empty? (transient []))
IllegalArgumentException Don't know how to create ISeq from: clojure.lang.PersistentVector$TransientVector clojure.lang.RT.seqFrom (RT.java:528)

user=> (empty? (transient {}))
IllegalArgumentException Don't know how to create ISeq from: clojure.lang.PersistentArrayMap$TransientArrayMap clojure.lang.RT.seqFrom (RT.java:528)

user=> (empty? (transient #{}))
IllegalArgumentException Don't know how to create ISeq from: clojure.lang.PersistentHashSet$TransientHashSet clojure.lang.RT.seqFrom (RT.java:528)
{code}

The workaround is to use {{(zero? (count (transient ...)))}} check instead.

*Cause:* {{empty?}} is based on seqability, which transients don't implement.

*Proposed* Add a branch to {{empty?}} for counted? colls. Transients implement Counted so gain support via this branch. Other colls that are counted are faster. Seq branch continues to work for seqs.

Perf test:

{code}
(def p [])
(def p1 [1])
(def t (transient []))
(def t1 (transient [1]))

;; take last time of all these
(dotimes [i 20] (time (dotimes [_ 10000] (empty? p))))
(dotimes [i 20] (time (dotimes [_ 10000] (empty? p1))))
(dotimes [i 20] (time (dotimes [_ 10000] (empty? t))))
(dotimes [i 20] (time (dotimes [_ 10000] (empty? t1))))
{code}

Results:

||coll||before||after||result||
|p|0.72 ms|0.08 ms|much faster when empty|
|p1|0.15 ms|0.13 ms|slightly faster when not empty|
|t|error|0.19 ms|no longer errors|
|t1|error|0.20 ms|no longer errors|

Not sure if doc string should be tweaked to be more generic, particularly the "same as (not (seq coll))" which is now only true for Seqable but not Counted. I think the advise to use (seq coll) for seq checks is still good there.

I did a skim for other types that are Counted but not seqs/Seqable and didn't find much other than internal things like ChunkBuffer. Many are both and would thus use the counted path instead (all the persistent colls for example and any kind of IndexedSeq).

I guess another option would be just to fully switch empty? to be about (count = 0) and lean on count's polymorphism completely.

*Patch:* clj-1872.patch
Attachment clj-1872.patch [ 19007 ]
Alex Miller made changes -
Description Couldn't find whether it was brought up earlier, but it seems that {{empty?}} predicate is broken for transient collections

{code}
user=> (empty? (transient []))
IllegalArgumentException Don't know how to create ISeq from: clojure.lang.PersistentVector$TransientVector clojure.lang.RT.seqFrom (RT.java:528)

user=> (empty? (transient {}))
IllegalArgumentException Don't know how to create ISeq from: clojure.lang.PersistentArrayMap$TransientArrayMap clojure.lang.RT.seqFrom (RT.java:528)

user=> (empty? (transient #{}))
IllegalArgumentException Don't know how to create ISeq from: clojure.lang.PersistentHashSet$TransientHashSet clojure.lang.RT.seqFrom (RT.java:528)
{code}

The workaround is to use {{(zero? (count (transient ...)))}} check instead.

*Cause:* {{empty?}} is based on seqability, which transients don't implement.

*Proposed* Add a branch to {{empty?}} for counted? colls. Transients implement Counted so gain support via this branch. Other colls that are counted are faster. Seq branch continues to work for seqs.

Perf test:

{code}
(def p [])
(def p1 [1])
(def t (transient []))
(def t1 (transient [1]))

;; take last time of all these
(dotimes [i 20] (time (dotimes [_ 10000] (empty? p))))
(dotimes [i 20] (time (dotimes [_ 10000] (empty? p1))))
(dotimes [i 20] (time (dotimes [_ 10000] (empty? t))))
(dotimes [i 20] (time (dotimes [_ 10000] (empty? t1))))
{code}

Results:

||coll||before||after||result||
|p|0.72 ms|0.08 ms|much faster when empty|
|p1|0.15 ms|0.13 ms|slightly faster when not empty|
|t|error|0.19 ms|no longer errors|
|t1|error|0.20 ms|no longer errors|

Not sure if doc string should be tweaked to be more generic, particularly the "same as (not (seq coll))" which is now only true for Seqable but not Counted. I think the advise to use (seq coll) for seq checks is still good there.

I did a skim for other types that are Counted but not seqs/Seqable and didn't find much other than internal things like ChunkBuffer. Many are both and would thus use the counted path instead (all the persistent colls for example and any kind of IndexedSeq).

I guess another option would be just to fully switch empty? to be about (count = 0) and lean on count's polymorphism completely.

*Patch:* clj-1872.patch
Couldn't find whether it was brought up earlier, but it seems that {{empty?}} predicate is broken for transient collections

{code}
user=> (empty? (transient []))
IllegalArgumentException Don't know how to create ISeq from: clojure.lang.PersistentVector$TransientVector clojure.lang.RT.seqFrom (RT.java:528)

user=> (empty? (transient {}))
IllegalArgumentException Don't know how to create ISeq from: clojure.lang.PersistentArrayMap$TransientArrayMap clojure.lang.RT.seqFrom (RT.java:528)

user=> (empty? (transient #{}))
IllegalArgumentException Don't know how to create ISeq from: clojure.lang.PersistentHashSet$TransientHashSet clojure.lang.RT.seqFrom (RT.java:528)
{code}

The workaround is to use {{(zero? (count (transient ...)))}} check instead.

*Cause:* {{empty?}} is based on seqability, which transients don't implement.

*Proposed* Add a branch to {{empty?}} for counted? colls. Transients implement Counted so gain support via this branch. Other colls that are counted are faster. Seq branch continues to work for seqs.

Perf test:

{code}
(def p [])
(def p1 [1])
(def t (transient []))
(def t1 (transient [1]))

;; take last time of all these
(dotimes [i 20] (time (dotimes [_ 10000] (empty? p))))
(dotimes [i 20] (time (dotimes [_ 10000] (empty? p1))))
(dotimes [i 20] (time (dotimes [_ 10000] (empty? t))))
(dotimes [i 20] (time (dotimes [_ 10000] (empty? t1))))
{code}

Results:

||coll||before||after||result||
|p|0.72 ms|0.08 ms|much faster when empty|
|p1|0.15 ms|0.13 ms|slightly faster when not empty|
|t|error|0.19 ms|no longer errors|
|t1|error|0.20 ms|no longer errors|

Not sure if doc string should be tweaked to be more generic, particularly the "same as (not (seq coll))" which is now only true for Seqable but not Counted. I think the advise to use (seq coll) for seq checks is still good there.

I did a skim for other types that are Counted but not seqs/Seqable and didn't find much other than internal things like ChunkBuffer. Many are both and would thus use the counted path instead (all the persistent colls for example and any kind of IndexedSeq).

I guess another option would be just to fully switch empty? to be about (zero? (bounded-count 1 coll)) and lean on count's polymorphism completely.

*Patch:* clj-1872.patch
Alex Miller made changes -
Patch Code and Test [ 10002 ]

People

Vote (2)
Watch (7)

Dates

  • Created:
    Updated: