
[CLJ-1453] Most Iterator implementations do not correctly implement next failing to throw the required NoSuchElementException Created: 24/Jun/14  Updated: 30/Jul/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: Release 1.8

Type: Defect Priority: Major
Reporter: Meikel Brandmeyer Assignee: Andrew Rosa
Resolution: Unresolved Votes: 3
Labels: interop

Attachments: Text File 0001-Fix-iterator-implementations-to-throw-NSEE-when-exha.patch     Text File 0001-Throw-NSEE-in-gvec-iterator.patch     Text File clj-1453-2.patch     Text File CLJ-1453.patch     Text File CLJ-1453-tests.patch    
Patch: Code and Test
Approval: Screened

 Description   

Iterators on Clojure's collections should follow the expected JDK behavior of throwing NoSuchElementException on next() when an iterator is exhausted. Current collections have a variety of other behaviors.

Issue encountered in real world code using http://pipes.tinkerpop.com.

To reproduce:

(-> [] .iterator .next)

This throws a NullPointerException (NPE) instead of a NoSuchElementException (NSEE).

(doto (.iterator [1 2]) .next .next .next)

This throws an ArrayIndexOutOfBoundsException instead of NSEE.

The attached patch fixes the methods by adding a check for hasNext before actually trying to provide the next element. If there is no next element the correct exception is thrown.
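
A minimal sketch of that guard, written in Java because the collection iterators are implemented on the Java side. ArrayIter is a hypothetical array-backed iterator used only for illustration; it is not code from the attached patch.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical array-backed iterator; not taken from the attached patch.
final class ArrayIter<T> implements Iterator<T> {
    private final T[] items;
    private int i = 0;

    ArrayIter(T[] items) {
        this.items = items;
    }

    @Override
    public boolean hasNext() {
        return i < items.length;
    }

    @Override
    public T next() {
        // Check first, so exhaustion surfaces as NoSuchElementException
        // rather than ArrayIndexOutOfBoundsException.
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return items[i++];
    }
}
```

Without the hasNext() check, the third call to next() on a two-element array would fail on the array access instead.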

Patch: clj-1453-2.patch and tests: CLJ-1453-tests.patch

Screened by: Alex Miller



 Comments   
Comment by Tim McCormack [ 15/Jul/14 9:56 PM ]

To establish a baseline, this piece of code checks all the iterators I could find within Clojure 1.6.0 and makes sure they throw an appropriate exception:

iterator-checker.clj
(defn next-failure
  []
  (let [ok (atom [])]
    (doseq [[tp v]
            (sorted-map
             :vec-0 []
             :vec-n [1 2 3]
             :vec-start (subvec [1 2 3 4] 0 1)
             :vec-end (subvec [1 2 3 4] 3 4)
             :vec-ls-0 (.listIterator [])
             :vec-ls-n (.listIterator [1 2 3])
             :vec-start-ls (.listIterator (subvec [1 2 3 4] 0 1))
             :vec-end-ls (.listIterator (subvec [1 2 3 4] 3 4))
             :seq ()
             :list-n '(1 2 3)
             :set-hash-0 (hash-set)
             :set-hash-n (hash-set 1 2 3)
             :set-sor-0 (sorted-set)
             :set-sor-n (sorted-set 1 2 3)
             :map-arr-0 (array-map)
             :map-arr-n (array-map 1 2, 3 4)
             :map-hash-0 (hash-map)
             :map-hash-n (hash-map 1 2, 3 4)
             :map-sor-0 (sorted-map) ;; key was a duplicate :map-sor-n, so this case is absent from the output below
             :map-sor-n (sorted-map 1 2, 3 4)
             :map-sor-ks-0 (.keys (sorted-map))
             :map-sor-ks-n (.keys (sorted-map 1 2, 3 4))
             :map-sor-vs-0 (.vals (sorted-map))
             :map-sor-vs-n (.vals (sorted-map 1 2, 3 4))
             :map-sor-rev-0 (.reverseIterator (sorted-map))
             :map-sor-rev-n (.reverseIterator (sorted-map 1 2, 3 4))
             :map-ks-0 (.keySet {})
             :map-ks-n (.keySet {1 2, 3 4})
             :map-vs-0 (.values {})
             :map-vs-n (.values {1 2, 3 4})
             :gvec-int-0 (vector-of :long)
             :gvec-int-n (vector-of :long 1 2 3))]
      (let [it (if (instance? java.util.Iterator v)
                 v
                 (.iterator v))]
        (when-not it
          (println "Null iterator:" tp))
        (try (dotimes [_ 50]
               (.next it))
             (catch java.util.NoSuchElementException nsee
               (swap! ok conj tp))
             (catch Throwable t
               (println tp "threw" (class t))))))
    (println "OK:" @ok)))

The output as of current Clojure master (201a0dd970) is:

:gvec-int-0 threw java.lang.IndexOutOfBoundsException
:gvec-int-n threw java.lang.IndexOutOfBoundsException
:map-arr-0 threw java.lang.ArrayIndexOutOfBoundsException
:map-arr-n threw java.lang.ArrayIndexOutOfBoundsException
:map-hash-0 threw java.lang.ArrayIndexOutOfBoundsException
:map-ks-0 threw java.lang.ArrayIndexOutOfBoundsException
:map-ks-n threw java.lang.ArrayIndexOutOfBoundsException
:map-sor-ks-0 threw java.util.EmptyStackException
:map-sor-ks-n threw java.util.EmptyStackException
:map-sor-n threw java.util.EmptyStackException
:map-sor-rev-0 threw java.util.EmptyStackException
:map-sor-rev-n threw java.util.EmptyStackException
:map-sor-vs-0 threw java.util.EmptyStackException
:map-sor-vs-n threw java.util.EmptyStackException
:map-vs-0 threw java.lang.ArrayIndexOutOfBoundsException
:map-vs-n threw java.lang.ArrayIndexOutOfBoundsException
:vec-0 threw java.lang.NullPointerException
:vec-end threw java.lang.ArrayIndexOutOfBoundsException
:vec-end-ls threw java.lang.IndexOutOfBoundsException
:vec-ls-0 threw java.lang.IndexOutOfBoundsException
:vec-ls-n threw java.lang.IndexOutOfBoundsException
:vec-n threw java.lang.ArrayIndexOutOfBoundsException
:vec-start threw java.lang.ArrayIndexOutOfBoundsException
:vec-start-ls threw java.lang.IndexOutOfBoundsException
OK: [:list-n :map-hash-n :seq :set-hash-0 :set-hash-n :set-sor-0 :set-sor-n]

Comment by Tim McCormack [ 15/Jul/14 9:57 PM ]

0001-Fix-iterator-implementations-to-throw-NSEE-when-exha.patch missed one thing: clojure.gvec. With the patch in place, my checker outputs the following:

:gvec-int-0 threw java.lang.IndexOutOfBoundsException
:gvec-int-n threw java.lang.IndexOutOfBoundsException
OK: [:list-n :map-arr-0 :map-arr-n :map-hash-0 :map-hash-n :map-ks-0 :map-ks-n :map-sor-ks-0 :map-sor-ks-n :map-sor-n :map-sor-rev-0 :map-sor-rev-n :map-sor-vs-0 :map-sor-vs-n :map-vs-0 :map-vs-n :seq :set-hash-0 :set-hash-n :set-sor-0 :set-sor-n :vec-0 :vec-end :vec-end-ls :vec-ls-0 :vec-ls-n :vec-n :vec-start :vec-start-ls]

That should be a quick fix.

Comment by Michał Marczyk [ 15/Jul/14 10:01 PM ]

CLJ-1416 includes a fix for gvec's iterator impls (and some other improvements to interop).

Comment by Tim McCormack [ 15/Jul/14 10:17 PM ]

Attaching a fix for the gvec iterator. Combined with the existing patch, this fixes all broken iterators that I could find.

Comment by Andy Fingerhut [ 07/Aug/14 10:25 AM ]

I believe this Clojure commit: https://github.com/clojure/clojure/commit/e7215ce82215bda33f4f0887cb88570776d558a0

introduces more implementations of the java.util.Iterator interface where next() returns null instead of throwing a NoSuchElementException.

Comment by Alex Miller [ 29/Apr/15 11:34 AM ]

Would love to have a patch that:
1) combined patches to date
2) updated to master
3) reviewed for new iterators since this ticket was written
4) added the tests in the comments

Comment by Alex Miller [ 18/Jun/15 3:30 PM ]

Bump - would be happy to screen this for 1.8 if my last comments were addressed.

Comment by Andrew Rosa [ 18/Jul/15 4:38 PM ]

Alex Miller: I addressed your comments in the first patch. In the process of applying the patches to master, I changed the code to follow a single "format" of bounds checking and throwing, so everything is more consistent with the code style of the Clojure codebase. The commits themselves still retain the original authors, to give correct credit.

The second patch includes only the tests for these features, which I wrote using test.check, as we discussed on the clojure-dev channel. If you think they are too complex or prefer a different style of testing, please let me know; that is why I made a separate patch only for the tests.

Comment by Alex Miller [ 18/Jul/15 5:04 PM ]

Thanks Andrew. It will likely be a couple weeks before I have time to look at this.

Comment by Alex Miller [ 30/Jul/15 11:02 PM ]

clj-1453-2.patch squashes CLJ-1453.patch and updates to current master.





[CLJ-1224] Records do not cache hash like normal maps Created: 24/Jun/13  Updated: 31/Jul/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.8

Type: Enhancement Priority: Critical
Reporter: Brandon Bloom Assignee: Unassigned
Resolution: Unresolved Votes: 17
Labels: defrecord, performance

Attachments: Text File 0001-cache-hasheq-and-hashCode-for-records.patch     Text File 0001-cache-hasheq-and-hashCode-for-records-v2.patch     Text File 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records.patch     Text File 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records-v2.patch    
Patch: Code and Test
Approval: Screened

 Description   

Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

Approach: Cache hash values in record definitions, similar to maps.

Timings:

coll            1.8.0-master    1.8.0-master+patch
small record    99 ns           9 ns
big record      455 ns          9 ns

(use 'criterium.core)  ;; provides bench
(defrecord R [a b])  ;; small
(def r  (R. (range 1e3) (range 1e3)))
(bench (hash r))
(defrecord R [a b c d e f g h i j])  ;; big
(def r (map->R (zipmap [:a :b :c :d :e :f :g :h :i :j] (repeat (range 1e3)))))
(bench (hash r))

Patch: 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records-v2.patch

Screened by: Alex Miller

Also see: http://dev.clojure.org/jira/browse/CLJS-281



 Comments   
Comment by Nicola Mometto [ 14/Feb/14 5:46 PM ]

I want to point out that my patch breaks ABI compatibility.
A possible approach to avoid this would be to have 3 constructors instead of 2; I can write the patch to support this if desired.

Comment by Stuart Halloway [ 27/Jun/14 11:09 AM ]

The patch 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records.patch is broken in at least two ways:

  • The fields __hash and __hasheq are adopted by new records created by .assoc and .without, which will cause those records to have incorrect (and likely colliding) hash values
  • The addition of the new fields breaks the promise of defrecord, which includes an N+2 constructor taking meta and extmap. With the patch, defrecords get an N+4 constructor letting callers pick hash codes.

I found these problems via the following reasoning:

  • Code has been touched near __extmap
  • Grep for all uses of __extmap and see what breaks
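
The first problem can be illustrated with a small Java sketch. Rec is a hypothetical one-field stand-in for a record, withA plays the role of assoc, and the use of 0 as the "not computed yet" marker is an assumption of this sketch; none of it is the code generated by defrecord. The point is only that a derived instance must start with an empty cache instead of inheriting the cached value.

```java
import java.util.Objects;

// Hypothetical stand-in for a one-field record; not defrecord's generated class.
final class Rec {
    private final String a;
    private int hash; // 0 means "not computed yet" (assumption of this sketch)

    Rec(String a) {
        this.a = a;
    }

    // Analogue of assoc: the new instance does NOT copy the cached hash.
    Rec withA(String newA) {
        return new Rec(newA);
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            // A value whose real hash is 0 is simply recomputed on each call.
            h = Objects.hashCode(a);
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Rec && Objects.equals(a, ((Rec) o).a);
    }
}
```

If withA copied the hash field into the new instance, both objects would report the same hash even though their contents differ.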

Comment by Nicola Mometto [ 27/Jun/14 2:53 PM ]

Patch 0001-cache-hasheq-and-hashCode-for-records.patch fixes both those issues, reintroducing the N+2 arity constructor

Comment by Alex Miller [ 27/Jun/14 4:08 PM ]

Questions addressed, back to Vetted.

Comment by Andy Fingerhut [ 29/Aug/14 4:32 PM ]

All patches dated Jun 7 2014 and earlier no longer applied cleanly to latest master after some commits were made to Clojure on Aug 29, 2014. They did apply cleanly before that day.

I have not checked how easy or difficult it might be to update this patch.

Comment by Alex Miller [ 29/Aug/14 4:40 PM ]

Would be great to get this one updated as it's otherwise ready to screen.

Comment by Nicola Mometto [ 29/Aug/14 4:58 PM ]

Updated patch to apply to latest master.

Comment by Alex Miller [ 16/Jun/15 4:06 PM ]

1) hash and hasheq are stored as Objects, which seems kind of gross. It would be much better to store them as primitive longs and check whether they'd been calculated by comparing to 0. We still end up with a long -> int conversion but at least we'd avoid boxing.

2) assoc wrongly copies over the __hash and __hasheq to the new instance:

(defrecord R [a])
(def r (->R "abc"))
(hash r)                   ;; -1544829221
(hash (assoc r :a "def"))  ;; -1544829221

3) Needs some tests if they don't already exist:

  • with-meta on a record should not affect hashcode
  • modifying a record with assoc or dissoc should affect hashcode
  • maybe something else?

4) Needs some perf tests with a handful of example records (at least: 1 field, many fields, extmap, etc).

Nicola, I'm happy to let you continue to do dev on this patch with me doing the screening if you have time. Or if you don't have time, let me know and I (or someone else) can work on the dev parts. Would like to get this prepped and clean for 1.8.

Comment by Nicola Mometto [ 16/Jun/15 5:56 PM ]

Updated patch addresses the issues raised; will add some benchmarks later.

Comment by Nicola Mometto [ 16/Jun/15 5:59 PM ]

Alex, regarding point 1, I stored __hash and __hasheq as ints rather than longs and compared to -1 rather than 0, to be consistent with how it's done elsewhere in the clojure impl

Comment by Alex Miller [ 17/Jun/15 11:39 AM ]

Looking at the bytecode for hashcode and hasheq, I had two questions:

1) the -1 there is a long and requires an upcast of the private field from int to long. I'm sure that's not a big deal, but wish there was a way to avoid it. I didn't try it but maybe (int -1) would help the compiler out?

2) I wonder whether something like this would perform better:

`(hasheq [this#] 
   (if (== -1 ~'__hasheq)
     (set! ~'__hasheq (int (bit-xor ~type-hash (clojure.lang.APersistentMap/mapHasheq this#)))))
   ~'__hasheq)

The common case will be a failed compare and then the field can be loaded and returned directly without any casting.

Comment by Nicola Mometto [ 17/Jun/15 11:54 AM ]

1- There's no Numbers.equiv(int, int), so even casting -1 to an int wouldn't solve this; a cast is always necessary. If we were to make hasheq a long, we'd need an l2i in the return path; making hasheq an int, we need an i2l in the comparison.
2- That doesn't remove any casting; it just replaces a load from the local variable stack with a field load:

;; current version
0: ldc2_w        #203                // long -1l
3: aload_0
4: getfield      #236                // Field __hasheq:I
7: i2l
8: lcmp
9: ifne          38
12: ldc2_w        #267                // long 1340417398l
15: aload_0
16: checkcast     #16                 // class clojure/lang/IPersistentMap
19: invokestatic  #274                // Method clojure/lang/APersistentMap.mapHasheq:(Lclojure/lang/IPersistentMap;)I
22: i2l
23: lxor
24: invokestatic  #278                // Method clojure/lang/RT.intCast:(J)I
27: istore_1
28: aload_0
29: iload_1
30: putfield      #236                // Field __hasheq:I
33: iload_1
34: goto          42
37: pop
38: aload_0
39: getfield      #236                // Field __hasheq:I
42: ireturn
;; your version
0: ldc2_w        #203                // long -1l
3: aload_0
4: getfield      #236                // Field __hasheq:I
7: i2l
8: lcmp
9: ifne          35
12: aload_0
13: ldc2_w        #267                // long 1340417398l
16: aload_0
17: checkcast     #16                 // class clojure/lang/IPersistentMap
20: invokestatic  #274                // Method clojure/lang/APersistentMap.mapHasheq:(Lclojure/lang/IPersistentMap;)I
23: i2l
24: lxor
25: invokestatic  #278                // Method clojure/lang/RT.intCast:(J)I
28: putfield      #236                // Field __hasheq:I
31: goto          37
34: pop
35: aconst_null
36: pop
37: aload_0
38: getfield      #236                // Field __hasheq:I
41: ireturn

Comment by Alex Miller [ 17/Jun/15 12:01 PM ]

Fair enough! Looks pretty good to me, still needs the perf numbers.

Comment by Nicola Mometto [ 17/Jun/15 1:00 PM ]

coll            1.7.0-RC2    1.7.0-RC2+patch
big record      ~940ns       ~10ns
small record    ~150ns       ~11ns

;; big record, 1.7.0-RC2
user=> (use 'criterium.core)
nil
user=> (defrecord R [a b c d e f g h i j])
user.R
user=> (def r (map->R (zipmap [:a :b :c :d :e :f :g :h :i :j] (repeat (range 1e3)))))
#'user/r
user=> (bench (hash r))
WARNING: Final GC required 1.291182176566658 % of runtime
Evaluation count : 63385020 in 60 samples of 1056417 calls.
             Execution time mean : 943.320293 ns
    Execution time std-deviation : 44.001842 ns
   Execution time lower quantile : 891.919381 ns ( 2.5%)
   Execution time upper quantile : 1.033894 µs (97.5%)
                   Overhead used : 1.980453 ns

;; big record, 1.7.0-RC2 + patch
user=> (defrecord R [a b c d e f g h i j])
user.R
user=> (def r (map->R (zipmap [:a :b :c :d :e :f :g :h :i :j] (repeat (range 1e3)))))
#'user/r
user=> (bench (hash r))
WARNING: Final GC required 1.0097162582088168 % of runtime
Evaluation count : 4820968380 in 60 samples of 80349473 calls.
             Execution time mean : 10.657581 ns
    Execution time std-deviation : 0.668011 ns
   Execution time lower quantile : 9.975656 ns ( 2.5%)
   Execution time upper quantile : 12.190471 ns (97.5%)
                   Overhead used : 2.235715 ns

;; small record 1.7.0-RC2
user=> (defrecord R [a b])
user.R
user=> (def r  (R. (range 1e3) (range 1e3)))
#'user/r
user=> (bench (hash r))
WARNING: Final GC required 1.456092401467115 % of runtime
Evaluation count : 423779160 in 60 samples of 7062986 calls.
             Execution time mean : 147.154359 ns
    Execution time std-deviation : 8.148340 ns
   Execution time lower quantile : 138.052054 ns ( 2.5%)
   Execution time upper quantile : 165.573489 ns (97.5%)
                   Overhead used : 1.629944 ns

;; small record 1.7.0-RC2+patch
user=> (defrecord R [a b])
user.R
user=> (def r  (R. (range 1e3) (range 1e3)))
#'user/r
user=>  (bench (hash r))
WARNING: Final GC required 1.720638384341818 % of runtime
Evaluation count : 4483195020 in 60 samples of 74719917 calls.
             Execution time mean : 11.696574 ns
    Execution time std-deviation : 0.506482 ns
   Execution time lower quantile : 10.982760 ns ( 2.5%)
   Execution time upper quantile : 12.836103 ns (97.5%)
                   Overhead used : 2.123801 ns

Comment by Alex Miller [ 17/Jun/15 3:36 PM ]

Screened for 1.8.

Comment by Alex Miller [ 22/Jun/15 8:38 AM ]

Note that using -1 for the uncomputed hash value can cause issues with transient lazily computed hash codes on serialization (CLJ-1766). In this case, the defrecord cached code is not transient so I don't think it's a problem, but something to be aware of. Using 0 would avoid this potential issue.
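
A minimal Java sketch of the general hazard, assuming a transient cache field and standard Java serialization. MinusOneSentinel and roundTrip are hypothetical names, and this is not the code discussed in CLJ-1766. Transient fields come back as 0 after deserialization and field initializers are not re-run, so a -1 marker makes the restored object look as if its hash had already been computed as 0.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class; illustrates the -1 marker combined with a transient field.
final class MinusOneSentinel implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String value;
    private transient int hash = -1; // -1 means "not computed yet"

    MinusOneSentinel(String value) {
        this.value = value;
    }

    @Override
    public int hashCode() {
        if (hash == -1) {
            hash = value.hashCode();
        }
        return hash;
    }

    // Serialize to a byte array and read the object back.
    static MinusOneSentinel roundTrip(MinusOneSentinel in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream oin =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (MinusOneSentinel) oin.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After a round trip the restored object reports a hash of 0, because the transient field was reset to 0 and never matches the -1 marker. With 0 as the marker, the restored object would recompute its hash on first use.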

Comment by Rich Hickey [ 17/Jul/15 12:00 PM ]

So, why not use 0? You won't need initialization then either

Comment by Nicola Mometto [ 17/Jul/15 7:50 PM ]

Updated patch so that it applies on current master and changed the default hash value from -1 to 0.

Rich, we still need initialization since all the record ctors delegate to the ctor arity with explicit __hash and __hasheq, following the approach of the alt ctors for __extmap and __meta

Comment by Alex Miller [ 17/Jul/15 10:30 PM ]

Moving back to vetted for screening

Comment by Alex Miller [ 28/Jul/15 6:17 PM ]

Hey Nicola, two comments on the hasheq/hashcode impl:

1) I don't think there's any reason to use == in the check instead of =, and = seems better (I think the resulting bytecode is the same either way, though).

2) The generated bytecode in these cases will call getfield twice in the cached case (once for the check, and once for the return):

public int hasheq();
    Code:
       0: lconst_0      
       1: aload_0       
       2: getfield      #232                // Field __hasheq:I     ;; <-- HERE
       5: i2l           
       6: lcmp          
       7: ifne          36
      10: ldc2_w        #263                // long -989260517l
      13: aload_0       
      14: checkcast     #16                 // class clojure/lang/IPersistentMap
      17: invokestatic  #270                // Method clojure/lang/APersistentMap.mapHasheq:(Lclojure/lang/IPersistentMap;)I
      20: i2l           
      21: lxor          
      22: invokestatic  #274                // Method clojure/lang/RT.intCast:(J)I
      25: istore_1      
      26: aload_0       
      27: iload_1       
      28: putfield      #232                // Field __hasheq:I
      31: iload_1       
      32: goto          40
      35: pop           
      36: aload_0       
      37: getfield      #232                // Field __hasheq:I   ;; <-- HERE
      40: ireturn

Binding the field to a local avoids that:

`(hasheq [this#]
   (let [hv# ~'__hasheq]          ;; ADDED
     (if (= 0 hv#)                ;; USED
       (let [h# (int (bit-xor ~type-hash (clojure.lang.APersistentMap/mapHasheq this#)))]
         (set! ~'__hasheq h#)
         h#)
       hv#)))                     ;; USED

Output bytecode:

public int hasheq();
    Code:
       0: aload_0       
       1: getfield      #227                // Field __hasheq:I
       4: istore_1      
       5: lconst_0      
       6: iload_1       
       7: i2l           
       8: lcmp          
       9: ifne          38
      12: ldc2_w        #258                // long -989260517l
      15: aload_0       
      16: checkcast     #16                 // class clojure/lang/IPersistentMap
      19: invokestatic  #265                // Method clojure/lang/APersistentMap.mapHasheq:(Lclojure/lang/IPersistentMap;)I
      22: i2l           
      23: lxor          
      24: invokestatic  #269                // Method clojure/lang/RT.intCast:(J)I
      27: istore_2      
      28: aload_0       
      29: iload_2       
      30: putfield      #227                // Field __hasheq:I
      33: iload_2       
      34: goto          39
      37: pop           
      38: iload_1       
      39: ireturn

For me, this was about 2% faster in bench too.

Comment by Alex Miller [ 28/Jul/15 6:18 PM ]

Equivalent change in hashCode too.

Comment by Nicola Mometto [ 29/Jul/15 4:06 AM ]

Updated patch takes into account Alex's last notes





Generated at Mon Aug 03 08:19:10 CDT 2015 using JIRA 4.4#649-r158309.