Clojure

Records do not cache hash like normal maps

Details

  • Type: Enhancement Enhancement
  • Status: Open Open
  • Priority: Critical Critical
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: Release 1.8
  • Component/s: None
  • Patch:
    Code and Test
  • Approval:
    Screened

Description

Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

Approach: Cache hash values in record definitions, similar to maps.

Timings:

coll 1.8.0-master 1.8.0-master+patch
small record 99 ns 9 ns
big record 455 ns 9 ns
(defrecord R [a b])  ;; small
(def r  (R. (range 1e3) (range 1e3)))
(bench (hash r))
(defrecord R [a b c d e f g h i j])  ;; big
(def r (map->R (zipmap [:a :b :c :d :e :f :g :h :i :j] (repeat (range 1e3)))))
(bench (hash r))

Patch: 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records-v2.patch

Screened by: Alex Miller

Also see: http://dev.clojure.org/jira/browse/CLJS-281

Activity

Nicola Mometto made changes -
Field Original Value New Value
Attachment 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records.patch [ 12041 ]
Brandon Bloom made changes -
Labels patch patch,
Andy Fingerhut made changes -
Patch Code [ 10001 ]
Alex Miller made changes -
Labels patch patch,
Alex Miller made changes -
Approval Triaged [ 10120 ]
Alex Miller made changes -
Priority Minor [ 4 ] Critical [ 2 ]
Alex Miller made changes -
Labels defrecord performance
Alex Miller made changes -
Description This came up with dnolen and ambrose in irc. Apparently it was also an issue for Mark Engleberg with Instaparse. This has been fixed in CLJS, but still affects JVM CLJ.

See http://dev.clojure.org/jira/browse/CLJS-281
Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

*Approach:* Cache hash values in record definitions, similar to maps.

*Patch:* 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records.patch

*Also see:* http://dev.clojure.org/jira/browse/CLJS-281
Issue Type Defect [ 1 ] Enhancement [ 4 ]
Alex Miller made changes -
Priority Critical [ 2 ] Major [ 3 ]
Hide
Nicola Mometto added a comment -

I want to point out that my patch breaks ABI compatibility.
A possible approach to avoid this would be to have 3 constructors instead of 2, I can write the patch to support this if desired.

Show
Nicola Mometto added a comment - I want to point out that my patch breaks ABI compatibility. A possible approach to avoid this would be to have 3 constructors instead of 2, I can write the patch to support this if desired.
Rich Hickey made changes -
Approval Triaged [ 10120 ] Vetted [ 10003 ]
Rich Hickey made changes -
Fix Version/s Release 1.7 [ 10250 ]
Hide
Stuart Halloway added a comment -

The patch 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records.patch is broken in at least two ways:

  • The fields __hash and __hasheq are adopted by new records created by .assoc and .without, which will cause those records to have incorrect (and likely colliding) hash values
  • The addition of the new fields breaks the promise of defrecord, which includes an N+2 constructor taking meta and extmap. With the patch, defrecords get an N+4 constructor letting callers pick hash codes.

I found these problems via the following reasoning:

  • Code has been touched near __extmap
  • Grep for all uses of __extmap and see what breaks
Show
Stuart Halloway added a comment - The patch 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records.patch is broken in at least two ways:
  • The fields __hash and __hasheq are adopted by new records created by .assoc and .without, which will cause those records to have incorrect (and likely colliding) hash values
  • The addition of the new fields breaks the promise of defrecord, which includes an N+2 constructor taking meta and extmap. With the patch, defrecords get an N+4 constructor letting callers pick hash codes.
I found these problems via the following reasoning:
  • Code has been touched near __extmap
  • Grep for all uses of __extmap and see what breaks
Stuart Halloway made changes -
Approval Vetted [ 10003 ] Incomplete [ 10006 ]
Hide
Nicola Mometto added a comment -

Patch 0001-cache-hasheq-and-hashCode-for-records.patch fixes both those issues, reintroducing the N+2 arity constructor

Show
Nicola Mometto added a comment - Patch 0001-cache-hasheq-and-hashCode-for-records.patch fixes both those issues, reintroducing the N+2 arity constructor
Nicola Mometto made changes -
Attachment 0001-cache-hasheq-and-hashCode-for-records.patch [ 13105 ]
Hide
Alex Miller added a comment -

Questions addressed, back to Vetted.

Show
Alex Miller added a comment - Questions addressed, back to Vetted.
Alex Miller made changes -
Approval Incomplete [ 10006 ] Vetted [ 10003 ]
Hide
Andy Fingerhut added a comment -

All patches dated Jun 7 2014 and earlier no longer applied cleanly to latest master after some commits were made to Clojure on Aug 29, 2014. They did apply cleanly before that day.

I have not checked how easy or difficult it might be to update this patch.

Show
Andy Fingerhut added a comment - All patches dated Jun 7 2014 and earlier no longer applied cleanly to latest master after some commits were made to Clojure on Aug 29, 2014. They did apply cleanly before that day. I have not checked how easy or difficult it might be to update this patch.
Hide
Alex Miller added a comment -

Would be great to get this one updated as it's otherwise ready to screen.

Show
Alex Miller added a comment - Would be great to get this one updated as it's otherwise ready to screen.
Nicola Mometto made changes -
Attachment 0001-cache-hasheq-and-hashCode-for-records.patch [ 13105 ]
Nicola Mometto made changes -
Attachment 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records.patch [ 12041 ]
Hide
Nicola Mometto added a comment -

Updated patch to apply to lastest master

Show
Nicola Mometto added a comment - Updated patch to apply to lastest master
Nicola Mometto made changes -
Attachment 0001-cache-hasheq-and-hashCode-for-records.patch [ 13284 ]
Nicola Mometto made changes -
Attachment 0001-cache-hasheq-and-hashCode-for-records.patch [ 13284 ]
Nicola Mometto made changes -
Alex Miller made changes -
Fix Version/s Release 1.7 [ 10250 ]
Fix Version/s Release 1.8 [ 10254 ]
Alex Miller made changes -
Priority Major [ 3 ] Critical [ 2 ]
Alex Miller made changes -
Issue Type Enhancement [ 4 ] Defect [ 1 ]
Hide
Alex Miller added a comment -

1) hash and hasheq are stored as Objects, which seems kind of gross. It would be much better to store them as primitive longs and check whether they'd been calculated by comparing to 0. We still end up with a long -> int conversion but at least we'd avoid boxing.

2) assoc wrongly copies over the __hash and __hasheq to the new instance:

(defrecord R [a])
(def r (->R "abc"))
(hash r)                   ;; -1544829221
(hash (assoc r :a "def"))  ;; -1544829221

3) Needs some tests if they don't already exist:

  • with-meta on a record should not affect hashcode
  • modifying a record with assoc or dissoc should affect hashcode
  • maybe something else?

4) Needs some perf tests with a handful of example records (at least: 1 field, many fields, extmap, etc).

Nicola, I'm happy to let you continue to do dev on this patch with me doing the screening if you have time. Or if you don't have time, let me know and I (or someone else) can work on the dev parts. Would like to get this prepped and clean for 1.8.

Show
Alex Miller added a comment - 1) hash and hasheq are stored as Objects, which seems kind of gross. It would be much better to store them as primitive longs and check whether they'd been calculated by comparing to 0. We still end up with a long -> int conversion but at least we'd avoid boxing. 2) assoc wrongly copies over the __hash and __hasheq to the new instance:
(defrecord R [a])
(def r (->R "abc"))
(hash r)                   ;; -1544829221
(hash (assoc r :a "def"))  ;; -1544829221
3) Needs some tests if they don't already exist:
  • with-meta on a record should not affect hashcode
  • modifying a record with assoc or dissoc should affect hashcode
  • maybe something else?
4) Needs some perf tests with a handful of example records (at least: 1 field, many fields, extmap, etc). Nicola, I'm happy to let you continue to do dev on this patch with me doing the screening if you have time. Or if you don't have time, let me know and I (or someone else) can work on the dev parts. Would like to get this prepped and clean for 1.8.
Alex Miller made changes -
Approval Vetted [ 10003 ] Incomplete [ 10006 ]
Hide
Nicola Mometto added a comment -

Updated patch addresses the issues raised, will add some benchmarks later

Show
Nicola Mometto added a comment - Updated patch addresses the issues raised, will add some benchmarks later
Nicola Mometto made changes -
Description Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

*Approach:* Cache hash values in record definitions, similar to maps.

*Patch:* 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records.patch

*Also see:* http://dev.clojure.org/jira/browse/CLJS-281
Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

*Approach:* Cache hash values in record definitions, similar to maps.

*Patch:* 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records-v2.patch

*Also see:* http://dev.clojure.org/jira/browse/CLJS-281
Attachment 0001-cache-hasheq-and-hashCode-for-records-v2.patch [ 14255 ]
Hide
Nicola Mometto added a comment - - edited

Alex, regarding point 1, I stored __hash and __hasheq as ints rather than longs and compared to -1 rather than 0, to be consistent with how it's done elsewhere in the clojure impl

Show
Nicola Mometto added a comment - - edited Alex, regarding point 1, I stored __hash and __hasheq as ints rather than longs and compared to -1 rather than 0, to be consistent with how it's done elsewhere in the clojure impl
Hide
Alex Miller added a comment -

Looking at the bytecode for hashcode and hasheq, I had two questions:

1) the -1 there is a long and requires an upcast of the private field from int to long. I'm sure that's not a big deal, but wish there was a way to avoid it. I didn't try it but maybe (int -1) would help the compiler out?

2) I wonder whether something like this would perform better:

`(hasheq [this#] 
   (if (== -1 ~'__hasheq)
     (set! ~'__hasheq (int (bit-xor ~type-hash (clojure.lang.APersistentMap/mapHasheq this#)))))
   ~'__hasheq)

The common case will be a failed compare and then the field can be loaded and returned directly without any casting.

Show
Alex Miller added a comment - Looking at the bytecode for hashcode and hasheq, I had two questions: 1) the -1 there is a long and requires an upcast of the private field from int to long. I'm sure that's not a big deal, but wish there was a way to avoid it. I didn't try it but maybe (int -1) would help the compiler out? 2) I wonder whether something like this would perform better:
`(hasheq [this#] 
   (if (== -1 ~'__hasheq)
     (set! ~'__hasheq (int (bit-xor ~type-hash (clojure.lang.APersistentMap/mapHasheq this#)))))
   ~'__hasheq)
The common case will be a failed compare and then the field can be loaded and returned directly without any casting.
Hide
Nicola Mometto added a comment -

1- there's no Numbers.equiv(int, int) so even casting -1 to an int wouldn't solve this. a cast is always necessary. if we were to make hasheq a long, we'd need l2i in the return path, making hasheq an int we need an i2l in the comparison.
2- that doesn't remove any casting, it just replaces a load from the local variable stack with a field load:

;; current version
0: ldc2_w        #203                // long -1l
3: aload_0
4: getfield      #236                // Field __hasheq:I
7: i2l
8: lcmp
9: ifne          38
12: ldc2_w        #267                // long 1340417398l
15: aload_0
16: checkcast     #16                 // class clojure/lang/IPersistentMap
19: invokestatic  #274                // Method clojure/lang/APersistentMap.mapHasheq:(Lclojure/lang/IPersistentMap;)I
22: i2l
23: lxor
24: invokestatic  #278                // Method clojure/lang/RT.intCast:(J)I
27: istore_1
28: aload_0
29: iload_1
30: putfield      #236                // Field __hasheq:I
33: iload_1
34: goto          42
37: pop
38: aload_0
39: getfield      #236                // Field __hasheq:I
42: ireturn
;; your version
0: ldc2_w        #203                // long -1l
3: aload_0
4: getfield      #236                // Field __hasheq:I
7: i2l
8: lcmp
9: ifne          35
12: aload_0
13: ldc2_w        #267                // long 1340417398l
16: aload_0
17: checkcast     #16                 // class clojure/lang/IPersistentMap
20: invokestatic  #274                // Method clojure/lang/APersistentMap.mapHasheq:(Lclojure/lang/IPersistentMap;)I
23: i2l
24: lxor
25: invokestatic  #278                // Method clojure/lang/RT.intCast:(J)I
28: putfield      #236                // Field __hasheq:I
31: goto          37
34: pop
35: aconst_null
36: pop
37: aload_0
38: getfield      #236                // Field __hasheq:I
41: ireturn
Show
Nicola Mometto added a comment - 1- there's no Numbers.equiv(int, int) so even casting -1 to an int wouldn't solve this. a cast is always necessary. if we were to make hasheq a long, we'd need l2i in the return path, making hasheq an int we need an i2l in the comparison. 2- that doesn't remove any casting, it just replaces a load from the local variable stack with a field load:
;; current version
0: ldc2_w        #203                // long -1l
3: aload_0
4: getfield      #236                // Field __hasheq:I
7: i2l
8: lcmp
9: ifne          38
12: ldc2_w        #267                // long 1340417398l
15: aload_0
16: checkcast     #16                 // class clojure/lang/IPersistentMap
19: invokestatic  #274                // Method clojure/lang/APersistentMap.mapHasheq:(Lclojure/lang/IPersistentMap;)I
22: i2l
23: lxor
24: invokestatic  #278                // Method clojure/lang/RT.intCast:(J)I
27: istore_1
28: aload_0
29: iload_1
30: putfield      #236                // Field __hasheq:I
33: iload_1
34: goto          42
37: pop
38: aload_0
39: getfield      #236                // Field __hasheq:I
42: ireturn
;; your version
0: ldc2_w        #203                // long -1l
3: aload_0
4: getfield      #236                // Field __hasheq:I
7: i2l
8: lcmp
9: ifne          35
12: aload_0
13: ldc2_w        #267                // long 1340417398l
16: aload_0
17: checkcast     #16                 // class clojure/lang/IPersistentMap
20: invokestatic  #274                // Method clojure/lang/APersistentMap.mapHasheq:(Lclojure/lang/IPersistentMap;)I
23: i2l
24: lxor
25: invokestatic  #278                // Method clojure/lang/RT.intCast:(J)I
28: putfield      #236                // Field __hasheq:I
31: goto          37
34: pop
35: aconst_null
36: pop
37: aload_0
38: getfield      #236                // Field __hasheq:I
41: ireturn
Hide
Alex Miller added a comment -

Fair enough! Looks pretty good to me, still needs the perf numbers.

Show
Alex Miller added a comment - Fair enough! Looks pretty good to me, still needs the perf numbers.
Alex Miller made changes -
Approval Incomplete [ 10006 ] Vetted [ 10003 ]
Hide
Nicola Mometto added a comment -
coll 1.7.0-RC2 1.7.0-RC2+patch
big record ~940ns ~10ns
small record ~150ns ~11ns
;; big record, 1.7.0-RC2
user=> (use 'criterium.core)
nil
user=> (defrecord R [a b c d e f g h i j])
user.R
user=> (def r (map->R (zipmap [:a :b :c :d :e :f :g :h :i :j] (repeat (range 1e3)))))
#'user/r
user=> (bench (hash r))
WARNING: Final GC required 1.291182176566658 % of runtime
Evaluation count : 63385020 in 60 samples of 1056417 calls.
             Execution time mean : 943.320293 ns
    Execution time std-deviation : 44.001842 ns
   Execution time lower quantile : 891.919381 ns ( 2.5%)
   Execution time upper quantile : 1.033894 µs (97.5%)
                   Overhead used : 1.980453 ns

;; big record, 1.7.0-RC2 + patch
user=> (defrecord R [a b c d e f g h i j])
user.R
user=> (def r (map->R (zipmap [:a :b :c :d :e :f :g :h :i :j] (repeat (range 1e3)))))
#'user/r
user=> (bench (hash r))
WARNING: Final GC required 1.0097162582088168 % of runtime
Evaluation count : 4820968380 in 60 samples of 80349473 calls.
             Execution time mean : 10.657581 ns
    Execution time std-deviation : 0.668011 ns
   Execution time lower quantile : 9.975656 ns ( 2.5%)
   Execution time upper quantile : 12.190471 ns (97.5%)
                   Overhead used : 2.235715 ns

;; small record 1.7.0-RC2
user=> (defrecord R [a b])
user.R
user=> (def r  (R. (range 1e3) (range 1e3)))
#'user/r
user=> (bench (hash r))
WARNING: Final GC required 1.456092401467115 % of runtime
Evaluation count : 423779160 in 60 samples of 7062986 calls.
             Execution time mean : 147.154359 ns
    Execution time std-deviation : 8.148340 ns
   Execution time lower quantile : 138.052054 ns ( 2.5%)
   Execution time upper quantile : 165.573489 ns (97.5%)
                   Overhead used : 1.629944 ns

;; small record 1.7.0-RC2+patch
user=> (defrecord R [a b])
user.R
user=> (def r  (R. (range 1e3) (range 1e3)))
#'user/r
user=>  (bench (hash r))
WARNING: Final GC required 1.720638384341818 % of runtime
Evaluation count : 4483195020 in 60 samples of 74719917 calls.
             Execution time mean : 11.696574 ns
    Execution time std-deviation : 0.506482 ns
   Execution time lower quantile : 10.982760 ns ( 2.5%)
   Execution time upper quantile : 12.836103 ns (97.5%)
                   Overhead used : 2.123801 ns
Show
Nicola Mometto added a comment -
coll 1.7.0-RC2 1.7.0-RC2+patch
big record ~940ns ~10ns
small record ~150ns ~11ns
;; big record, 1.7.0-RC2
user=> (use 'criterium.core)
nil
user=> (defrecord R [a b c d e f g h i j])
user.R
user=> (def r (map->R (zipmap [:a :b :c :d :e :f :g :h :i :j] (repeat (range 1e3)))))
#'user/r
user=> (bench (hash r))
WARNING: Final GC required 1.291182176566658 % of runtime
Evaluation count : 63385020 in 60 samples of 1056417 calls.
             Execution time mean : 943.320293 ns
    Execution time std-deviation : 44.001842 ns
   Execution time lower quantile : 891.919381 ns ( 2.5%)
   Execution time upper quantile : 1.033894 µs (97.5%)
                   Overhead used : 1.980453 ns

;; big record, 1.7.0-RC2 + patch
user=> (defrecord R [a b c d e f g h i j])
user.R
user=> (def r (map->R (zipmap [:a :b :c :d :e :f :g :h :i :j] (repeat (range 1e3)))))
#'user/r
user=> (bench (hash r))
WARNING: Final GC required 1.0097162582088168 % of runtime
Evaluation count : 4820968380 in 60 samples of 80349473 calls.
             Execution time mean : 10.657581 ns
    Execution time std-deviation : 0.668011 ns
   Execution time lower quantile : 9.975656 ns ( 2.5%)
   Execution time upper quantile : 12.190471 ns (97.5%)
                   Overhead used : 2.235715 ns

;; small record 1.7.0-RC2
user=> (defrecord R [a b])
user.R
user=> (def r  (R. (range 1e3) (range 1e3)))
#'user/r
user=> (bench (hash r))
WARNING: Final GC required 1.456092401467115 % of runtime
Evaluation count : 423779160 in 60 samples of 7062986 calls.
             Execution time mean : 147.154359 ns
    Execution time std-deviation : 8.148340 ns
   Execution time lower quantile : 138.052054 ns ( 2.5%)
   Execution time upper quantile : 165.573489 ns (97.5%)
                   Overhead used : 1.629944 ns

;; small record 1.7.0-RC2+patch
user=> (defrecord R [a b])
user.R
user=> (def r  (R. (range 1e3) (range 1e3)))
#'user/r
user=>  (bench (hash r))
WARNING: Final GC required 1.720638384341818 % of runtime
Evaluation count : 4483195020 in 60 samples of 74719917 calls.
             Execution time mean : 11.696574 ns
    Execution time std-deviation : 0.506482 ns
   Execution time lower quantile : 10.982760 ns ( 2.5%)
   Execution time upper quantile : 12.836103 ns (97.5%)
                   Overhead used : 2.123801 ns
Nicola Mometto made changes -
Description Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

*Approach:* Cache hash values in record definitions, similar to maps.

*Patch:* 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records-v2.patch

*Also see:* http://dev.clojure.org/jira/browse/CLJS-281
Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

*Approach:* Cache hash values in record definitions, similar to maps.

*Timings:*
||coll||1.7.0-RC2||1.7.0-RC2+patch||
|big record|~940ns|~10ns|
|small record|~150ns|~11ns|

*Patch:* 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records-v2.patch

*Also see:* http://dev.clojure.org/jira/browse/CLJS-281
Alex Miller made changes -
Labels defrecord performance defrecord ft performance
Alex Miller made changes -
Labels defrecord ft performance defrecord performance
Hide
Alex Miller added a comment -

Screened for 1.8.

Show
Alex Miller added a comment - Screened for 1.8.
Alex Miller made changes -
Patch Code [ 10001 ] Code and Test [ 10002 ]
Approval Vetted [ 10003 ] Screened [ 10004 ]
Description Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

*Approach:* Cache hash values in record definitions, similar to maps.

*Timings:*
||coll||1.7.0-RC2||1.7.0-RC2+patch||
|big record|~940ns|~10ns|
|small record|~150ns|~11ns|

*Patch:* 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records-v2.patch

*Also see:* http://dev.clojure.org/jira/browse/CLJS-281
Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

*Approach:* Cache hash values in record definitions, similar to maps.

*Timings:*
||coll||1.7.0-RC2||1.7.0-RC2+patch||
|big record|~940ns|~10ns|
|small record|~150ns|~11ns|

*Patch:* 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records-v2.patch

*Screened by:* Alex Miller

*Also see:* http://dev.clojure.org/jira/browse/CLJS-281
Hide
Alex Miller added a comment -

Note that using -1 for the uncomputed hash value can cause issues with transient lazily computed hash codes on serialization (CLJ-1766). In this case, the defrecord cached code is not transient so I don't think it's a problem, but something to be aware of. Using 0 would avoid this potential issue.

Show
Alex Miller added a comment - Note that using -1 for the uncomputed hash value can cause issues with transient lazily computed hash codes on serialization (CLJ-1766). In this case, the defrecord cached code is not transient so I don't think it's a problem, but something to be aware of. Using 0 would avoid this potential issue.
Hide
Rich Hickey added a comment - - edited

So, why not use 0? You won't need initialization then either

Show
Rich Hickey added a comment - - edited So, why not use 0? You won't need initialization then either
Rich Hickey made changes -
Approval Screened [ 10004 ] Incomplete [ 10006 ]
Hide
Nicola Mometto added a comment -

Updated patch so that it applies on current master and changed the default hash value from -1 to 0.

Rich, we still need initialization since all the record ctors delegate to the ctor arity with explicit __hash and __hasheq, following the approach of the alt ctors for __extmap and __meta

Show
Nicola Mometto added a comment - Updated patch so that it applies on current master and changed the default hash value from -1 to 0. Rich, we still need initialization since all the record ctors delegate to the ctor arity with explicit __hash and __hasheq, following the approach of the alt ctors for __extmap and __meta
Nicola Mometto made changes -
Description Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

*Approach:* Cache hash values in record definitions, similar to maps.

*Timings:*
||coll||1.7.0-RC2||1.7.0-RC2+patch||
|big record|~940ns|~10ns|
|small record|~150ns|~11ns|

*Patch:* 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records-v2.patch

*Screened by:* Alex Miller

*Also see:* http://dev.clojure.org/jira/browse/CLJS-281
Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

*Approach:* Cache hash values in record definitions, similar to maps.

*Timings:*
||coll||1.7.0-RC2||1.7.0-RC2+patch||
|big record|~940ns|~10ns|
|small record|~150ns|~11ns|

*Patch:* 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records.patch

*Screened by:* Alex Miller

*Also see:* http://dev.clojure.org/jira/browse/CLJS-281
Attachment 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records.patch [ 14497 ]
Hide
Alex Miller added a comment -

Moving back to vetted for screening

Show
Alex Miller added a comment - Moving back to vetted for screening
Alex Miller made changes -
Approval Incomplete [ 10006 ] Vetted [ 10003 ]
Hide
Alex Miller added a comment -

Hey Nicola, two comments on the hasheq/hashcode impl:

1) I don't think there's any reason to use == in the check instead of =, and = seems better (I think the resulting bytecode is same either way though).

2) The generated bytecode in these cases will call getfield twice in the cached case (once for the check, and once for the return):

public int hasheq();
    Code:
       0: lconst_0      
       1: aload_0       
       2: getfield      #232                // Field __hasheq:I     ;; <-- HERE
       5: i2l           
       6: lcmp          
       7: ifne          36
      10: ldc2_w        #263                // long -989260517l
      13: aload_0       
      14: checkcast     #16                 // class clojure/lang/IPersistentMap
      17: invokestatic  #270                // Method clojure/lang/APersistentMap.mapHasheq:(Lclojure/lang/IPersistentMap;)I
      20: i2l           
      21: lxor          
      22: invokestatic  #274                // Method clojure/lang/RT.intCast:(J)I
      25: istore_1      
      26: aload_0       
      27: iload_1       
      28: putfield      #232                // Field __hasheq:I
      31: iload_1       
      32: goto          40
      35: pop           
      36: aload_0       
      37: getfield      #232                // Field __hasheq:I   ;; <-- HERE
      40: ireturn

Letting a local will avoid that:

`(hasheq [this#] (let [hv# ~'__hasheq]     ;; ADDED
                                  (if (= 0 hv#)           ;; USED
                                    (let [h# (int (bit-xor ~type-hash (clojure.lang.APersistentMap/mapHasheq this#)))]
                                      (set! ~'__hasheq h#)
                                      h#)
                                    hv#)))                ;; USED

Output bytecode:

public int hasheq();
    Code:
       0: aload_0       
       1: getfield      #227                // Field __hasheq:I
       4: istore_1      
       5: lconst_0      
       6: iload_1       
       7: i2l           
       8: lcmp          
       9: ifne          38
      12: ldc2_w        #258                // long -989260517l
      15: aload_0       
      16: checkcast     #16                 // class clojure/lang/IPersistentMap
      19: invokestatic  #265                // Method clojure/lang/APersistentMap.mapHasheq:(Lclojure/lang/IPersistentMap;)I
      22: i2l           
      23: lxor          
      24: invokestatic  #269                // Method clojure/lang/RT.intCast:(J)I
      27: istore_2      
      28: aload_0       
      29: iload_2       
      30: putfield      #227                // Field __hasheq:I
      33: iload_2       
      34: goto          39
      37: pop           
      38: iload_1       
      39: ireturn

For me, this was about 2% faster in bench too.

Show
Alex Miller added a comment - Hey Nicola, two comments on the hasheq/hashcode impl: 1) I don't think there's any reason to use == in the check instead of =, and = seems better (I think the resulting bytecode is same either way though). 2) The generated bytecode in these cases will call getfield twice in the cached case (once for the check, and once for the return):
public int hasheq();
    Code:
       0: lconst_0      
       1: aload_0       
       2: getfield      #232                // Field __hasheq:I     ;; <-- HERE
       5: i2l           
       6: lcmp          
       7: ifne          36
      10: ldc2_w        #263                // long -989260517l
      13: aload_0       
      14: checkcast     #16                 // class clojure/lang/IPersistentMap
      17: invokestatic  #270                // Method clojure/lang/APersistentMap.mapHasheq:(Lclojure/lang/IPersistentMap;)I
      20: i2l           
      21: lxor          
      22: invokestatic  #274                // Method clojure/lang/RT.intCast:(J)I
      25: istore_1      
      26: aload_0       
      27: iload_1       
      28: putfield      #232                // Field __hasheq:I
      31: iload_1       
      32: goto          40
      35: pop           
      36: aload_0       
      37: getfield      #232                // Field __hasheq:I   ;; <-- HERE
      40: ireturn
Letting a local will avoid that:
`(hasheq [this#] (let [hv# ~'__hasheq]     ;; ADDED
                                  (if (= 0 hv#)           ;; USED
                                    (let [h# (int (bit-xor ~type-hash (clojure.lang.APersistentMap/mapHasheq this#)))]
                                      (set! ~'__hasheq h#)
                                      h#)
                                    hv#)))                ;; USED
Output bytecode:
public int hasheq();
    Code:
       0: aload_0       
       1: getfield      #227                // Field __hasheq:I
       4: istore_1      
       5: lconst_0      
       6: iload_1       
       7: i2l           
       8: lcmp          
       9: ifne          38
      12: ldc2_w        #258                // long -989260517l
      15: aload_0       
      16: checkcast     #16                 // class clojure/lang/IPersistentMap
      19: invokestatic  #265                // Method clojure/lang/APersistentMap.mapHasheq:(Lclojure/lang/IPersistentMap;)I
      22: i2l           
      23: lxor          
      24: invokestatic  #269                // Method clojure/lang/RT.intCast:(J)I
      27: istore_2      
      28: aload_0       
      29: iload_2       
      30: putfield      #227                // Field __hasheq:I
      33: iload_2       
      34: goto          39
      37: pop           
      38: iload_1       
      39: ireturn
For me, this was about 2% faster in bench too.
Alex Miller made changes -
Approval Vetted [ 10003 ] Incomplete [ 10006 ]
Hide
Alex Miller added a comment -

Equivalent change in hashCode too.

Show
Alex Miller added a comment - Equivalent change in hashCode too.
Hide
Nicola Mometto added a comment -

Updated patch takes into account Alex's last notes

Show
Nicola Mometto added a comment - Updated patch takes into account Alex's last notes
Nicola Mometto made changes -
Description Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

*Approach:* Cache hash values in record definitions, similar to maps.

*Timings:*
||coll||1.7.0-RC2||1.7.0-RC2+patch||
|big record|~940ns|~10ns|
|small record|~150ns|~11ns|

*Patch:* 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records.patch

*Screened by:* Alex Miller

*Also see:* http://dev.clojure.org/jira/browse/CLJS-281
Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

*Approach:* Cache hash values in record definitions, similar to maps.

*Timings:*
||coll||1.7.0-RC2||1.7.0-RC2+patch||
|big record|~940ns|~10ns|
|small record|~150ns|~11ns|

*Patch:* 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records-v2.patch

*Screened by:* Alex Miller

*Also see:* http://dev.clojure.org/jira/browse/CLJS-281
Attachment 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records-v2.patch [ 14920 ]
Alex Miller made changes -
Approval Incomplete [ 10006 ] Screened [ 10004 ]
Description Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

*Approach:* Cache hash values in record definitions, similar to maps.

*Timings:*
||coll||1.7.0-RC2||1.7.0-RC2+patch||
|big record|~940ns|~10ns|
|small record|~150ns|~11ns|

*Patch:* 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records-v2.patch

*Screened by:* Alex Miller

*Also see:* http://dev.clojure.org/jira/browse/CLJS-281
Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

*Approach:* Cache hash values in record definitions, similar to maps.

*Timings:*

||coll||1.8.0-master||1.7.0-master+patch||
|small record|99 ns|9 ns|
|big record|455 ns|9 ns|

{code}
(defrecord R [a b]) ;; small
(def r (R. (range 1e3) (range 1e3)))
(bench (hash r))
(defrecord R [a b c d e f g h i j]) ;; big
(def r (map->R (zipmap [:a :b :c :d :e :f :g :h :i :j] (repeat (range 1e3)))))
(bench (hash r))
{code}

*Patch:* 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records-v2.patch

*Screened by:* Alex Miller

*Also see:* http://dev.clojure.org/jira/browse/CLJS-281
Alex Miller made changes -
Description Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

*Approach:* Cache hash values in record definitions, similar to maps.

*Timings:*

||coll||1.8.0-master||1.7.0-master+patch||
|small record|99 ns|9 ns|
|big record|455 ns|9 ns|

{code}
(defrecord R [a b]) ;; small
(def r (R. (range 1e3) (range 1e3)))
(bench (hash r))
(defrecord R [a b c d e f g h i j]) ;; big
(def r (map->R (zipmap [:a :b :c :d :e :f :g :h :i :j] (repeat (range 1e3)))))
(bench (hash r))
{code}

*Patch:* 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records-v2.patch

*Screened by:* Alex Miller

*Also see:* http://dev.clojure.org/jira/browse/CLJS-281
Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

*Approach:* Cache hash values in record definitions, similar to maps.

*Timings:*

||coll||1.8.0-master||1.8.0-master+patch||
|small record|99 ns|9 ns|
|big record|455 ns|9 ns|

{code}
(defrecord R [a b]) ;; small
(def r (R. (range 1e3) (range 1e3)))
(bench (hash r))
(defrecord R [a b c d e f g h i j]) ;; big
(def r (map->R (zipmap [:a :b :c :d :e :f :g :h :i :j] (repeat (range 1e3)))))
(bench (hash r))
{code}

*Patch:* 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records-v2.patch

*Screened by:* Alex Miller

*Also see:* http://dev.clojure.org/jira/browse/CLJS-281
Rich Hickey made changes -
Issue Type Defect [ 1 ] Enhancement [ 4 ]

People

Vote (17)
Watch (12)

Dates

  • Created:
    Updated: