Clojure

Transient hashmaps mishandle hash collisions

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Completed
  • Affects Version/s: Release 1.3
  • Fix Version/s: Release 1.4
  • Component/s: None
  • Labels:
    None
  • Patch:
    Code and Test
  • Approval:
    Ok

Description

Clojure 1.2.1 and 1.3.0-beta1 both exhibit the following behavior:

user> (let [m (into {} (for [x (range 100000)] [(rand) (rand)]))]
(println (count (distinct (map hash (keys m)))))
((juxt count identity) (persistent!
(reduce dissoc! (transient m) (keys m)))))
99999
[2 {0.42548900739367024 0.8725039567983159}]

We create a large transient map with random keys and values, and check to see how many unique hashcodes we get. Then, we iterate over all the keys, dissoc'ing each out of the transient map. The resulting map has one element in it (wrong - it should be empty, since we dissoc'ed all the keys), and reports its count as being two (wrong - not sure whether it should be zero or one given the other breakage). As far as I can tell, each duplicated hash value is represented once in the output map, and the map's count is the number of keys that hashed to something duplicated.

The problem seems to be restricted to transients, as if we remove the transient/persistent! pair and use dissoc instead of dissoc!, the map is always empty.

Inspired by discussion at http://groups.google.com/group/clojure/browse_thread/thread/313ac122667bb4b5/c3e7faa8635403f1

  1. clj-829.diff
    08/Sep/11 1:15 PM
    2 kB
    Christophe Grand
  2. clj-829-2.diff
    10/Sep/11 3:48 AM
    2 kB
    Christophe Grand

Activity

People

Vote (0)
Watch (3)

Dates

  • Created:
    Updated:
    Resolved: