clojure.data/diff uses set union on key seqs

Description

clojure.data/diff, on line 118, defines:

java.util.Map (diff-similar [a b] (diff-associative a b (set/union (keys a) (keys b))))

Since keys returns a key seq, this seems like an error. clojure.set/union has strange and inconsistent behavior with regard to non-sets, and in this case the two key seqs are concatenated. Based on a cursory benchmark, it seems that this bug is a slight performance gain when the maps have no common keys, and a significant performance loss when the maps have the same keys. The results are still correct because of the merging reduce in diff-associative.

The patch is easy (just call set on each key seq).

Environment

None

Attachments

1

Activity

Show:

Andy Fingerhut June 5, 2015 at 8:03 AM

I would suggest that perhaps bigger than the issue of performance here is that Clojure is relying on its own undocumented behavior that clojure.set/union works on some non-set arguments.

Andy Fingerhut October 15, 2012 at 8:52 PM

clj-1087-diff-perf-enhance-patch-v1.txt dated Oct 15 2012 implements Tom's suggested performance enhancement, although not exactly in the way he suggested. It does calculate the union of the two key sequences.

Details

Assignee

Reporter

Patch

Code

Priority

Created October 15, 2012 at 9:59 AM
Updated June 5, 2015 at 8:03 AM