[CLJ-1087] clojure.data/diff uses set union on key seqs Created: 15/Oct/12  Updated: 05/Jun/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Tom Jack Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: checkargs, performance

Attachments: Text File clj-1087-diff-perf-enhance-patch-v1.txt    
Patch: Code


clojure.data/diff, on line 118, defines:

(diff-similar [a b]
  (diff-associative a b (set/union (keys a) (keys b))))

Since keys returns a key seq, this seems like an error. clojure.set/union has strange and inconsistent behavior with regard to non-sets, and in this case the two key seqs are concatenated. Based on a cursory benchmark, it seems that this bug is a slight performance gain when the maps have no common keys, and a significant performance loss when the maps have the same keys. The results are still correct because of the merging reduce in diff-associative.

The patch is easy (just call set on each key seq).

Comment by Andy Fingerhut [ 15/Oct/12 2:52 PM ]

clj-1087-diff-perf-enhance-patch-v1.txt dated Oct 15 2012 implements Tom's suggested performance enhancement, although not exactly in the way he suggested. It does calculate the union of the two key sequences.

Comment by Andy Fingerhut [ 05/Jun/15 2:03 AM ]

I would suggest that perhaps bigger than the issue of performance here is that Clojure is relying on its own undocumented behavior that clojure.set/union works on some non-set arguments.

Generated at Tue May 21 13:41:11 CDT 2019 using JIRA 4.4#649-r158309.