<< Back to previous view

[CLJ-1106] Broken equality for sets Created: 09/Nov/12  Updated: 13/Sep/13  Resolved: 08/Feb/13

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.4, Release 1.5
Fix Version/s: Release 1.5

Type: Defect Priority: Major
Reporter: Justin Kramer Assignee: Aaron Bedra
Resolution: Completed Votes: 4
Labels: None

Attachments: Text File 0001-CLJ-1106-Use-Util.hasheq-in-hashCode-for-persistent-.patch     Text File 0001-Fixing-set-equality.patch     File setequals.diff    
Patch: Code and Test
Approval: Ok

 Description   

Equality for sets is not consistent with other persistent collections when the hashCode differs:

(= {(Integer. -1) :foo} {(Long. -1) :foo})
=> true

(= [(Integer. -1)] [(Long. -1)])
=> true

(= #{(Integer. -1)} #{(Long. -1)})
=> false

Attached is what I believe to be a basic fix. It removes a hashCode check from APersistentSet.equals. A similar check was removed from APersistentMap in the following commit:

https://github.com/clojure/clojure/commit/b5f5ba2e15dc2f20e14e05141f7de7c6a3d91179

With this patch, set equality works as expected and all tests pass.

My understanding of the implications of this change are limited, so someone more knowledgeable would need to chime in about its correctness.



 Comments   
Comment by Andy Fingerhut [ 09/Nov/12 9:48 AM ]

Can you add some unit tests that fail with the current code but succeed with the change?

Comment by Justin Kramer [ 09/Nov/12 10:24 AM ]

Revised patch with unit test that fails in master and succeeds with patch

Comment by Timothy Baldridge [ 03/Dec/12 9:04 AM ]

vetting

Comment by Gary Fredericks [ 29/Jan/13 5:32 PM ]

This came up when we were trying to diff two datasets by putting them in sets and comparing. We had Integers because they came back from JDBC that way.

Comment by Aaron Bedra [ 29/Jan/13 5:35 PM ]

I'm going to take a look and try to get this shipped. It hit us and I would love to to see it in 1.5

Comment by Paul Stadig [ 29/Jan/13 7:14 PM ]

I applied the patch on master (ca465d6d) and it applied cleanly and fixes the issue.

Comment by Bo Jeanes [ 30/Jan/13 11:19 AM ]

Instead of removing m.hashCode() != hashCode(), perhaps we should use `Util.hasheq()` instead. It already seems to special case numbers (https://github.com/clojure/clojure/blob/f48d024e97/src/jvm/clojure/lang/Util.java#L164-L172) and won't have the potential performance impact that skipping hashCode checks could bring.

Comment by Bo Jeanes [ 30/Jan/13 11:39 AM ]

Attaching a patch with an alternate fix for this issue that does not skip hashCode checking.

It passes all existing tests and fixes this issue.

I want to benchmark the difference, too.

Comment by Bo Jeanes [ 30/Jan/13 1:32 PM ]

On further thought and comparison to APersistentMap, I'm not sure if that's the best use of Util.hasheq(). I can't find good reference on the semantic differences and am not familiar enough with the Clojure source to infer it, yet.

Comment by Aaron Bedra [ 02/Feb/13 12:33 PM ]

Bo, I don't see you listed on the contributors list. Did you send in a CA?

Comment by Aaron Bedra [ 02/Feb/13 1:36 PM ]

Both of the patches above are not complete. APersistentSet equiv calls equals. APersistentMap has two separate ways of handling this. This is the path that the fix should take.

Comment by Aaron Bedra [ 02/Feb/13 2:37 PM ]

This patch addresses the difference between equals and equiv.

Comment by Stuart Halloway [ 04/Feb/13 9:55 PM ]

The Set equality code currently in Clojure uses .hashCode to quickly bail out of comparisons in both .equals and .equiv. This feels wrong since .equals and .equiv presume different hashes. The map equality code avoids the problem by not checking any hash.

Aaron's 2/13 patch makes set equality work like map equality, not checking the hash. (I think a much shorter patch that accomplishes the same thing merely drops the call to hash.)

It seems to me a separate problem that both hash and set are broken for Java .equals rules, because of the equiv-based handling of their keys.

I think this needs more consideration.

Comment by Stuart Halloway [ 05/Feb/13 8:57 AM ]

Notes from discussion with Rich:

1. Aaron's 2/2/13 patch, which mimics map logic into set, is the preferred approach.

2. The broader issue I was worried about is the semantic collision between Map's containsKey and Associative's containsKey. This is out of scope here, and perhaps ever.

Comment by Aaron Bedra [ 05/Feb/13 9:38 AM ]

Any chance this will make 1.5?

Comment by Gary Fredericks [ 13/Sep/13 9:42 AM ]

This is apparently still present wrt BigInteger:

(let [n1 -5, n2 (BigInteger. "-5")]
  [(= n1 n2) (= #{n1} #{n2}) (= [n1] [n2])])
;; => [true false true]

Should this be a new ticket?

Comment by Gary Fredericks [ 13/Sep/13 9:48 AM ]

Created CLJ-1262.

Generated at Fri Jul 25 11:10:31 CDT 2014 using JIRA 4.4#649-r158309.