Clojure

Broken equality for sets

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Completed
  • Affects Version/s: Release 1.4, Release 1.5
  • Fix Version/s: Release 1.5
  • Component/s: None
  • Labels:
    None
  • Patch:
    Code and Test
  • Approval:
    Ok

Description

Equality for sets is not consistent with other persistent collections when the hashCode differs:

(= {(Integer. -1) :foo} {(Long. -1) :foo})
=> true

(= [(Integer. -1)] [(Long. -1)])
=> true

(= #{(Integer. -1)} #{(Long. -1)})
=> false

Attached is what I believe to be a basic fix. It removes a hashCode check from APersistentSet.equals. A similar check was removed from APersistentMap in the following commit:

https://github.com/clojure/clojure/commit/b5f5ba2e15dc2f20e14e05141f7de7c6a3d91179

With this patch, set equality works as expected and all tests pass.

My understanding of the implications of this change are limited, so someone more knowledgeable would need to chime in about its correctness.

Activity

Hide
Gary Fredericks added a comment -

Created CLJ-1262.

Show
Gary Fredericks added a comment - Created CLJ-1262.
Hide
Gary Fredericks added a comment -

This is apparently still present wrt BigInteger:

(let [n1 -5, n2 (BigInteger. "-5")]
  [(= n1 n2) (= #{n1} #{n2}) (= [n1] [n2])])
;; => [true false true]

Should this be a new ticket?

Show
Gary Fredericks added a comment - This is apparently still present wrt BigInteger:
(let [n1 -5, n2 (BigInteger. "-5")]
  [(= n1 n2) (= #{n1} #{n2}) (= [n1] [n2])])
;; => [true false true]
Should this be a new ticket?
Hide
Aaron Bedra added a comment -

Any chance this will make 1.5?

Show
Aaron Bedra added a comment - Any chance this will make 1.5?
Hide
Stuart Halloway added a comment -

Notes from discussion with Rich:

1. Aaron's 2/2/13 patch, which mimics map logic into set, is the preferred approach.

2. The broader issue I was worried about is the semantic collision between Map's containsKey and Associative's containsKey. This is out of scope here, and perhaps ever.

Show
Stuart Halloway added a comment - Notes from discussion with Rich: 1. Aaron's 2/2/13 patch, which mimics map logic into set, is the preferred approach. 2. The broader issue I was worried about is the semantic collision between Map's containsKey and Associative's containsKey. This is out of scope here, and perhaps ever.
Hide
Stuart Halloway added a comment -

The Set equality code currently in Clojure uses .hashCode to quickly bail out of comparisons in both .equals and .equiv. This feels wrong since .equals and .equiv presume different hashes. The map equality code avoids the problem by not checking any hash.

Aaron's 2/13 patch makes set equality work like map equality, not checking the hash. (I think a much shorter patch that accomplishes the same thing merely drops the call to hash.)

It seems to me a separate problem that both hash and set are broken for Java .equals rules, because of the equiv-based handling of their keys.

I think this needs more consideration.

Show
Stuart Halloway added a comment - The Set equality code currently in Clojure uses .hashCode to quickly bail out of comparisons in both .equals and .equiv. This feels wrong since .equals and .equiv presume different hashes. The map equality code avoids the problem by not checking any hash. Aaron's 2/13 patch makes set equality work like map equality, not checking the hash. (I think a much shorter patch that accomplishes the same thing merely drops the call to hash.) It seems to me a separate problem that both hash and set are broken for Java .equals rules, because of the equiv-based handling of their keys. I think this needs more consideration.
Hide
Aaron Bedra added a comment -

This patch addresses the difference between equals and equiv.

Show
Aaron Bedra added a comment - This patch addresses the difference between equals and equiv.
Hide
Aaron Bedra added a comment -

Both of the patches above are not complete. APersistentSet equiv calls equals. APersistentMap has two separate ways of handling this. This is the path that the fix should take.

Show
Aaron Bedra added a comment - Both of the patches above are not complete. APersistentSet equiv calls equals. APersistentMap has two separate ways of handling this. This is the path that the fix should take.
Hide
Aaron Bedra added a comment -

Bo, I don't see you listed on the contributors list. Did you send in a CA?

Show
Aaron Bedra added a comment - Bo, I don't see you listed on the contributors list. Did you send in a CA?
Hide
Bo Jeanes added a comment -

On further thought and comparison to APersistentMap, I'm not sure if that's the best use of Util.hasheq(). I can't find good reference on the semantic differences and am not familiar enough with the Clojure source to infer it, yet.

Show
Bo Jeanes added a comment - On further thought and comparison to APersistentMap, I'm not sure if that's the best use of Util.hasheq(). I can't find good reference on the semantic differences and am not familiar enough with the Clojure source to infer it, yet.
Hide
Bo Jeanes added a comment -

Attaching a patch with an alternate fix for this issue that does not skip hashCode checking.

It passes all existing tests and fixes this issue.

I want to benchmark the difference, too.

Show
Bo Jeanes added a comment - Attaching a patch with an alternate fix for this issue that does not skip hashCode checking. It passes all existing tests and fixes this issue. I want to benchmark the difference, too.
Hide
Bo Jeanes added a comment -

Instead of removing m.hashCode() != hashCode(), perhaps we should use `Util.hasheq()` instead. It already seems to special case numbers (https://github.com/clojure/clojure/blob/f48d024e97/src/jvm/clojure/lang/Util.java#L164-L172) and won't have the potential performance impact that skipping hashCode checks could bring.

Show
Bo Jeanes added a comment - Instead of removing m.hashCode() != hashCode(), perhaps we should use `Util.hasheq()` instead. It already seems to special case numbers (https://github.com/clojure/clojure/blob/f48d024e97/src/jvm/clojure/lang/Util.java#L164-L172) and won't have the potential performance impact that skipping hashCode checks could bring.
Hide
Paul Stadig added a comment -

I applied the patch on master (ca465d6d) and it applied cleanly and fixes the issue.

Show
Paul Stadig added a comment - I applied the patch on master (ca465d6d) and it applied cleanly and fixes the issue.
Hide
Aaron Bedra added a comment -

I'm going to take a look and try to get this shipped. It hit us and I would love to to see it in 1.5

Show
Aaron Bedra added a comment - I'm going to take a look and try to get this shipped. It hit us and I would love to to see it in 1.5
Hide
Gary Fredericks added a comment -

This came up when we were trying to diff two datasets by putting them in sets and comparing. We had Integers because they came back from JDBC that way.

Show
Gary Fredericks added a comment - This came up when we were trying to diff two datasets by putting them in sets and comparing. We had Integers because they came back from JDBC that way.
Hide
Timothy Baldridge added a comment -

vetting

Show
Timothy Baldridge added a comment - vetting
Hide
Justin Kramer added a comment -

Revised patch with unit test that fails in master and succeeds with patch

Show
Justin Kramer added a comment - Revised patch with unit test that fails in master and succeeds with patch
Hide
Andy Fingerhut added a comment -

Can you add some unit tests that fail with the current code but succeed with the change?

Show
Andy Fingerhut added a comment - Can you add some unit tests that fail with the current code but succeed with the change?

People

Vote (4)
Watch (6)

Dates

  • Created:
    Updated:
    Resolved: