Clojure

Fallback to hash-based comparison for non-Comparables in Util.compare()

Details

  • Type: Enhancement Enhancement
  • Status: Closed Closed
  • Priority: Minor Minor
  • Resolution: Declined
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
  • Patch:
    Code

Description

I oftentimes use sorted collections, and my comparator functions usually look like that:

(fn [a b]
  (let [x (compare-according-to-my-use-case a b)]
    (if (zero? x)
       (compare a b)
       x)))

That is, I define the sorting order depending on the requirements of my use-case, and if that doesn't suffice to make a distinction, I simply fall back to the standard `compare` function in order to get a stable ordering anyhow. I think that's a widely used idiom, e.g., also used in "Clojure Programming" (for example page 109).

The problem with this approach is that several data structures don't implement Comparable, and thus aren't applicable to `compare` (you'll get a ClassCastException). Examples are maps, records, deftypes, and sequences.

The patch I'm going to attach extends `Util.compare()` with a fallback clause that performs a `hasheq()`-based comparison. This results in a meaningless but at least stable sorting order which suffices for use-cases like the one shown above.

Activity

Tassilo Horn made changes -
Field Original Value New Value
Attachment 0001-Add-hasheq-based-fallback-to-Util.compare.patch [ 11961 ]
Tassilo Horn made changes -
Labels enhancement enhancement patch
Tassilo Horn made changes -
Patch Code [ 10001 ]
Tassilo Horn made changes -
Description I oftentimes use sorted collections, and my comparator functions usually look like that:

{noformat}
(fn [a b]
  (let [x (compare-according-to-my-use-case a b)]
    (if (zero? x)
       (compare a b)
       x)))
{noformat}

That is, I define the sorting order depending on the requirements of my use-case, and if that doesn't suffice to make a distinction, I simply fall back to the standard `compare` function in order to get a stable ordering anyhow. I think that's a widely used idiom, e.g., also used in "Clojure Programming" (for example page 108).

The problem with this approach is that several data structures don't implement Comparable, and thus aren't applicable to `compare` (you'll get a ClassCastException). Examples are maps, records, deftypes, and sequences.

The patch I'm going to attach extends `Util.compare()` with a fallback clause that performs a `hasheq()`-based comparison. This results in a meaningless but at least stable sorting order which suffices for use-cases like the one shown above.
I oftentimes use sorted collections, and my comparator functions usually look like that:

{noformat}
(fn [a b]
  (let [x (compare-according-to-my-use-case a b)]
    (if (zero? x)
       (compare a b)
       x)))
{noformat}

That is, I define the sorting order depending on the requirements of my use-case, and if that doesn't suffice to make a distinction, I simply fall back to the standard `compare` function in order to get a stable ordering anyhow. I think that's a widely used idiom, e.g., also used in "Clojure Programming" (for example page 109).

The problem with this approach is that several data structures don't implement Comparable, and thus aren't applicable to `compare` (you'll get a ClassCastException). Examples are maps, records, deftypes, and sequences.

The patch I'm going to attach extends `Util.compare()` with a fallback clause that performs a `hasheq()`-based comparison. This results in a meaningless but at least stable sorting order which suffices for use-cases like the one shown above.
Hide
Andy Fingerhut added a comment -

Patch 0001-Add-hasheq-based-fallback-to-Util.compare.patch dated Apr 17 2013 applies cleanly to latest master, but causes several unit tests in data_structures.clj to fail. These are the kinds of things you would expect to fail with the change made in the patch, because the failing tests expect an exception to be thrown when comparing objects that don't implement Comparable, and with this patch's changes they no longer do. If this patch's change is desired, those tests should probably be removed.

Show
Andy Fingerhut added a comment - Patch 0001-Add-hasheq-based-fallback-to-Util.compare.patch dated Apr 17 2013 applies cleanly to latest master, but causes several unit tests in data_structures.clj to fail. These are the kinds of things you would expect to fail with the change made in the patch, because the failing tests expect an exception to be thrown when comparing objects that don't implement Comparable, and with this patch's changes they no longer do. If this patch's change is desired, those tests should probably be removed.
Hide
Tassilo Horn added a comment -

Thanks Andy, I can't believe I've forgotten to re-run the tests.

Anyway, I'm attaching a new version of the patch that deletes the few sorted-set and sorted-set-by invocations that check that an exception is thrown when creating sorted sets containing non-Comparables.

Show
Tassilo Horn added a comment - Thanks Andy, I can't believe I've forgotten to re-run the tests. Anyway, I'm attaching a new version of the patch that deletes the few sorted-set and sorted-set-by invocations that check that an exception is thrown when creating sorted sets containing non-Comparables.
Hide
Tassilo Horn added a comment -

New version of the patch.

Show
Tassilo Horn added a comment - New version of the patch.
Tassilo Horn made changes -
Tassilo Horn made changes -
Attachment 0001-Add-hasheq-based-fallback-to-Util.compare.patch [ 11961 ]
Hide
Andy Fingerhut added a comment -

Tassilo, you say that one of your use cases is sorted collections. Note that any compare function that returns 0 for two values will cause sorted sets and maps to treat the two compared values as equal, and at most one of them will get into the ordered set/map, the second treated as a duplicate, even though the values are not equal. See https://github.com/jafingerhut/thalia/blob/master/doc/other-topics/comparators.md#comparators-for-sorted-sets-and-maps-are-easy-to-get-wrong for an example (not based on your modified compare, but on a comparator that returns 0 for non-equal items).

With your proposed change to compare, this occurrence would become very dependent upon the hash function used. Probably quite rare, but it would crop up from time to time, and could potentially be very difficult to detect and track down the source.

Show
Andy Fingerhut added a comment - Tassilo, you say that one of your use cases is sorted collections. Note that any compare function that returns 0 for two values will cause sorted sets and maps to treat the two compared values as equal, and at most one of them will get into the ordered set/map, the second treated as a duplicate, even though the values are not equal. See https://github.com/jafingerhut/thalia/blob/master/doc/other-topics/comparators.md#comparators-for-sorted-sets-and-maps-are-easy-to-get-wrong for an example (not based on your modified compare, but on a comparator that returns 0 for non-equal items). With your proposed change to compare, this occurrence would become very dependent upon the hash function used. Probably quite rare, but it would crop up from time to time, and could potentially be very difficult to detect and track down the source.
Hide
Tassilo Horn added a comment -

Hi Andy, you are right. It's possible to add an explicit equals()-check in cases the hashcodes are the same, but I think there's nothing you can do in the hashcodes-equal-but-objects-are-not case. That is, there's no way to ensure the rule sgn(compare(x, y)) == -sgn(compare(y, x)), the transitivity rule, and the compare(x, y)==0 ==> sgn(compare(x, z))==sgn(compare(y, z)) for all z rule.

I'm closing that ticket.

Show
Tassilo Horn added a comment - Hi Andy, you are right. It's possible to add an explicit equals()-check in cases the hashcodes are the same, but I think there's nothing you can do in the hashcodes-equal-but-objects-are-not case. That is, there's no way to ensure the rule sgn(compare(x, y)) == -sgn(compare(y, x)), the transitivity rule, and the compare(x, y)==0 ==> sgn(compare(x, z))==sgn(compare(y, z)) for all z rule. I'm closing that ticket.
Tassilo Horn made changes -
Resolution Declined [ 2 ]
Status Open [ 1 ] Closed [ 6 ]

People

Vote (0)
Watch (3)

Dates

  • Created:
    Updated:
    Resolved: