[CLJ-1203] Fallback to hash-based comparison for non-Comparables in Util.compare() Created: 17/Apr/13 Updated: 29/Apr/13 Resolved: 29/Apr/13
I oftentimes use sorted collections, and my comparator functions usually look like that:
That is, I define the sorting order depending on the requirements of my use-case, and if that doesn't suffice to make a distinction, I simply fall back to the standard `compare` function in order to get a stable ordering anyhow. I think that's a widely used idiom, e.g., also used in "Clojure Programming" (for example page 109).
The problem with this approach is that several data structures don't implement Comparable, and thus aren't applicable to `compare` (you'll get a ClassCastException). Examples are maps, records, deftypes, and sequences.
The patch I'm going to attach extends `Util.compare()` with a fallback clause that performs a `hasheq()`-based comparison. This results in a meaningless but at least stable sorting order which suffices for use-cases like the one shown above.
|Comment by Andy Fingerhut [ 18/Apr/13 9:01 PM ]|
Patch 0001-Add-hasheq-based-fallback-to-Util.compare.patch dated Apr 17 2013 applies cleanly to latest master, but causes several unit tests in data_structures.clj to fail. These are the kinds of things you would expect to fail with the change made in the patch, because the failing tests expect an exception to be thrown when comparing objects that don't implement Comparable, and with this patch's changes they no longer do. If this patch's change is desired, those tests should probably be removed.
|Comment by Tassilo Horn [ 19/Apr/13 2:34 AM ]|
Thanks Andy, I can't believe I've forgotten to re-run the tests.
Anyway, I'm attaching a new version of the patch that deletes the few sorted-set and sorted-set-by invocations that check that an exception is thrown when creating sorted sets containing non-Comparables.
|Comment by Tassilo Horn [ 19/Apr/13 2:35 AM ]|
New version of the patch.
|Comment by Andy Fingerhut [ 26/Apr/13 2:47 PM ]|
Tassilo, you say that one of your use cases is sorted collections. Note that any compare function that returns 0 for two values will cause sorted sets and maps to treat the two compared values as equal, and at most one of them will get into the ordered set/map, the second treated as a duplicate, even though the values are not equal. See https://github.com/jafingerhut/thalia/blob/master/doc/other-topics/comparators.md#comparators-for-sorted-sets-and-maps-are-easy-to-get-wrong for an example (not based on your modified compare, but on a comparator that returns 0 for non-equal items).
With your proposed change to compare, this occurrence would become very dependent upon the hash function used. Probably quite rare, but it would crop up from time to time, and could potentially be very difficult to detect and track down the source.
|Comment by Tassilo Horn [ 29/Apr/13 2:29 AM ]|
Hi Andy, you are right. It's possible to add an explicit equals()-check in cases the hashcodes are the same, but I think there's nothing you can do in the hashcodes-equal-but-objects-are-not case. That is, there's no way to ensure the rule sgn(compare(x, y)) == -sgn(compare(y, x)), the transitivity rule, and the compare(x, y)==0 ==> sgn(compare(x, z))==sgn(compare(y, z)) for all z rule.
I'm closing that ticket.