
The expectation is that this will become a JIRA ticket in the future, after the details are more fleshed out.

See this thread on the Clojure Google group: https://groups.google.com/forum/?fromgroups=#!topic/clojure/AG667ACBd3I

Note especially Chas Emerick's detailed analysis of how we arrived at the current state, posted Aug 5, 2012, and Mark Engelberg's argument of Sep 4, 2012 in favor of reverting to the older, pre-exception-throwing behavior. The substance of both should now be reproduced below.

 

Problems

  • sorted-set duplicate handling behavior differs from hash-set
    • this is an inarguable bug
  • set literals throw on duplicate keys
    • when arguably there should be no problem, since duplicate set elements cannot conflict with each other
    • this behavior is just an artifact of sharing the implementation with maps
  • hash-set throws on duplicate keys
    • for the same reasons

Current behavior of Clojure 1.4.0

;; Sets
user=> #{28 28}
IllegalArgumentException Duplicate key: 28  clojure.lang.PersistentHashSet.createWithCheck (PersistentHashSet.java:68)

;; It is when set literals contain variables that are unexpectedly equal
;; that some do not want an exception thrown.
user=> (def a 28)
#'user/a
user=> (def b 28)
#'user/b
user=> #{a b}
IllegalArgumentException Duplicate key: 28  clojure.lang.PersistentHashSet.createWithCheck (PersistentHashSet.java:68)

;; This is one way to construct a set from values that may contain
;; duplicates; the duplicates are silently eliminated.
user=> (set [a b])
#{28}
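Until the behavior changes, code that must run on 1.3 or 1.4 can avoid the exception by routing runtime values through a duplicate-tolerant constructor instead of a set literal. A minimal sketch; `set-of` is a hypothetical helper name, not part of clojure.core:

```clojure
;; Hypothetical helper (not in clojure.core): a stand-in for #{...}
;; that silently absorbs duplicate values instead of throwing.
(defn set-of [& xs]
  (into #{} xs))

(def a 28)
(def b 28)

(set-of a b)   ;; => #{28}
(set-of)       ;; => #{}
```

The cost is that the reader no longer sees a literal, so any compile-time duplicate check on constant elements is lost as well.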


;; Maps

;; Similar to sets, except that only keys must be distinct.
;; However, in this case the construction functions array-map
;; and hash-map also disallow duplicate keys, whereas
;; sorted-map permits them.

user=> {a 5 b 7}
IllegalArgumentException Duplicate key: 28  clojure.lang.PersistentArrayMap.createWithCheck (PersistentArrayMap.java:70)
user=> (array-map a 5 b 7)
IllegalArgumentException Duplicate key: 28  clojure.lang.PersistentArrayMap.createWithCheck (PersistentArrayMap.java:70)
user=> (hash-map a 5 b 7)
IllegalArgumentException Duplicate key: 28  clojure.lang.PersistentHashMap.createWithCheck (PersistentHashMap.java:92)
user=> (sorted-map a 5 b 7)
{28 7}

;; assoc is one way to create a map that silently eliminates duplicate keys
user=> (assoc {} a 5 b 7)
{28 7}
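The same workaround applies to maps. A minimal sketch, assuming "last value wins" is the desired rule (as with assoc and sorted-map above); `map-of` is a hypothetical helper name:

```clojure
;; Hypothetical helper (not in clojure.core): a stand-in for {...}
;; where a later duplicate key silently overrides an earlier one.
(defn map-of [& kvs]
  {:pre [(even? (count kvs))]}   ; reject a dangling key with no value
  (reduce (fn [m [k v]] (assoc m k v))
          {}
          (partition 2 kvs)))

(def a 28)
(def b 28)

(map-of a 5 b 7)   ;; => {28 7}
```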

 

Arguments for changing it back to never throwing exceptions on duplicates

1. "It's a bug that should be fixed."  The change to throw-on-duplicate behavior for sets in 1.3 was a breaking change that causes a runtime error in previously working, legitimate code.

Looking through the history of the issue, one can see that no one was directly asking for throw-on-duplicate behavior.  The underlying problem was that array-maps with duplicate keys returned nonsensical objects; surely it would be more user-friendly to just block people from creating such nonsense by throwing an error.  This logic was extended to other types of maps and sets.

It's not entirely clear to what degree the consequences of these changes were considered, but it seems likely that there was an implicit assumption that throw-on-duplicate behavior would only come into play in programs with some sort of syntactic error, when in fact it has semantic implications for working programs. When a new "feature" causes unintentional breakage in working code, this is arguably a bug and needs to be reconsidered.

2. "The current way of doing things is internally inconsistent and therefore complex."

(def a 1)
(def b 1)

(set [a b]) -> good
(hash-set a b) -> error
#{a b} -> error
(sorted-set a b) -> good
(into #{} [a b]) -> good

Having to remember which constructors do what adds needless cognitive load.
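The inconsistency can be checked mechanically. The sketch below wraps each construction in a zero-argument function and reports whether calling it throws. `throws?` and `constructors` are hypothetical names introduced here. On 1.4, the hash-set and literal cases are expected to report errors, per the transcripts earlier on this page; other Clojure versions may report different results for those two.

```clojure
(def a 1)
(def b 1)

;; Hypothetical helper: true if calling f throws IllegalArgumentException
;; (the exception type shown in the 1.4 transcripts above).
(defn throws? [f]
  (try (f) false
       (catch IllegalArgumentException _ true)))

;; Each construction is wrapped in a thunk so it runs only when probed.
(def constructors
  {"(set [a b])"      #(set [a b])
   "(hash-set a b)"   #(hash-set a b)
   "#{a b}"           (fn [] #{a b})
   "(sorted-set a b)" #(sorted-set a b)
   "(into #{} [a b])" #(into #{} [a b])})

(doseq [[label f] (sort-by key constructors)]
  (println label (if (throws? f) "-> error" "-> good")))
```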

3. "Current behavior conflicts with the mathematical and intuitive notion of a set."

In math, {1, 1} = {1}.  In programming, sets are used as a means to eliminate duplicates.
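A typical example of that use: collecting values into a set to drop repeats. The data and the `unique-users` name are hypothetical, for illustration only:

```clojure
;; Illustrative data (hypothetical): a page-view log with repeated users.
(def page-views
  [{:user "ann"} {:user "bob"} {:user "ann"} {:user "ann"}])

;; Building a set discards the repeats, mirroring {1, 1} = {1}.
(defn unique-users [views]
  (set (map :user views)))

(count (unique-users page-views))   ;; => 2
(= (set [1 1]) #{1})                ;; => true
```

Code like this relies on duplicates being absorbed, which is exactly what the throwing constructors refuse to do.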

Arguments for leaving things as is

Now let's summarize the arguments that have been raised here in support of the status quo.

1. "Changing everything to throw-on-duplicate would be just as logically consistent as changing everything to use-last-in."

True, but that doesn't mean that both approaches would be equally useful. Sets must be able to absorb duplicates gracefully, so at least one duplicate-absorbing way of building them is indispensable. On the other hand, we can get along just fine without sets that throw errors on duplicate values. So if you're looking for consistency, only one of the two options is practical.

2. "I like the idea that Clojure will protect me from accidentally making this kind of syntax error."

Clojure, as a dynamically typed language, is unable to protect you from the vast majority of data-entry syntax errors you're likely to make.

Let's say you want to type in {:apple 1, :banana 2}.  Even if Clojure can catch your mistake if you type {:apple 1, :apple 2}, there's no way it's ever going to catch you if you type {:apple 1, :banano 2}, and frankly, the latter error is one you're far more likely to make.

This is precisely why there's little evidence that anyone was asking for this kind of syntax error protection, and little evidence that anyone has benefited significantly from its addition -- its real-world utility is fairly minimal and dwarfed by the other kinds of errors one is likely to make.
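The asymmetry in the apple/banana example can be shown directly. In this sketch, `rejected-by-reader?` and `unexpected-keys` are hypothetical helpers, not part of Clojure; the second one illustrates the kind of explicit key check that would be needed to catch the typo.

```clojure
(require '[clojure.set :as cset])

;; Hypothetical helper: true if the reader rejects the source text.
(defn rejected-by-reader? [s]
  (try (read-string s) false
       (catch Exception _ true)))

;; The duplicate-key check catches a repeated key...
(rejected-by-reader? "{:apple 1, :apple 2}")   ;; => true
;; ...but the far more likely typo reads without complaint.
(rejected-by-reader? "{:apple 1, :banano 2}")  ;; => false

;; Hypothetical helper: catching the typo requires checking the keys
;; against an expected set, which the language cannot do on its own.
(defn unexpected-keys [expected m]
  (cset/difference (set (keys m)) expected))

(unexpected-keys #{:apple :banana} {:apple 1, :banano 2})  ;; => #{:banano}
```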

3.  "Maybe we can do it both ways."

It's laudable to want to make everyone happy. The danger, of course, is that trying to please everyone makes the fix look like a massive amount of work, which then becomes an argument for doing nothing. Let's be practical about what is easily doable here with the greatest net benefit. The current system has awkward and inconsistent semantics with little benefit. Let's focus on fixing it. The easiest patch: revert to the 1.2 behavior, but bring array-map's semantics into alignment with the other associative collections.
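One possible reading of "alignment" for array-map is assoc semantics: the last value for a duplicated key wins, and the key keeps the position of its first occurrence, as it does when you assoc onto an existing array-map. The following is a minimal sketch under that assumption; `array-map-last-wins` is a hypothetical name, not an existing or proposed function.

```clojure
;; Hypothetical helper (not in clojure.core): builds an array-map from
;; key/value pairs with assoc semantics instead of throwing on duplicates.
;; Assumption: last value wins; each key keeps its first position.
(defn array-map-last-wins [& kvs]
  {:pre [(even? (count kvs))]}
  (let [pairs     (partition 2 kvs)
        ;; Final value for each key: later pairs overwrite earlier ones.
        last-vals (reduce (fn [m [k v]] (assoc m k v)) {} pairs)
        ;; Keys in order of first appearance, without repeats.
        ks        (distinct (map first pairs))]
    (apply array-map (mapcat (fn [k] [k (get last-vals k)]) ks))))

(array-map-last-wins :b 1 :a 2 :b 3)   ;; => {:b 3, :a 2}
(array-map-last-wins 28 5 28 7)        ;; => {28 7}
```

Deduplicating before calling array-map keeps the result an ordered array-map while giving the same values as assoc, hash-map (pre-1.3) and sorted-map.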
