<p>JIRA ticket <a href="http://dev.clojure.org/jira/browse/CLJ-1065">CLJ-1065</a> has been created for this.</p><p>See this thread on the Clojure Google group: <a href="https://groups.google.com/forum/?fromgroups=#!topic/clojure/AG667ACBd3I">https://groups.google.com/forum/?fromgroups=#!topic/clojure/AG667ACBd3I</a></p><p>Note especially Chas Emerick's detailed analysis of how we arrived at the current state, posted Aug 5, 2012. Also Mark Engelberg's argumentation on Sep 4, 2012 in favor of reverting to the older pre-exception-throwing behavior, all of which should now be duplicated below.</p><p> </p><hr /><h3><em>RH Feedback Zone</em></h3><h3>Bugs</h3><ul><li>sorted-set duplicate handling behavior differs from hash-set (which throws)</li><li>sorted-map duplicate handling behavior differs from hash-map (which throws)</li></ul><h3>Set 'Problems'</h3><ul><li>set literals throw on duplicate keys<ul><li>Is it a user error?<ul><li>open question</li><li>is there a purpose to writing #{42 42}?<ul><li>must every reader deal with that?</li></ul></li><li>if yes, then checking for dupes might be penalizing correct programs (perf-wise)<ul><li>not checking means maybe creating an invalid object</li></ul></li><li>if hash-set didn't throw you would have an alternative when unknown entries</li></ul></li><li>arguably there should be no problem, since conflict free<ul><li>yet, a user seeing<ul><li>#{a b}</li></ul></li><li>should expect a set with 2 entries</li></ul></li><li>this behavior is just an artifact of sharing implementation with map</li></ul></li><li>hash-set throws on duplicate keys<ul><li>same reasons</li></ul></li></ul><div><h3 style="margin-left: 0.0px;">Map 'Problems'</h3><ul style="margin-left: 0.0px;"><li style="margin-left: 0.0px;">map literals throw on duplicate keys<ul style="margin-left: 0.0px;"><li style="margin-left: 0.0px;">this is <strong>not</strong> conflict-free, as values for same key might differ</li><li style="margin-left: 0.0px;">is it a user 
error?<ul><li style="margin-left: 0.0px;"><strong>Yes</strong><ul><li style="margin-left: 0.0px;">This is an inarguably, and apparently, bad map:<ul><li style="margin-left: 0.0px;">{:a 1 :b 2 :a 3}</li></ul></li><li>Using keys not known to be unique in a literal is bad form<ul><li>a user seeing this:<ul><li>{a 1 b 2 c 3}</li></ul></li><li>should expect a map with 3 entries</li></ul></li></ul></li><li>if hash-map did not throw, you would have an alternative when keys unknown</li></ul></li><li style="margin-left: 0.0px;">auto-resolving implies an order-of-consideration for map literal entries<ul><li style="margin-left: 0.0px;">and <em><strong>there should not be one</strong></em></li><li style="margin-left: 0.0px;">a complete semantic mess</li></ul></li><li>non-resolving alternatives:<ul><li>throw<ul><li>checking for dupes might be penalizing correct programs</li></ul></li><li>not checking means maybe creating an invalid object</li></ul></li></ul></li><li style="margin-left: 0.0px;">hash-map throws on duplicate keys<ul style="margin-left: 0.0px;"><li style="margin-left: 0.0px;">here there might be an implicit order due to argument order</li><li style="margin-left: 0.0px;">could make repeated assoc promise</li><ul><li style="margin-left: 0.0px;">that's the behavior of sorted-map</li></ul></ul></li></ul><h3>Opinions</h3><div><ul><li>I don't think there is any merit whatsoever to supporting duplicates, evident or not, in literal sets and maps<ul><li>such programs are at worst broken and at best anti-social</li><li>so, what should happen?<ul><li>there are read-time and runtime considerations</li></ul></li><li>first step - declare such things are user errors</li><li>second step - decide on a reporting strategy</li></ul></li><li>Don't penalize correct programs!<ul><li>unchecked array-based map constructors are a critical way to competitive perf for object-like use of maps</li></ul></li><li>I think hash-set and hash-map should not throw on dupes<ul><li>and that 
hash/sorted-set/map should make an explicit as-if-by-repeated assoc promise</li></ul></li><li>If you think a month is too long to get a response to your needs, from a bunch of very busy volunteers, you need to chill out<ul><li>just because you decided to bring it up doesn't mean everyone else needs to drop what they are doing</li></ul></li><li>This page was useful, thanks.</li></ul><h3>Recommendations</h3><div><ol><li>hash/sorted-set/map should make an explicit as-if-by-repeated-conj/assoc promise<ol><li>thus will never throw, and be consistent</li><li>if you don't know that you have unique keys, use these!</li></ol></li><li>Document "Duplicate keys in map/set literals, evident or not, are user errors"<ol><li>saying that is not the same as guaranteeing they will generate exceptions!</li><li>generate exceptions for now</li><li>eventually move the (non-reader) check into debug mode, or otherwise provide runtime control</li></ol></li><li>Restore the fastest path possible for those cases where the keys are compile-time detectable unique constants<ol><li>high perf for known correct programs at least</li></ol></li></ol></div></div></div><h3 style="margin-left: 0.0px;"><em>end RH Feedback Zone</em></h3><hr /><p><em><br /></em></p><h1>Current behavior of Clojure 1.4.0</h1><pre>;; Sets<br />user=> #{28 28}<br />IllegalArgumentException Duplicate key: 28 clojure.lang.PersistentHashSet.createWithCheck (PersistentHashSet.java:68)<br /><br />;; It is when set literals contain variables that are unexpectedly equal<br />;; that some do not want an exception thrown.<br />user=> (def a 28)<br />#'user/a<br />user=> (def b 28)<br />#'user/b<br />user=> #{a b}<br />IllegalArgumentException Duplicate key: 28 clojure.lang.PersistentHashSet.createWithCheck (PersistentHashSet.java:68)<br /><br />;; This is one way to construct a set that allows duplicates.<br />user=> (set [a b])<br />#{28}<br /><br /><br />;; Maps<br /><br />;; Similar to sets, except that only keys must be distinct.<br 
/>;; However, in this case the construction functions array-map<br />;; and hash-map also disallow duplicate keys, whereas<br />;; sorted-map permits them.<br /><br />user=> {a 5 b 7}<br />IllegalArgumentException Duplicate key: 28 clojure.lang.PersistentArrayMap.createWithCheck (PersistentArrayMap.java:70)<br />user=> (array-map a 5 b 7)<br />IllegalArgumentException Duplicate key: 28 clojure.lang.PersistentArrayMap.createWithCheck (PersistentArrayMap.java:70)<br />user=> (hash-map a 5 b 7)<br />IllegalArgumentException Duplicate key: 28 clojure.lang.PersistentHashMap.createWithCheck (PersistentHashMap.java:92)<br />user=> (sorted-map a 5 b 7)<br />{28 7}<br /><br />;; assoc is one way to create a map that silently eliminates duplicate keys<br />user=> (assoc {} a 5 b 7)<br />{28 7}</pre><p> </p><h1>Arguments for changing it back to never throwing exceptions on duplicates</h1><p>1. "It's a bug that should be fixed." The change to throw-on-duplicate behavior for sets in 1.3 was a breaking change that causes a runtime error in previously working, legitimate code.</p><p>Looking through the history of the issue, one can see that no one was directly asking for throw-on-duplicate behavior. The underlying problem was that constructing array-maps with duplicate keys produced nonsensical objects; the reasoning was that it would be more user-friendly to block people from creating such nonsense by throwing an error. This logic was then extended to other types of maps and sets.</p><p>It's not entirely clear to what degree the consequences of these changes were considered, but it seems likely that there was an implicit assumption that throw-on-duplicate behavior would only come into play in programs with some sort of syntactic error, when in fact it has semantic implications for working programs. When a new "feature" causes unintentional breakage in working code, this is arguably a bug and needs to be reconsidered. </p><p>2. 
"The current way of doing things is internally inconsistent and therefore complex."</p><p>(def a 1)</p><p>(def b 1)</p><p>(set [a b]) -> good</p><p>(hash-set a b) -> error</p><p>#{a b} -> error</p><p>(sorted-set a b) -> good</p><p>(into #{} [a b]) -> good</p><p>The cognitive load from having to remember which constructors do what is a bad thing. </p><p>3. "Current behavior conflicts with the mathematical and intuitive notion of a set."</p><p>In math, {1, 1} = {1}. In programming, sets are used as a means to eliminate duplicates.</p><h1>Arguments for leaving things as is</h1><p>Now let's summarize the arguments that have been raised here in support of the status quo.</p><p>1. "Changing everything to throw-on-duplicate would be just as logically consistent as changing everything to use-last-in."</p><p>True, but that doesn't mean that both approaches would be equally useful. It's readily apparent that an essential idea of sets is that they need to be able to gracefully absorb duplicates, so at least one way of doing that is essential. On the other hand, we can get along just fine without sets throwing errors in the event of a duplicate value. So if you're looking for consistency, there's really only one practical option.</p><p>2. "I like the idea that Clojure will protect me from accidentally making this kind of syntax error."</p><p>Clojure, as a dynamically typed language, is unable to protect you from the vast majority of data-entry syntax errors you're likely to make.</p><p>Let's say you want to type in {:apple 1, :banana 2}. 
Even if Clojure can catch your mistake if you type {:apple 1, :apple 2}, there's no way it's ever going to catch you if you type {:apple 1, :banano 2}, and frankly, the latter error is one you're far more likely to make.</p><p>This is precisely why there's little evidence that anyone was asking for this kind of syntax error protection, and little evidence that anyone has benefited significantly from its addition -- its real-world utility is fairly minimal and dwarfed by the other kinds of errors one is likely to make.</p><p>3. "Maybe we can do it both ways."</p><p>It's laudable to want to make everyone happy. The danger, of course, is that such sentiment paints a picture that it would be a massive amount of work to please everyone, and therefore, we should do nothing. Let's be practical about what is easily doable here with the greatest net benefit. The current system has awkward and inconsistent semantics with little benefit. Let's focus on fixing it. The easiest patch -- revert to 1.2 behavior, but bring array-map's semantics into alignment with the other associative collections.</p>
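<p>As a rough illustration of what an "as-if-by-repeated-conj/assoc" promise would mean for the constructor functions, here is a minimal sketch. The names assoc-all-map and conj-all-set are hypothetical helpers invented for this example; they are not part of clojure.core and are not a proposed patch. They only show the last-in-wins semantics that sorted-map and assoc already exhibit above.</p>

```clojure
;; Hypothetical helpers (NOT in clojure.core) sketching the
;; "as if by repeated assoc/conj" semantics discussed on this page.

(defn assoc-all-map
  "Builds a map from alternating keys and values, as if by repeated
  assoc. A later value for an equal key replaces the earlier one."
  [& kvs]
  (when (odd? (count kvs))
    (throw (IllegalArgumentException.
            "expected an even number of key/value arguments")))
  (reduce (fn [m [k v]] (assoc m k v))
          {}
          (partition 2 kvs)))

(defn conj-all-set
  "Builds a set from its arguments, as if by repeated conj.
  Duplicate elements are silently absorbed."
  [& xs]
  (reduce conj #{} xs))

;; Two vars that happen to hold equal values, as in the examples above.
(def a 28)
(def b 28)

(assoc-all-map a 5 b 7) ;; => {28 7}
(conj-all-set a b)      ;; => #{28}
```

<p>Under this reading, a caller who does not know whether the keys are unique uses the constructor function and never sees an exception, while literals remain free to treat duplicates as user errors.</p>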