Invalid calls to clojure.set functions return an incorrect answer rather than error
Description
Environment
Activity
Jan Rychter February 15, 2019 at 9:20 PM
I'll attempt a first (minimal) step here. Since set functions have docstrings that begin with "Return a set", one can reasonably expect that they will always return a set.
The specific case that I encountered was `(clojure.set/difference nil #{1})` returning nil. The reason why the nil appeared was because of optionality: a set operation was performed and the source data was optional in the map. Data passed spec validations (because the key was optional), and only later tripped other validations because of the nil returned by set/difference. I realize that the arguments are incorrect and I do not necessarily expect auto-coercion.
What I would expect is a post-condition error that would show up in code compiled with assertions.
Specifically, I think a `{ost [(set? %)]}` post-condition on set functions is something that can be agreed upon. It doesn't address coercion, and it avoids the discussion of what are valid arguments, it just places a restriction on the return value according to documentation.
Given what the docstrings say, I do not think there is code out there that relies on set functions returning anything else than a set.
The performance impact of a post-condition would only be present if compiled with assertions.
I realize this doesn't address all issues mentioned, but it's perhaps a way to start moving forward.
I do not think this is a "Major" issue as currently marked.
Stuart Halloway November 18, 2018 at 1:40 PM
A spec test could compare inputs and outputs and not break any existing code.
Michiel Borkent November 17, 2018 at 12:52 PM
Interesting data from a repo by Andy Fingerhut comparing the performance of:
versions of set functions with pre-conditions (no coercion)
instrumented set functions
The functions with pre-conditions are significantly faster than the instrumented ones, but not much slower than the originals.
https://github.com/jafingerhut/funjible#wait-isnt-this-what-clojurespec-is-for
Alex Miller November 16, 2018 at 7:34 PM
Before being considered, this ticket needs:
a good problem statement
an assessment of where existing code calls these functions with something other than sets
a table of alternatives and their tradeoffs. presumably alternatives are: add specs, add validation checks, add coercions, etc. Tradeoffs may include effects on existing callers (known or unknown), performance, etc.
Decisions to make:
Are existing calls (with inputs that are not sets) valid or invalid?
What change should be made in the functions
Patches are probably not super useful until those things are decided.
Details
Assignee
UnassignedUnassignedReporter
Erik AssumErik AssumApproval
TriagedPriority
Major
Details
Details
Assignee
Reporter
Approval
Priority

There are a number of tickets concerned with the fact that the set functions in
clojure.set
misbehave when passed arguments that are not sets.This set of issues include CLJ-810, CLJ-1087, CLJ-1682, and CLJ-1954
The functions affected by this are:
difference
intersection
union
subset?
superset?
as these are known to produce unexpected results when passed non-set arguments.
Problem
As the above mentioned issues suggest, todays implementation of these functions leads to confusion and erroneous results when called with non-set input. The user is given no warning or indications of the error he's making.
Possible solutions
Add a coercion to
set
on the arguments said functionsThrow an exception when the arguments are not sets
Handle this with clojure.spec
Leave it as is
Tradeoffs
Given CLJ-2362, which makes a call to
set
close to a noop, the coercion should not incur much of a performance penalty. It has been argued that the code might even be faster, as type hints can be given and the compiler/jit might make better choices. For the common mistakes (passing vectors/lists instead of sets) it should be backwards compatibleThrowing an exception on non-set arguments would break programs which work correctly today (although by chance), such as data.diff.
Handling it with clojure.spec seems like a viable option, but again, this would break data.diff if the functions were spec'ed to both receive and return sets.
Leaving it as it is, and we will continue to surprise both new and old clojurists.
Evidence of this being a problem
The tickets mentioned above seem to indicate that people stumble upon this often enough to file issues
https://clojuredocs.org/clojure.set/superset_q#example-5b5acd38e4b00ac801ed9e39