clojure.set should check or throw on non-set inputs

Description

clojure.set/union is very sensitive to the types of its inputs. It does not attempt to check or fix the input types, raise an error, or even document this behavior.

If all inputs are sets, it works.

If both arguments are vectors, or both are sequences, it returns a collection of that same type, duplicates included.

If the arguments are mixed, the correct result is returned only if the longest input argument is a set.
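
The three cases above can be reproduced at a REPL. This is a small sketch, assuming the two-argument `clojure.set/union` implementation that conj-es the shorter argument onto the longer one; the example values are illustrative and not taken from the original report.

```clojure
(require '[clojure.set :as set])

;; All inputs are sets: the result is a set equal to #{1 2 3}.
(set/union #{1 2} #{2 3})

;; Both inputs are vectors: the result is a vector with a duplicate.
(set/union [1 2] [2 3])      ;=> [1 2 2 3]

;; Mixed inputs, the longer one is a set: the result is a set
;; equal to #{1 2 3 4}.
(set/union #{1 2 3} [3 4])

;; Mixed inputs, the longer one is a vector: the set's elements are
;; conj-ed onto the vector, so the result is not a set.
(set/union #{1} [1 2 2])     ;=> [1 2 2 1]
```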

Environment

Not Relevant

Activity

Andy Fingerhut July 31, 2018 at 1:07 AM

If you want off-the-shelf, compatible replacements for the clojure.set functions that are identical in behavior, except that they perform run-time type checks on the arguments you provide and throw an exception if the arguments have the wrong types (e.g. not sets for union, intersection, difference, subset?, and superset?), consider using the funjible library: https://github.com/jafingerhut/funjible
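
To illustrate the kind of run-time check described above, here is a minimal sketch of a checked wrapper. The name `checked-union` and the error message are hypothetical; this is not funjible's actual code, only one way such a check could look.

```clojure
(require '[clojure.set :as set])

(defn checked-union
  "Hypothetical wrapper: behaves like clojure.set/union, but throws
  IllegalArgumentException if any argument is not a set."
  [& sets]
  ;; Validate every argument before delegating to clojure.set/union.
  ;; Note that nil is also rejected here, since (set? nil) is false.
  (doseq [s sets]
    (when-not (set? s)
      (throw (IllegalArgumentException.
              (str "union expects sets, got " (pr-str (type s)))))))
  (apply set/union sets))
```

With this sketch, `(checked-union #{1 2} #{2 3})` returns a set equal to `#{1 2 3}`, while `(checked-union #{1} [1 2])` throws instead of silently returning a vector.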

Andy Fingerhut June 9, 2016 at 5:07 PM

I am sympathetic to your desires, Ashton, but have no new arguments that might convince those who decide what changes are made to Clojure that it would be a good enough idea to do so.

I would point out an answer to one of your comments: "It isn't even documented that this function expects sets." It seems to me from past comments that the point of view of the Clojure core team is that this is documented, e.g. "Return a set that is the union of the input sets" tells you what clojure.set/union does when you give it sets as arguments. It specifies nothing about what it does when you give it non-set arguments, so it is free to do anything at all in those cases, including what it currently does.

import June 9, 2016 at 3:52 PM

Comment made by: ashtonkemerling

I do not see set/union being covered in the tickets you mentioned.

Furthermore, this issue differs from the intersection bugs in a few important ways:

  1. It silently returns data that is the wrong type, and which contains the wrong values.

  2. It never raises an exception.

But it does share the following bugs with the intersection problem:

  1. This behavior is not only type dependent, but data dependent. It will happen to work depending on the lengths of the given sets.

  2. It isn't even documented that this function expects sets.

  3. It runs directly contrary to the definition of the mathematical function it purports to represent.

I only caught this bug in my own code because I hand-inspected the result. I had just assumed that set/union would do the right thing, and was deeply surprised when, against both definition and documentation, it did not.

Alex Miller June 9, 2016 at 3:40 PM

This has been raised a number of times. See CLJ-1682, CLJ-810.

Details

Created June 9, 2016 at 3:31 PM
Updated July 31, 2018 at 1:07 AM