<< Back to previous view

[CLJ-426] case should handle hash collision Created: 13/Aug/10  Updated: 29/Apr/11  Resolved: 29/Apr/11

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.3

Type: Defect Priority: Major
Reporter: Assembla Importer Assignee: Stuart Halloway
Resolution: Completed Votes: 1
Labels: None

Attachments: Text File 426.patch    
Patch: Code and Test
Approval: Ok
Waiting On: Stuart Halloway

 Description   

should generate a branch under the hash to deal with the conflicting values



 Comments   
Comment by Assembla Importer [ 29/Sep/10 9:30 AM ]

Converted from http://www.assembla.com/spaces/clojure/tickets/426

Comment by Assembla Importer [ 29/Sep/10 9:30 AM ]

stu said: Updating tickets (#427, #426, #421, #420, #397)

Comment by Assembla Importer [ 29/Sep/10 9:30 AM ]

cemerick said: It would be nice if case were further enhanced to support references to enums and static final fields (and maybe other stably-referenceable values as well?). A userland workaround is shown here (added here instead of in a separate ticket at Rich's request).

Comment by Rich Hickey [ 07/Jan/11 7:40 AM ]

while in there we should look at special-casing all-int case, and primitive return from case

Comment by Aaron Bedra [ 09/Jan/11 2:38 PM ]

I'm gonna give this a shot. If anyone has any solutions to this please feel free to jump in here while I am working on it.

Comment by Alexander Taggart [ 28/Feb/11 12:59 PM ]

case changes:

  • handles hash collisions
  • can emit return type
  • conditionally supports java enums
  • performance path for all-int or all-char test constants
Comment by Alexander Taggart [ 01/Mar/11 12:02 AM ]

Patch happens to fix CLJ-438 as well.

Comment by Alexander Taggart [ 02/Mar/11 5:05 AM ]

Updated patch so that case is no longer one gigantic ball of code. And fixed some bugs

Comment by Alexander Taggart [ 02/Mar/11 5:28 AM ]

Update to fix a bug.

Comment by Alexander Taggart [ 02/Mar/11 7:55 PM ]

Per cemerick's request I'm providing a secondary patch which would provide for evaluating test-constant symbols at compile time. This would allow us to use switches where writing a literal value is difficult. E.g.:

user=> (defn foo [x]
         (case x
           java.lang.Math/PI :pi
           java.lang.Math/E :e
           :oops))
#'user/foo
user=> (foo java.lang.Math/E)
:e

Quoted symbols are not evaluated, but there are backwards compatibility concerns with evaluating unquoted symbols:

  • if evaluation is successful, it might be unintended thus breaking existing code
  • if evaluation is unsucessful
    • continuing to treat it as a symbol would preserve partial backwards compatibility
    • throwing an error would break compatability, but allow for a more consistent behavior going forward, i.e., quote symbols when you want symbols
Comment by Stuart Halloway [ 04/Mar/11 2:52 PM ]

What's the basis for deciding when to use tableswitch vs. lookupswitch? I tried the sparse example [-100, 0, 100] from http://java.sun.com/docs/books/jvms/second_edition/html/Compiling.doc.html and the code produced a tableswitch. Is this desirable?

Comment by Stuart Halloway [ 04/Mar/11 2:54 PM ]

Rich: screened in this case means only that the tests pass. I am continuing to dig into this patch, adding comments as I go. Let me know if there are specific things I should be investigating.

Comment by Stuart Halloway [ 04/Mar/11 2:59 PM ]

enum support feels half-baked. For example, an attempt to mix enums and non enums compiles but then cannot find the enum:

(case clojure.lang.Compiler$CaseTestEnumABC/a
        clojure.lang.Compiler$CaseTestEnumABC/a 1
        1 2)
IllegalArgumentException No matching clause: a
Comment by Stuart Halloway [ 04/Mar/11 3:08 PM ]

The use of eval to detect enums feels icky, and counter to the stated behavior of allowing only compile-time constants.

Comment by Stuart Halloway [ 04/Mar/11 3:12 PM ]

Rich: to the extent that we believe in testing for things like reflection warnings, it would be nice to have a way to get to that information other than be scraping stderr. If this is of interest let me know and I will make a ticket and/or start a design discussion as appropriate.

Comment by Alexander Taggart [ 04/Mar/11 4:45 PM ]

Regarding the mixed enum/non-enum case:

The primary patch treats symbols which do not resolve to enums as symbols, but also treats enums as symbols when the set of test-constants was not all-enum. Both parts were an attempt at backwards compatibility.

I agree that the better behaviour would be to error on the mixed case, though this would break any code where a test-constant symbol happens to resolve to an enum.

Regarding eval:

It doesn't get around the "compile-time" restriction, but rather the "literal" restriction.

Comment by Rich Hickey [ 04/Mar/11 5:03 PM ]

Unless someone can provide some simple semantics for enum support, I'd rather not support them.

Re: lookupswitch - I don't see any reason to support it - is the tableswitch code in some way wrong?

Comment by Alexander Taggart [ 04/Mar/11 7:39 PM ]

simple semantics for enum support

As the updated doc states: "Java enums are supported if all test constants are enums of the same type."

Stu's example violates the above rule. The issue then becomes what should be the response: error at compile-time, or treat as symbol. More on this in a following comment.

Re: lookupswitch ... is the tableswitch code in some way wrong?

Not at all, but the current version will throw an exception when it can't find a sufficient shift-mask:

user=> (case 1
         1     :a
         16384 :b
         16385 :c)
IllegalArgumentException No distinct mapping found  clojure.core/min-hash (core.clj:5708)

The lookupswitch bytecode is only used to substitute for throwing that exception.

The current process of fitting test constants into a switch:

  1. Coerce to int by hashing
  2. if shift-mask found, use tableswitch and shift-mask
  3. else throw exception

The process used in the patch:

  1. Coerce to int (hash, enum ordinal, or int value)
  2. if (max-int - min-int) < max-tableswitch-size, use tableswitch without shift-mask
  3. else if shift-mask found, use tableswitch with shift-mask
  4. else use lookupswitch without shift-mask

Where max-tableswitch-size is simply an explicit consequence of the mask width.

Comment by Rich Hickey [ 04/Mar/11 7:59 PM ]

Java enums aren't Clojure constants.

Comment by Alexander Taggart [ 04/Mar/11 9:50 PM ]

Support for enums is really just a degenerate case of supporting named constant values, e.g., java.lang.Math/PI, (def init-state 0).

The basic requirement of case is that the test constants have a known value at compile-time, and that value has consistent hashing. The current implementation assumes the latter requirement will be met by restricting test constants to literal types. This mostly works, but fails with things like regex patterns.

The above basic requirements can be met while removing the incidental "literal" restriction. To do so means resolving literal symbols into constant values at compile-time, and then validating that the types are suitable (e.g., number, string).

The simplest and most consistent approach would be to require quoting test constant symbols when used as symbols. Setting aside all other discussion of enums, etc., is this change to the treatment of symbols acceptable?

Comment by Alexander Taggart [ 04/Mar/11 10:39 PM ]

To make life easier, I'm altering the set of patches.

426-basic.patch:

  • handles hash collisions
  • can emit return type
  • performance path for all-int or all-char test constants
  • no documentation change

426-supports-named-values.patch:

  • all from above, plus...
  • evaluates unquoted test constant symbols at compile-time
  • validates test constants are suitable for case
  • doc change: "The test-constants can be any combination of numbers, chars, strings, keywords, quoted symbols, and (Clojure) composites thereof. Unquoted symbols will be evaluated at compile time, and must resolve to one of the preceding types."

426-supports-enums-and-named-values.patch:

  • all from above, plus...
  • conditionally supports java enums
  • doc change: "Java enums are supported only if all test constants are enums of the same type."
Comment by Chas Emerick [ 05/Mar/11 7:33 AM ]

RH:

Java enums aren't Clojure constants.

SH:

The use of eval to detect enums feels icky, and counter to the stated behavior of allowing only compile-time constants.

I assume both of you are actually talking about literals. Limiting case to literal test values is simply too much of a handicap, even ignoring interop uses. One should not have to macro their way out of repeating a constant value (e.g. (def *mole* 6.02214179e23)) throughout a codebase that needs to use case extensively.

Leaving aside questions about mechanics, static final fields and enums are absolutely constants, and not being able to use them within case would be (and is today) a significant pain point. Yes, their value w.r.t. case can change after a case form has been AOT-compiled – which exactly mirrors switch (a good thing, IMO, making case a proper superset of switch). Again, something one can macro out of, but something I don't think people should have to macro out of.

Comment by Rich Hickey [ 05/Mar/11 7:35 AM ]

Thanks for splitting this up. I'm not in favor of anything beyond basic. The enum stuff is a snake pit full of new semantics, contradictions and special cases. I had told Chas as much. Sorry I hadn't realized the enum thing had crept into this scope sooner.

Rich

Comment by Chas Emerick [ 05/Mar/11 7:36 AM ]

I should say that I'd consider it perfectly reasonable if case were kept limited to using literals (if that limitation is considered important enough to retain at some level), but clojure.core provided a switch macro that evaluated symbols as Alexander has been reaching for.

Comment by Rich Hickey [ 05/Mar/11 8:04 AM ]

Chas, I'm not sure what you mean by 'macro their way out of..." - are you saying def is too much work? Or is what you'd want, but doesn't work?

What's missing is not some extension to case, but possibly a broadening of what constitutes a compile time constant. At no point should that involve compile time evaluation in case. A defconst has long been requested and is a fine idea. Ditto static finals as constants. As long as case is defined to work with constants, we can enhance 'constants' and case can do more transparently (at least semantically transparently). All of this stuff around unquoted constant symbols, conditional compile time evaluation (when does that otherwise happen?) etc is a mess. Note that case is currently defined in terms of literals because Clojure doesn't have a strong notion of constant yet.

Also, I think you are not adequately considering the use of case in symbolic computation. People use case matching of symbols all the time when writing macros and compilers etc in CL, and wouldn't appreciate having to quote everything. Also, given we can match aggregates, how might one match (quote x), and why wouldn't that match 'x?

This is scope creep, and insufficiently considered, plain and simple. Let's leave it as a separate ticket. It's going to require more thought, in any case

Comment by Chas Emerick [ 05/Mar/11 1:07 PM ]

Well, everything is scope creep in the end.

By "macro their way out", I meant doing something like I did here to use non-literal constants in case forms. When I wrote that comment this morning, I was under the impression that only Clojure literals were considered acceptable.


Anyway, what follows is either incredibly pedantic, or a draft of a wiki page where a more agreeable solution can be hashed out.

It occurs to me that being explicit about what constants exist and what's on the table here would be helpful, at least for me:

  1. Clojure literals and composites thereof, as implemented in 1.2.0 case
  2. static final fields
  3. def'ed values, presumed to be unchanging
  4. enums

(1) is granted, and it sounds like moving beyond this is not desired at this time without significant further analysis/discussion (implying the application of Alexander's "basic" patch only).

(2) appears to be desired ("Ditto static finals as constants"). However, as soon as we allow for non-literal constant values (whether fields, def's, enums, or other), I'm very confused as to:

  • how what we use to refer to those values (surely symbols?) aren't turned into a second-class test values
  • how we're not faced with conditional resolution of those values (there's no calls anywhere, so it seems there's little difference between eval and direct usage of Reflector, at least for fields and enums)

Perhaps the defconst variation hinted at in conjunction with (3) would somehow avoid compile-time evaluation for def'ed values? I'm no longer familiar with CL and the defconstant prior art to reasonably guess at the relevant semantics of defconst.

Re: (4) and "The enum stuff is a snake pit full of new semantics, contradictions and special cases": I'm pretty sure I didn't know that. It seems like enum support is as close as:

(.ordinal (Reflector/getStaticField EnumClassName "EnumName"))

…with all reasonable error-checking and such to ensure the domain of the case form includes only enums of the same type (though again, the issues with resolving symbols remains).


Anyway, I'm sorry to have encouraged (egged on?) Alexander. I appear to have misread/misunderstood the desired scope of changes.

Comment by Alexander Taggart [ 05/Mar/11 2:04 PM ]

Chas, no worries. Being able to use named constant values as test constants makes my life so much easier, especially when dealing with java interop. Until now I've had to use (condp = x ...). Regardless of whether it makes it into core, I will still be using it.

Comment by Rich Hickey [ 06/Mar/11 11:05 AM ]

Chas -

1) Is all we've got right now. Literals compile into themselves. There is no notion of constant in the language.

2) Treating finals as constants would require the compiler compile MyClass/MY_FINAL into its value, which it does not currently do. It currently generates a getStaticField opcode.

3) I never suggested treating defs like constants. A defconst would require some notion of constant. Since its value would be compiled into code, it must be different from def, the semantics of which are to compile into a deref of the var. Only the latter currently occurs. Doing otherwise automagically just inside case would be bizarre. Languages typically constrain the initializers of such constants, esp. to constant expressions themselves. CL allows for arbitrary initializers, but requires they be eval-able at compile time, always evaluate to the same value, and with much hand waving about execution order.

e.g. if as you had suggested:

(def x (launch-missiles))

(case foo x ...)

Would missiles be launched during compilation or loading or both? Would x be nil or the number of missiles launched in the case? What if the case were wrapped in (binding/let [x ...] ...)?

4) is like 2.

You are correct, 2,3,4 all conflict with the use of symbols in case, which is why supporting them is a breaking change, glommed on to a ticket that nominally is a bug fix, augmented by a perf optimization.

Fixing the conflict isn't as simple as saying "use 'x for symbols" in case. You'd need to consider the interaction of constants and all special forms and macros, the relationship between constants and '[un]evaluated' positions everywhere. It's not a trivial thing. CL, FWIW, does not in fact support its own defconstants as keys in its case, considering them constants only in evaluated positions.

I don't have the time right now to consider all of that, but I won't accept a design that hasn't.

Comment by Stuart Halloway [ 06/Mar/11 12:03 PM ]

Reviewing the basic patch. The code passed to case* is incorrect for some duplicate hash scenarios. One example:

(let [f #(case % 0 :ZERO -1 :NEG1 2 :TWO :OOPS :OOPS)]
    (doseq [input [0 -1 2 :OOPS]]
      (println (format "(f %s) = %s" input (f input)))))
(f 0) = :ZERO
(f -1) = :NEG1
(f 2) = [2 :TWO]
IllegalArgumentException No matching clause: :OOPS

It would be a huge help to me in screening if the docstrings for the private helper fns documented expected inputs and outputs.

Comment by Alexander Taggart [ 06/Mar/11 6:25 PM ]

Hash-collision handling function did not properly handle non-colliding test/then entries.

426-basic-update-1.patch fixes the above and adds Stu's example to the test suite.

Comment by Alexander Taggart [ 06/Mar/11 6:26 PM ]

426-basic-update-1.patch also contains additional documentation to the various helper functions.

Comment by Stuart Halloway [ 11/Mar/11 11:43 AM ]

Alex: There is something weird about characters:

(case (char \uFFFF) \uFFFF 1)
=> 1
(case \uFFFF \uFFFF 1)
=> IllegalArgumentException No matching clause: ?  user/eval2253 (NO_SOURCE_FILE:126)
(case \a \a 1)
=> IllegalArgumentException No matching clause: a  user/eval2256 (NO_SOURCE_FILE:127)
(case (char \a) \a 1)
=> 1

(These all return "1" prior to the patch.)

Comment by Alexander Taggart [ 11/Mar/11 2:51 PM ]

Good catch, Stu.

Causes:

  • character literals are analyzed as ConstantExprs which do not emit primitives, thus the case code went into the "hash" branch.
  • character test values (not the case values) were erroneously ints, so the post-switch validation looked like (= \a 97).
    • this also meant there was equivalence between numbers and chars, which is not consistent with the rest of clojure.

Fix:

426-basic-update-2.patch: removes support for the all-chars case

Aside:

Would it be reasonable to have a CharExpr analogue to NumberExpr?

Comment by Stuart Halloway [ 20/Mar/11 9:31 AM ]

some inputs (which formerly causes "no distinct mapping found" can now cause negative array access or exhaust memory

;; negative array size exception
(defn foo [x]
  (case x
        -1993159583054299157 -1993159583054299157
        -4 -4
        0 0
        -1028157349872421032 0
        :nwuhzgv1k4Gl8eyE*+4UT0b1wIlN!ohzv3!snZ-dN6TBWZ7aOpCYjk3cwgKUHDenjBkx*dwIudNqXVHPDSyuB8yE!d1dSDDGGi5_v5FwC+S_Pr+?hXmEdiL2_3ND+_UCVY4IH8bUw 0
        2 2))


;; out of memory
(defn foo [x]
  (case x
        -1993159583054299157 -1993159583054299157
        -4 -4
        0 0
        :nwuhzgv1k4Gl8eyE*+4UT0b1wIlN!ohzv3!snZ-dN6TBWZ7aOpCYjk3cwgKUHDenjBkx*dwIudNqXVHPDSyuB8yE!d1dSDDGGi5_v5FwC+S_Pr+?hXmEdiL2_3ND+_UCVY4IH8bUw 0
        2 2))
Comment by Alexander Taggart [ 20/Mar/11 1:44 PM ]

Both of the above were due to label array creation not taking into account the type of switch being emitted. Fixed in 426-basic-update-3.patch.

Comment by Alexander Taggart [ 21/Mar/11 4:38 PM ]

426-basic-update-4.patch update to apply to current master branch.

Comment by Stuart Halloway [ 05/Apr/11 8:21 PM ]

update-4 fails for this input:

(defn foo
  [x]
  (case x
        -1 -1))

Maybe an issue with case statements that are all negative ints?

Comment by Alexander Taggart [ 06/Apr/11 3:41 AM ]

Update 5 fixes the above bug.

In the previous patch, if all the test expressions were ints but the tested expression was not known to be primitive, the hashcode branch was used. The hashcode of Integers is equal to their value, but for Long's that's only true for the positive range. This was the source of the bug you found, namely the value in the switch (-1) didn't match the hash of Long -1 (0).

Instead of reusing the hashcode branch, bytecode like the following is now emitted:

if (expr instanceof Number)
  switch(((Number)expr).intValue()){
    case -1:
      if (Util.equiv(expr, -1))
        return -2;
  }
// default
throw new IllegalArgumentException();

Note that the performance warning is still in place, albeit with a better message:
"Performance warning, %s:%d - case has int tests, but tested expression is not primitive.\n"

Comment by Christopher Redinger [ 15/Apr/11 12:55 PM ]

Please Test patch

Comment by Christopher Redinger [ 21/Apr/11 12:17 PM ]

FYI: patch applies cleanly & tests pass against master as of 4/21 (2011)

Comment by Stuart Halloway [ 22/Apr/11 11:48 AM ]

I am seeing failure for the following input:

(defn foo
 [x]
 (case x
   1204766517646190306 9
   1 8 
   -2 6))

Sorry for the screwy formatting in JIRA.

Comment by Alexander Taggart [ 22/Apr/11 2:14 PM ]

Updated patch to fix above.

Explanation of bug:

When multiple test constants have a hash collision, the respective thens are combined with a condp, and the colliding hash value is used as the test constant. Since the combined then performs its own post-switch validation, the normal (non-colliding) behaviour of checking that the original test value equals the tested expression must be skipped.

The bug occurred because the skip-check set of case ints was not getting the necessary shift-mask applied to it.

The bug was not caught earlier because there was incomplete test coverage, namely, for the case of both hash collision and shift-mask application.

Comment by Stuart Halloway [ 26/Apr/11 8:01 PM ]

code now passes test.generative tests. Test run shows case to be faster than cond (and test code could be improved to more directly compare speed).

(binding [*msec* 100000]
  (doseq [v (test-namespaces 'clojure.test.core-test)]
    @v))

{:iterations 2204, :msec 33339, :spec #'clojure.test.core-test/cond-spec, :seed 43}
{:iterations 2159, :msec 33353, :spec #'clojure.test.core-test/cond-spec, :seed 42}
{:iterations 3298, :msec 33339, :spec #'clojure.test.core-test/case-spec, :seed 43}
nil
{:iterations 3217, :msec 33338, :spec #'clojure.test.core-test/case-spec, :seed 42}

This is a fairly large patch – please advise if we need to find a way to break into into smaller chunks, or if there are other perf tests you would like to see.

Generated at Sun Dec 21 04:07:01 CST 2014 using JIRA 4.4#649-r158309.