Clojure

case should handle hash collision

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Completed
  • Affects Version/s: None
  • Fix Version/s: Release 1.3
  • Component/s: None
  • Labels:
    None
  • Patch:
    Code and Test
  • Approval:
    Ok

Description

should generate a branch under the hash to deal with the conflicting values

Activity

Hide
Stuart Halloway added a comment -

code now passes test.generative tests. Test run shows case to be faster than cond (and test code could be improved to more directly compare speed).

(binding [*msec* 100000]
  (doseq [v (test-namespaces 'clojure.test.core-test)]
    @v))

{:iterations 2204, :msec 33339, :spec #'clojure.test.core-test/cond-spec, :seed 43}
{:iterations 2159, :msec 33353, :spec #'clojure.test.core-test/cond-spec, :seed 42}
{:iterations 3298, :msec 33339, :spec #'clojure.test.core-test/case-spec, :seed 43}
nil
{:iterations 3217, :msec 33338, :spec #'clojure.test.core-test/case-spec, :seed 42}

This is a fairly large patch – please advise if we need to find a way to break into into smaller chunks, or if there are other perf tests you would like to see.

Show
Stuart Halloway added a comment - code now passes test.generative tests. Test run shows case to be faster than cond (and test code could be improved to more directly compare speed).
(binding [*msec* 100000]
  (doseq [v (test-namespaces 'clojure.test.core-test)]
    @v))

{:iterations 2204, :msec 33339, :spec #'clojure.test.core-test/cond-spec, :seed 43}
{:iterations 2159, :msec 33353, :spec #'clojure.test.core-test/cond-spec, :seed 42}
{:iterations 3298, :msec 33339, :spec #'clojure.test.core-test/case-spec, :seed 43}
nil
{:iterations 3217, :msec 33338, :spec #'clojure.test.core-test/case-spec, :seed 42}
This is a fairly large patch – please advise if we need to find a way to break into into smaller chunks, or if there are other perf tests you would like to see.
Hide
Alexander Taggart added a comment -

Updated patch to fix above.

Explanation of bug:

When multiple test constants have a hash collision, the respective thens are combined with a condp, and the colliding hash value is used as the test constant. Since the combined then performs its own post-switch validation, the normal (non-colliding) behaviour of checking that the original test value equals the tested expression must be skipped.

The bug occurred because the skip-check set of case ints was not getting the necessary shift-mask applied to it.

The bug was not caught earlier because there was incomplete test coverage, namely, for the case of both hash collision and shift-mask application.

Show
Alexander Taggart added a comment - Updated patch to fix above. Explanation of bug: When multiple test constants have a hash collision, the respective thens are combined with a condp, and the colliding hash value is used as the test constant. Since the combined then performs its own post-switch validation, the normal (non-colliding) behaviour of checking that the original test value equals the tested expression must be skipped. The bug occurred because the skip-check set of case ints was not getting the necessary shift-mask applied to it. The bug was not caught earlier because there was incomplete test coverage, namely, for the case of both hash collision and shift-mask application.
Hide
Stuart Halloway added a comment -

I am seeing failure for the following input:

(defn foo
 [x]
 (case x
   1204766517646190306 9
   1 8 
   -2 6))

Sorry for the screwy formatting in JIRA.

Show
Stuart Halloway added a comment - I am seeing failure for the following input:
(defn foo
 [x]
 (case x
   1204766517646190306 9
   1 8 
   -2 6))
Sorry for the screwy formatting in JIRA.
Hide
Christopher Redinger added a comment -

FYI: patch applies cleanly & tests pass against master as of 4/21 (2011)

Show
Christopher Redinger added a comment - FYI: patch applies cleanly & tests pass against master as of 4/21 (2011)
Hide
Christopher Redinger added a comment -

Please Test patch

Show
Christopher Redinger added a comment - Please Test patch
Hide
Alexander Taggart added a comment - - edited

Update 5 fixes the above bug.

In the previous patch, if all the test expressions were ints but the tested expression was not known to be primitive, the hashcode branch was used. The hashcode of Integers is equal to their value, but for Long's that's only true for the positive range. This was the source of the bug you found, namely the value in the switch (-1) didn't match the hash of Long -1 (0).

Instead of reusing the hashcode branch, bytecode like the following is now emitted:

if (expr instanceof Number)
  switch(((Number)expr).intValue()){
    case -1:
      if (Util.equiv(expr, -1))
        return -2;
  }
// default
throw new IllegalArgumentException();

Note that the performance warning is still in place, albeit with a better message:
"Performance warning, %s:%d - case has int tests, but tested expression is not primitive.\n"

Show
Alexander Taggart added a comment - - edited Update 5 fixes the above bug. In the previous patch, if all the test expressions were ints but the tested expression was not known to be primitive, the hashcode branch was used. The hashcode of Integers is equal to their value, but for Long's that's only true for the positive range. This was the source of the bug you found, namely the value in the switch (-1) didn't match the hash of Long -1 (0). Instead of reusing the hashcode branch, bytecode like the following is now emitted:
if (expr instanceof Number)
  switch(((Number)expr).intValue()){
    case -1:
      if (Util.equiv(expr, -1))
        return -2;
  }
// default
throw new IllegalArgumentException();
Note that the performance warning is still in place, albeit with a better message: "Performance warning, %s:%d - case has int tests, but tested expression is not primitive.\n"
Hide
Stuart Halloway added a comment -

update-4 fails for this input:

(defn foo
  [x]
  (case x
        -1 -1))

Maybe an issue with case statements that are all negative ints?

Show
Stuart Halloway added a comment - update-4 fails for this input:
(defn foo
  [x]
  (case x
        -1 -1))
Maybe an issue with case statements that are all negative ints?
Hide
Alexander Taggart added a comment -

426-basic-update-4.patch update to apply to current master branch.

Show
Alexander Taggart added a comment - 426-basic-update-4.patch update to apply to current master branch.
Hide
Alexander Taggart added a comment -

Both of the above were due to label array creation not taking into account the type of switch being emitted. Fixed in 426-basic-update-3.patch.

Show
Alexander Taggart added a comment - Both of the above were due to label array creation not taking into account the type of switch being emitted. Fixed in 426-basic-update-3.patch.
Hide
Stuart Halloway added a comment -

some inputs (which formerly causes "no distinct mapping found" can now cause negative array access or exhaust memory

;; negative array size exception
(defn foo [x]
  (case x
        -1993159583054299157 -1993159583054299157
        -4 -4
        0 0
        -1028157349872421032 0
        :nwuhzgv1k4Gl8eyE*+4UT0b1wIlN!ohzv3!snZ-dN6TBWZ7aOpCYjk3cwgKUHDenjBkx*dwIudNqXVHPDSyuB8yE!d1dSDDGGi5_v5FwC+S_Pr+?hXmEdiL2_3ND+_UCVY4IH8bUw 0
        2 2))


;; out of memory
(defn foo [x]
  (case x
        -1993159583054299157 -1993159583054299157
        -4 -4
        0 0
        :nwuhzgv1k4Gl8eyE*+4UT0b1wIlN!ohzv3!snZ-dN6TBWZ7aOpCYjk3cwgKUHDenjBkx*dwIudNqXVHPDSyuB8yE!d1dSDDGGi5_v5FwC+S_Pr+?hXmEdiL2_3ND+_UCVY4IH8bUw 0
        2 2))
Show
Stuart Halloway added a comment - some inputs (which formerly causes "no distinct mapping found" can now cause negative array access or exhaust memory
;; negative array size exception
(defn foo [x]
  (case x
        -1993159583054299157 -1993159583054299157
        -4 -4
        0 0
        -1028157349872421032 0
        :nwuhzgv1k4Gl8eyE*+4UT0b1wIlN!ohzv3!snZ-dN6TBWZ7aOpCYjk3cwgKUHDenjBkx*dwIudNqXVHPDSyuB8yE!d1dSDDGGi5_v5FwC+S_Pr+?hXmEdiL2_3ND+_UCVY4IH8bUw 0
        2 2))


;; out of memory
(defn foo [x]
  (case x
        -1993159583054299157 -1993159583054299157
        -4 -4
        0 0
        :nwuhzgv1k4Gl8eyE*+4UT0b1wIlN!ohzv3!snZ-dN6TBWZ7aOpCYjk3cwgKUHDenjBkx*dwIudNqXVHPDSyuB8yE!d1dSDDGGi5_v5FwC+S_Pr+?hXmEdiL2_3ND+_UCVY4IH8bUw 0
        2 2))
Hide
Alexander Taggart added a comment -

Good catch, Stu.

Causes:

  • character literals are analyzed as ConstantExprs which do not emit primitives, thus the case code went into the "hash" branch.
  • character test values (not the case values) were erroneously ints, so the post-switch validation looked like (= \a 97).
    • this also meant there was equivalence between numbers and chars, which is not consistent with the rest of clojure.

Fix:

426-basic-update-2.patch: removes support for the all-chars case

Aside:

Would it be reasonable to have a CharExpr analogue to NumberExpr?

Show
Alexander Taggart added a comment - Good catch, Stu. Causes:
  • character literals are analyzed as ConstantExprs which do not emit primitives, thus the case code went into the "hash" branch.
  • character test values (not the case values) were erroneously ints, so the post-switch validation looked like (= \a 97).
    • this also meant there was equivalence between numbers and chars, which is not consistent with the rest of clojure.
Fix: 426-basic-update-2.patch: removes support for the all-chars case Aside: Would it be reasonable to have a CharExpr analogue to NumberExpr?
Hide
Stuart Halloway added a comment -

Alex: There is something weird about characters:

(case (char \uFFFF) \uFFFF 1)
=> 1
(case \uFFFF \uFFFF 1)
=> IllegalArgumentException No matching clause: ?  user/eval2253 (NO_SOURCE_FILE:126)
(case \a \a 1)
=> IllegalArgumentException No matching clause: a  user/eval2256 (NO_SOURCE_FILE:127)
(case (char \a) \a 1)
=> 1

(These all return "1" prior to the patch.)

Show
Stuart Halloway added a comment - Alex: There is something weird about characters:
(case (char \uFFFF) \uFFFF 1)
=> 1
(case \uFFFF \uFFFF 1)
=> IllegalArgumentException No matching clause: ?  user/eval2253 (NO_SOURCE_FILE:126)
(case \a \a 1)
=> IllegalArgumentException No matching clause: a  user/eval2256 (NO_SOURCE_FILE:127)
(case (char \a) \a 1)
=> 1
(These all return "1" prior to the patch.)
Hide
Alexander Taggart added a comment -

426-basic-update-1.patch also contains additional documentation to the various helper functions.

Show
Alexander Taggart added a comment - 426-basic-update-1.patch also contains additional documentation to the various helper functions.
Hide
Alexander Taggart added a comment -

Hash-collision handling function did not properly handle non-colliding test/then entries.

426-basic-update-1.patch fixes the above and adds Stu's example to the test suite.

Show
Alexander Taggart added a comment - Hash-collision handling function did not properly handle non-colliding test/then entries. 426-basic-update-1.patch fixes the above and adds Stu's example to the test suite.
Hide
Stuart Halloway added a comment -

Reviewing the basic patch. The code passed to case* is incorrect for some duplicate hash scenarios. One example:

(let [f #(case % 0 :ZERO -1 :NEG1 2 :TWO :OOPS :OOPS)]
    (doseq [input [0 -1 2 :OOPS]]
      (println (format "(f %s) = %s" input (f input)))))
(f 0) = :ZERO
(f -1) = :NEG1
(f 2) = [2 :TWO]
IllegalArgumentException No matching clause: :OOPS

It would be a huge help to me in screening if the docstrings for the private helper fns documented expected inputs and outputs.

Show
Stuart Halloway added a comment - Reviewing the basic patch. The code passed to case* is incorrect for some duplicate hash scenarios. One example:
(let [f #(case % 0 :ZERO -1 :NEG1 2 :TWO :OOPS :OOPS)]
    (doseq [input [0 -1 2 :OOPS]]
      (println (format "(f %s) = %s" input (f input)))))
(f 0) = :ZERO
(f -1) = :NEG1
(f 2) = [2 :TWO]
IllegalArgumentException No matching clause: :OOPS
It would be a huge help to me in screening if the docstrings for the private helper fns documented expected inputs and outputs.
Hide
Rich Hickey added a comment -

Chas -

1) Is all we've got right now. Literals compile into themselves. There is no notion of constant in the language.

2) Treating finals as constants would require the compiler compile MyClass/MY_FINAL into its value, which it does not currently do. It currently generates a getStaticField opcode.

3) I never suggested treating defs like constants. A defconst would require some notion of constant. Since its value would be compiled into code, it must be different from def, the semantics of which are to compile into a deref of the var. Only the latter currently occurs. Doing otherwise automagically just inside case would be bizarre. Languages typically constrain the initializers of such constants, esp. to constant expressions themselves. CL allows for arbitrary initializers, but requires they be eval-able at compile time, always evaluate to the same value, and with much hand waving about execution order.

e.g. if as you had suggested:

(def x (launch-missiles))

(case foo x ...)

Would missiles be launched during compilation or loading or both? Would x be nil or the number of missiles launched in the case? What if the case were wrapped in (binding/let [x ...] ...)?

4) is like 2.

You are correct, 2,3,4 all conflict with the use of symbols in case, which is why supporting them is a breaking change, glommed on to a ticket that nominally is a bug fix, augmented by a perf optimization.

Fixing the conflict isn't as simple as saying "use 'x for symbols" in case. You'd need to consider the interaction of constants and all special forms and macros, the relationship between constants and '[un]evaluated' positions everywhere. It's not a trivial thing. CL, FWIW, does not in fact support its own defconstants as keys in its case, considering them constants only in evaluated positions.

I don't have the time right now to consider all of that, but I won't accept a design that hasn't.

Show
Rich Hickey added a comment - Chas - 1) Is all we've got right now. Literals compile into themselves. There is no notion of constant in the language. 2) Treating finals as constants would require the compiler compile MyClass/MY_FINAL into its value, which it does not currently do. It currently generates a getStaticField opcode. 3) I never suggested treating defs like constants. A defconst would require some notion of constant. Since its value would be compiled into code, it must be different from def, the semantics of which are to compile into a deref of the var. Only the latter currently occurs. Doing otherwise automagically just inside case would be bizarre. Languages typically constrain the initializers of such constants, esp. to constant expressions themselves. CL allows for arbitrary initializers, but requires they be eval-able at compile time, always evaluate to the same value, and with much hand waving about execution order. e.g. if as you had suggested: (def x (launch-missiles)) (case foo x ...) Would missiles be launched during compilation or loading or both? Would x be nil or the number of missiles launched in the case? What if the case were wrapped in (binding/let [x ...] ...)? 4) is like 2. You are correct, 2,3,4 all conflict with the use of symbols in case, which is why supporting them is a breaking change, glommed on to a ticket that nominally is a bug fix, augmented by a perf optimization. Fixing the conflict isn't as simple as saying "use 'x for symbols" in case. You'd need to consider the interaction of constants and all special forms and macros, the relationship between constants and '[un]evaluated' positions everywhere. It's not a trivial thing. CL, FWIW, does not in fact support its own defconstants as keys in its case, considering them constants only in evaluated positions. I don't have the time right now to consider all of that, but I won't accept a design that hasn't.
Hide
Alexander Taggart added a comment - - edited

Chas, no worries. Being able to use named constant values as test constants makes my life so much easier, especially when dealing with java interop. Until now I've had to use (condp = x ...). Regardless of whether it makes it into core, I will still be using it.

Show
Alexander Taggart added a comment - - edited Chas, no worries. Being able to use named constant values as test constants makes my life so much easier, especially when dealing with java interop. Until now I've had to use (condp = x ...). Regardless of whether it makes it into core, I will still be using it.
Hide
Chas Emerick added a comment -

Well, everything is scope creep in the end.

By "macro their way out", I meant doing something like I did here to use non-literal constants in case forms. When I wrote that comment this morning, I was under the impression that only Clojure literals were considered acceptable.


Anyway, what follows is either incredibly pedantic, or a draft of a wiki page where a more agreeable solution can be hashed out.

It occurs to me that being explicit about what constants exist and what's on the table here would be helpful, at least for me:

  1. Clojure literals and composites thereof, as implemented in 1.2.0 case
  2. static final fields
  3. def'ed values, presumed to be unchanging
  4. enums

(1) is granted, and it sounds like moving beyond this is not desired at this time without significant further analysis/discussion (implying the application of Alexander's "basic" patch only).

(2) appears to be desired ("Ditto static finals as constants"). However, as soon as we allow for non-literal constant values (whether fields, def's, enums, or other), I'm very confused as to:

  • how what we use to refer to those values (surely symbols?) aren't turned into a second-class test values
  • how we're not faced with conditional resolution of those values (there's no calls anywhere, so it seems there's little difference between eval and direct usage of Reflector, at least for fields and enums)

Perhaps the defconst variation hinted at in conjunction with (3) would somehow avoid compile-time evaluation for def'ed values? I'm no longer familiar with CL and the defconstant prior art to reasonably guess at the relevant semantics of defconst.

Re: (4) and "The enum stuff is a snake pit full of new semantics, contradictions and special cases": I'm pretty sure I didn't know that. It seems like enum support is as close as:

(.ordinal (Reflector/getStaticField EnumClassName "EnumName"))

…with all reasonable error-checking and such to ensure the domain of the case form includes only enums of the same type (though again, the issues with resolving symbols remains).


Anyway, I'm sorry to have encouraged (egged on?) Alexander. I appear to have misread/misunderstood the desired scope of changes.

Show
Chas Emerick added a comment - Well, everything is scope creep in the end. By "macro their way out", I meant doing something like I did here to use non-literal constants in case forms. When I wrote that comment this morning, I was under the impression that only Clojure literals were considered acceptable.
Anyway, what follows is either incredibly pedantic, or a draft of a wiki page where a more agreeable solution can be hashed out. It occurs to me that being explicit about what constants exist and what's on the table here would be helpful, at least for me:
  1. Clojure literals and composites thereof, as implemented in 1.2.0 case
  2. static final fields
  3. def'ed values, presumed to be unchanging
  4. enums
(1) is granted, and it sounds like moving beyond this is not desired at this time without significant further analysis/discussion (implying the application of Alexander's "basic" patch only). (2) appears to be desired ("Ditto static finals as constants"). However, as soon as we allow for non-literal constant values (whether fields, def's, enums, or other), I'm very confused as to:
  • how what we use to refer to those values (surely symbols?) aren't turned into a second-class test values
  • how we're not faced with conditional resolution of those values (there's no calls anywhere, so it seems there's little difference between eval and direct usage of Reflector, at least for fields and enums)
Perhaps the defconst variation hinted at in conjunction with (3) would somehow avoid compile-time evaluation for def'ed values? I'm no longer familiar with CL and the defconstant prior art to reasonably guess at the relevant semantics of defconst. Re: (4) and "The enum stuff is a snake pit full of new semantics, contradictions and special cases": I'm pretty sure I didn't know that. It seems like enum support is as close as:
(.ordinal (Reflector/getStaticField EnumClassName "EnumName"))
…with all reasonable error-checking and such to ensure the domain of the case form includes only enums of the same type (though again, the issues with resolving symbols remains).
Anyway, I'm sorry to have encouraged (egged on?) Alexander. I appear to have misread/misunderstood the desired scope of changes.
Hide
Rich Hickey added a comment - - edited

Chas, I'm not sure what you mean by 'macro their way out of..." - are you saying def is too much work? Or is what you'd want, but doesn't work?

What's missing is not some extension to case, but possibly a broadening of what constitutes a compile time constant. At no point should that involve compile time evaluation in case. A defconst has long been requested and is a fine idea. Ditto static finals as constants. As long as case is defined to work with constants, we can enhance 'constants' and case can do more transparently (at least semantically transparently). All of this stuff around unquoted constant symbols, conditional compile time evaluation (when does that otherwise happen?) etc is a mess. Note that case is currently defined in terms of literals because Clojure doesn't have a strong notion of constant yet.

Also, I think you are not adequately considering the use of case in symbolic computation. People use case matching of symbols all the time when writing macros and compilers etc in CL, and wouldn't appreciate having to quote everything. Also, given we can match aggregates, how might one match (quote x), and why wouldn't that match 'x?

This is scope creep, and insufficiently considered, plain and simple. Let's leave it as a separate ticket. It's going to require more thought, in any case

Show
Rich Hickey added a comment - - edited Chas, I'm not sure what you mean by 'macro their way out of..." - are you saying def is too much work? Or is what you'd want, but doesn't work? What's missing is not some extension to case, but possibly a broadening of what constitutes a compile time constant. At no point should that involve compile time evaluation in case. A defconst has long been requested and is a fine idea. Ditto static finals as constants. As long as case is defined to work with constants, we can enhance 'constants' and case can do more transparently (at least semantically transparently). All of this stuff around unquoted constant symbols, conditional compile time evaluation (when does that otherwise happen?) etc is a mess. Note that case is currently defined in terms of literals because Clojure doesn't have a strong notion of constant yet. Also, I think you are not adequately considering the use of case in symbolic computation. People use case matching of symbols all the time when writing macros and compilers etc in CL, and wouldn't appreciate having to quote everything. Also, given we can match aggregates, how might one match (quote x), and why wouldn't that match 'x? This is scope creep, and insufficiently considered, plain and simple. Let's leave it as a separate ticket. It's going to require more thought, in any case
Hide
Chas Emerick added a comment -

I should say that I'd consider it perfectly reasonable if case were kept limited to using literals (if that limitation is considered important enough to retain at some level), but clojure.core provided a switch macro that evaluated symbols as Alexander has been reaching for.

Show
Chas Emerick added a comment - I should say that I'd consider it perfectly reasonable if case were kept limited to using literals (if that limitation is considered important enough to retain at some level), but clojure.core provided a switch macro that evaluated symbols as Alexander has been reaching for.
Hide
Rich Hickey added a comment -

Thanks for splitting this up. I'm not in favor of anything beyond basic. The enum stuff is a snake pit full of new semantics, contradictions and special cases. I had told Chas as much. Sorry I hadn't realized the enum thing had crept into this scope sooner.

Rich

Show
Rich Hickey added a comment - Thanks for splitting this up. I'm not in favor of anything beyond basic. The enum stuff is a snake pit full of new semantics, contradictions and special cases. I had told Chas as much. Sorry I hadn't realized the enum thing had crept into this scope sooner. Rich
Hide
Chas Emerick added a comment -

RH:

Java enums aren't Clojure constants.

SH:

The use of eval to detect enums feels icky, and counter to the stated behavior of allowing only compile-time constants.

I assume both of you are actually talking about literals. Limiting case to literal test values is simply too much of a handicap, even ignoring interop uses. One should not have to macro their way out of repeating a constant value (e.g. (def *mole* 6.02214179e23)) throughout a codebase that needs to use case extensively.

Leaving aside questions about mechanics, static final fields and enums are absolutely constants, and not being able to use them within case would be (and is today) a significant pain point. Yes, their value w.r.t. case can change after a case form has been AOT-compiled – which exactly mirrors switch (a good thing, IMO, making case a proper superset of switch). Again, something one can macro out of, but something I don't think people should have to macro out of.

Show
Chas Emerick added a comment - RH:
Java enums aren't Clojure constants.
SH:
The use of eval to detect enums feels icky, and counter to the stated behavior of allowing only compile-time constants.
I assume both of you are actually talking about literals. Limiting case to literal test values is simply too much of a handicap, even ignoring interop uses. One should not have to macro their way out of repeating a constant value (e.g. (def *mole* 6.02214179e23)) throughout a codebase that needs to use case extensively. Leaving aside questions about mechanics, static final fields and enums are absolutely constants, and not being able to use them within case would be (and is today) a significant pain point. Yes, their value w.r.t. case can change after a case form has been AOT-compiled – which exactly mirrors switch (a good thing, IMO, making case a proper superset of switch). Again, something one can macro out of, but something I don't think people should have to macro out of.
Hide
Alexander Taggart added a comment -

To make life easier, I'm altering the set of patches.

426-basic.patch:

  • handles hash collisions
  • can emit return type
  • performance path for all-int or all-char test constants
  • no documentation change

426-supports-named-values.patch:

  • all from above, plus...
  • evaluates unquoted test constant symbols at compile-time
  • validates test constants are suitable for case
  • doc change: "The test-constants can be any combination of numbers, chars, strings, keywords, quoted symbols, and (Clojure) composites thereof. Unquoted symbols will be evaluated at compile time, and must resolve to one of the preceding types."

426-supports-enums-and-named-values.patch:

  • all from above, plus...
  • conditionally supports java enums
  • doc change: "Java enums are supported only if all test constants are enums of the same type."
Show
Alexander Taggart added a comment - To make life easier, I'm altering the set of patches. 426-basic.patch:
  • handles hash collisions
  • can emit return type
  • performance path for all-int or all-char test constants
  • no documentation change
426-supports-named-values.patch:
  • all from above, plus...
  • evaluates unquoted test constant symbols at compile-time
  • validates test constants are suitable for case
  • doc change: "The test-constants can be any combination of numbers, chars, strings, keywords, quoted symbols, and (Clojure) composites thereof. Unquoted symbols will be evaluated at compile time, and must resolve to one of the preceding types."
426-supports-enums-and-named-values.patch:
  • all from above, plus...
  • conditionally supports java enums
  • doc change: "Java enums are supported only if all test constants are enums of the same type."
Hide
Alexander Taggart added a comment -

Support for enums is really just a degenerate case of supporting named constant values, e.g., java.lang.Math/PI, (def init-state 0).

The basic requirement of case is that the test constants have a known value at compile-time, and that value has consistent hashing. The current implementation assumes the latter requirement will be met by restricting test constants to literal types. This mostly works, but fails with things like regex patterns.

The above basic requirements can be met while removing the incidental "literal" restriction. To do so means resolving literal symbols into constant values at compile-time, and then validating that the types are suitable (e.g., number, string).

The simplest and most consistent approach would be to require quoting test constant symbols when used as symbols. Setting aside all other discussion of enums, etc., is this change to the treatment of symbols acceptable?

Show
Alexander Taggart added a comment - Support for enums is really just a degenerate case of supporting named constant values, e.g., java.lang.Math/PI, (def init-state 0). The basic requirement of case is that the test constants have a known value at compile-time, and that value has consistent hashing. The current implementation assumes the latter requirement will be met by restricting test constants to literal types. This mostly works, but fails with things like regex patterns. The above basic requirements can be met while removing the incidental "literal" restriction. To do so means resolving literal symbols into constant values at compile-time, and then validating that the types are suitable (e.g., number, string). The simplest and most consistent approach would be to require quoting test constant symbols when used as symbols. Setting aside all other discussion of enums, etc., is this change to the treatment of symbols acceptable?
Hide
Rich Hickey added a comment -

Java enums aren't Clojure constants.

Show
Rich Hickey added a comment - Java enums aren't Clojure constants.
Hide
Alexander Taggart added a comment -

simple semantics for enum support

As the updated doc states: "Java enums are supported if all test constants are enums of the same type."

Stu's example violates the above rule. The issue then becomes what should be the response: error at compile-time, or treat as symbol. More on this in a following comment.

Re: lookupswitch ... is the tableswitch code in some way wrong?

Not at all, but the current version will throw an exception when it can't find a sufficient shift-mask:

user=> (case 1
         1     :a
         16384 :b
         16385 :c)
IllegalArgumentException No distinct mapping found  clojure.core/min-hash (core.clj:5708)

The lookupswitch bytecode is only used to substitute for throwing that exception.

The current process of fitting test constants into a switch:

  1. Coerce to int by hashing
  2. if shift-mask found, use tableswitch and shift-mask
  3. else throw exception

The process used in the patch:

  1. Coerce to int (hash, enum ordinal, or int value)
  2. if (max-int - min-int) < max-tableswitch-size, use tableswitch without shift-mask
  3. else if shift-mask found, use tableswitch with shift-mask
  4. else use lookupswitch without shift-mask

Where max-tableswitch-size is simply an explicit consequence of the mask width.

Show
Alexander Taggart added a comment -
simple semantics for enum support
As the updated doc states: "Java enums are supported if all test constants are enums of the same type." Stu's example violates the above rule. The issue then becomes what should be the response: error at compile-time, or treat as symbol. More on this in a following comment.
Re: lookupswitch ... is the tableswitch code in some way wrong?
Not at all, but the current version will throw an exception when it can't find a sufficient shift-mask:
user=> (case 1
         1     :a
         16384 :b
         16385 :c)
IllegalArgumentException No distinct mapping found  clojure.core/min-hash (core.clj:5708)
The lookupswitch bytecode is only used to substitute for throwing that exception. The current process of fitting test constants into a switch:
  1. Coerce to int by hashing
  2. if shift-mask found, use tableswitch and shift-mask
  3. else throw exception
The process used in the patch:
  1. Coerce to int (hash, enum ordinal, or int value)
  2. if (max-int - min-int) < max-tableswitch-size, use tableswitch without shift-mask
  3. else if shift-mask found, use tableswitch with shift-mask
  4. else use lookupswitch without shift-mask
Where max-tableswitch-size is simply an explicit consequence of the mask width.
Hide
Rich Hickey added a comment -

Unless someone can provide some simple semantics for enum support, I'd rather not support them.

Re: lookupswitch - I don't see any reason to support it - is the tableswitch code in some way wrong?

Show
Rich Hickey added a comment - Unless someone can provide some simple semantics for enum support, I'd rather not support them. Re: lookupswitch - I don't see any reason to support it - is the tableswitch code in some way wrong?
Hide
Alexander Taggart added a comment -

Regarding the mixed enum/non-enum case:

The primary patch treats symbols which do not resolve to enums as symbols, but also treats enums as symbols when the set of test-constants was not all-enum. Both parts were an attempt at backwards compatibility.

I agree that the better behaviour would be to error on the mixed case, though this would break any code where a test-constant symbol happens to resolve to an enum.

Regarding eval:

It doesn't get around the "compile-time" restriction, but rather the "literal" restriction.

Show
Alexander Taggart added a comment - Regarding the mixed enum/non-enum case: The primary patch treats symbols which do not resolve to enums as symbols, but also treats enums as symbols when the set of test-constants was not all-enum. Both parts were an attempt at backwards compatibility. I agree that the better behaviour would be to error on the mixed case, though this would break any code where a test-constant symbol happens to resolve to an enum. Regarding eval: It doesn't get around the "compile-time" restriction, but rather the "literal" restriction.
Hide
Stuart Halloway added a comment -

Rich: to the extent that we believe in testing for things like reflection warnings, it would be nice to have a way to get to that information other than be scraping stderr. If this is of interest let me know and I will make a ticket and/or start a design discussion as appropriate.

Show
Stuart Halloway added a comment - Rich: to the extent that we believe in testing for things like reflection warnings, it would be nice to have a way to get to that information other than be scraping stderr. If this is of interest let me know and I will make a ticket and/or start a design discussion as appropriate.
Hide
Stuart Halloway added a comment -

The use of eval to detect enums feels icky, and counter to the stated behavior of allowing only compile-time constants.

Show
Stuart Halloway added a comment - The use of eval to detect enums feels icky, and counter to the stated behavior of allowing only compile-time constants.
Hide
Stuart Halloway added a comment -

enum support feels half-baked. For example, an attempt to mix enums and non enums compiles but then cannot find the enum:

(case clojure.lang.Compiler$CaseTestEnumABC/a
        clojure.lang.Compiler$CaseTestEnumABC/a 1
        1 2)
IllegalArgumentException No matching clause: a
Show
Stuart Halloway added a comment - enum support feels half-baked. For example, an attempt to mix enums and non enums compiles but then cannot find the enum:
(case clojure.lang.Compiler$CaseTestEnumABC/a
        clojure.lang.Compiler$CaseTestEnumABC/a 1
        1 2)
IllegalArgumentException No matching clause: a
Hide
Stuart Halloway added a comment -

Rich: screened in this case means only that the tests pass. I am continuing to dig into this patch, adding comments as I go. Let me know if there are specific things I should be investigating.

Show
Stuart Halloway added a comment - Rich: screened in this case means only that the tests pass. I am continuing to dig into this patch, adding comments as I go. Let me know if there are specific things I should be investigating.
Hide
Stuart Halloway added a comment -

What's the basis for deciding when to use tableswitch vs. lookupswitch? I tried the sparse example [-100, 0, 100] from http://java.sun.com/docs/books/jvms/second_edition/html/Compiling.doc.html and the code produced a tableswitch. Is this desirable?

Show
Stuart Halloway added a comment - What's the basis for deciding when to use tableswitch vs. lookupswitch? I tried the sparse example [-100, 0, 100] from http://java.sun.com/docs/books/jvms/second_edition/html/Compiling.doc.html and the code produced a tableswitch. Is this desirable?
Hide
Alexander Taggart added a comment -

Per cemerick's request I'm providing a secondary patch which would provide for evaluating test-constant symbols at compile time. This would allow us to use switches where writing a literal value is difficult. E.g.:

user=> (defn foo [x]
         (case x
           java.lang.Math/PI :pi
           java.lang.Math/E :e
           :oops))
#'user/foo
user=> (foo java.lang.Math/E)
:e

Quoted symbols are not evaluated, but there are backwards compatibility concerns with evaluating unquoted symbols:

  • if evaluation is successful, it might be unintended thus breaking existing code
  • if evaluation is unsucessful
    • continuing to treat it as a symbol would preserve partial backwards compatibility
    • throwing an error would break compatability, but allow for a more consistent behavior going forward, i.e., quote symbols when you want symbols
Show
Alexander Taggart added a comment - Per cemerick's request I'm providing a secondary patch which would provide for evaluating test-constant symbols at compile time. This would allow us to use switches where writing a literal value is difficult. E.g.:
user=> (defn foo [x]
         (case x
           java.lang.Math/PI :pi
           java.lang.Math/E :e
           :oops))
#'user/foo
user=> (foo java.lang.Math/E)
:e
Quoted symbols are not evaluated, but there are backwards compatibility concerns with evaluating unquoted symbols:
  • if evaluation is successful, it might be unintended thus breaking existing code
  • if evaluation is unsucessful
    • continuing to treat it as a symbol would preserve partial backwards compatibility
    • throwing an error would break compatability, but allow for a more consistent behavior going forward, i.e., quote symbols when you want symbols
Hide
Alexander Taggart added a comment -

Update to fix a bug.

Show
Alexander Taggart added a comment - Update to fix a bug.
Hide
Alexander Taggart added a comment - - edited

Updated patch so that case is no longer one gigantic ball of code. And fixed some bugs

Show
Alexander Taggart added a comment - - edited Updated patch so that case is no longer one gigantic ball of code. And fixed some bugs
Hide
Alexander Taggart added a comment -

Patch happens to fix CLJ-438 as well.

Show
Alexander Taggart added a comment - Patch happens to fix CLJ-438 as well.
Hide
Alexander Taggart added a comment -

case changes:

  • handles hash collisions
  • can emit return type
  • conditionally supports java enums
  • performance path for all-int or all-char test constants
Show
Alexander Taggart added a comment - case changes:
  • handles hash collisions
  • can emit return type
  • conditionally supports java enums
  • performance path for all-int or all-char test constants
Hide
Aaron Bedra added a comment -

I'm gonna give this a shot. If anyone has any solutions to this please feel free to jump in here while I am working on it.

Show
Aaron Bedra added a comment - I'm gonna give this a shot. If anyone has any solutions to this please feel free to jump in here while I am working on it.
Hide
Rich Hickey added a comment -

while in there we should look at special-casing all-int case, and primitive return from case

Show
Rich Hickey added a comment - while in there we should look at special-casing all-int case, and primitive return from case
Hide
Assembla Importer added a comment -

cemerick said: It would be nice if case were further enhanced to support references to enums and static final fields (and maybe other stably-referenceable values as well?). A userland workaround is shown here (added here instead of in a separate ticket at Rich's request).

Show
Assembla Importer added a comment - cemerick said: It would be nice if case were further enhanced to support references to enums and static final fields (and maybe other stably-referenceable values as well?). A userland workaround is shown here (added here instead of in a separate ticket at Rich's request).
Hide
Assembla Importer added a comment -

stu said: Updating tickets (#427, #426, #421, #420, #397)

Show
Assembla Importer added a comment - stu said: Updating tickets (#427, #426, #421, #420, #397)
Hide
Assembla Importer added a comment -
Show
Assembla Importer added a comment - Converted from http://www.assembla.com/spaces/clojure/tickets/426

People

Vote (1)
Watch (3)

Dates

  • Created:
    Updated:
    Resolved: