ClojureScript

(get [1 2 3] nil) -> 1

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Completed
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None
  1. cljs-728-minimal.diff
    29/Jan/14 6:49 PM
    8 kB
    Francis Avila
  2. cljs-728-more-minimal.patch
    30/Jan/14 11:49 AM
    8 kB
    Francis Avila
  3. cljs-728-nth-number.patch
    31/Dec/13 12:02 AM
    13 kB
    Francis Avila
  4. cljs-728-only-nth-lookup.patch
    24/Feb/14 12:41 AM
    6 kB
    Francis Avila
  5. cljs-728-only-nth-lookup.patch
    17/Feb/14 10:49 PM
    6 kB
    Francis Avila

Activity

Hide
Francis Avila added a comment -

Rebased cljs-728-only-nth-lookup.patch again. (core_test.cljs changes would have "conflicted".)

Show
Francis Avila added a comment - Rebased cljs-728-only-nth-lookup.patch again. (core_test.cljs changes would have "conflicted".)
Hide
Francis Avila added a comment -

Patch rebased and trimmed: cljs-728-only-nth-lookup.patch.

Show
Francis Avila added a comment - Patch rebased and trimmed: cljs-728-only-nth-lookup.patch.
Hide
David Nolen added a comment -

Francis Avila, many thanks will look over these.

Show
David Nolen added a comment - Francis Avila, many thanks will look over these.
Hide
Francis Avila added a comment -

Patch that aligns ClojureScript vector type's -assoc-n/-assoc with Clojure's is in CLJS-767, and patch that fixes assoc! on TransientVectors is in CLJS-768.

I will cut conflicting hunks out of this CLJS-728 patch (hunks 2.3, 2.5 and 2.6 as outlined above) so all can apply cleanly with one another. (I'll have to do it this evening though.)

Full list of related tickets and patches (stuff that was factored out of the initial patches):

  1. CLJS-757: remove redundant bounds checking in indexed types (will need a rebase once the others are in).
  2. CLJS-767: Fix assoc on PersistentVector and Subvec
  3. CLJS-768: Fix assoc! on TransientVector
Show
Francis Avila added a comment - Patch that aligns ClojureScript vector type's -assoc-n/-assoc with Clojure's is in CLJS-767, and patch that fixes assoc! on TransientVectors is in CLJS-768. I will cut conflicting hunks out of this CLJS-728 patch (hunks 2.3, 2.5 and 2.6 as outlined above) so all can apply cleanly with one another. (I'll have to do it this evening though.) Full list of related tickets and patches (stuff that was factored out of the initial patches):
  1. CLJS-757: remove redundant bounds checking in indexed types (will need a rebase once the others are in).
  2. CLJS-767: Fix assoc on PersistentVector and Subvec
  3. CLJS-768: Fix assoc! on TransientVector
Hide
David Nolen added a comment -

That the ClojureScript vector types implement assoc in terms of assoc-n and not the other way around should be considered a flaw. Please refer to Clojure's PersistentVector.java, RT.java for a guide about how the logic should be implemented. I'll take a patch that aligns ClojureScript's logic closer to Clojure. Thanks.

Show
David Nolen added a comment - That the ClojureScript vector types implement assoc in terms of assoc-n and not the other way around should be considered a flaw. Please refer to Clojure's PersistentVector.java, RT.java for a guide about how the logic should be implemented. I'll take a patch that aligns ClojureScript's logic closer to Clojure. Thanks.
Hide
Francis Avila added a comment -

New patch cljs-728-more-minimal.patch

Hunk-by-hunk justification of changes:

  1. core_test.cljs
    1. test (get x n) (get x n nf) (nth x n) (nth x n nf) and (assoc x n v) with non-numeric n against PersistentVector, Subvec, TransientVector, and Range types.
  2. core.cljs
    1. number? check in nth
    2. number? check in PersistentVector -lookup
    3. number? check in PersistentVector -assoc
    4. number? check in Subvec -lookup
    5. number? check in Subvec -assoc
    6. number? check in TransientVector -assoc!
    7. number? check in TransientVector -lookup

Although it does not swap -assoc-n and -assoc on Subvec and PersistentVector, I would hardly describe this patch as
"far more minimal" than the previous one--I hope that does not mean it is unacceptable. The only way I can see to
make this patch smaller is to file a different ticket for each hunk and corresponding test, which seems crazy.

If the patch is still not acceptable, could you please be much more specific about what exactly is wrong with it so we can close this bug.

Show
Francis Avila added a comment - New patch cljs-728-more-minimal.patch Hunk-by-hunk justification of changes:
  1. core_test.cljs
    1. test (get x n) (get x n nf) (nth x n) (nth x n nf) and (assoc x n v) with non-numeric n against PersistentVector, Subvec, TransientVector, and Range types.
  2. core.cljs
    1. number? check in nth
    2. number? check in PersistentVector -lookup
    3. number? check in PersistentVector -assoc
    4. number? check in Subvec -lookup
    5. number? check in Subvec -assoc
    6. number? check in TransientVector -assoc!
    7. number? check in TransientVector -lookup
Although it does not swap -assoc-n and -assoc on Subvec and PersistentVector, I would hardly describe this patch as "far more minimal" than the previous one--I hope that does not mean it is unacceptable. The only way I can see to make this patch smaller is to file a different ticket for each hunk and corresponding test, which seems crazy. If the patch is still not acceptable, could you please be much more specific about what exactly is wrong with it so we can close this bug.
Hide
Francis Avila added a comment -

It doesn't make sense for indexed types to have -lookup call -nth but -assoc-n call -assoc (as PersistentVector and Subvec do, but not TransientVector). This means if we know we are using a numeric key we call -nth on read, but call -assoc on writes instead of -assoc-n. -assoc-n is what has the required "key is a number" semantic, not -assoc, and this leads to an unnecessary number? check being done in the implementation of Subvec -conj. (There are also other places, e.g. Subvec -assoc, where -assoc-n could be called instead of assoc or -assoc, but I don't address those.)

In other words, it would be best if -assoc-n/-assoc-n! consistently meant "numeric key" 1) so that callers could avoid a number? check if they know their key is numeric, 2) so there is a sane analogy -lookup:-nth :: -assoc:-assoc-n :: -assoc!:-assoc-n!

Show
Francis Avila added a comment - It doesn't make sense for indexed types to have -lookup call -nth but -assoc-n call -assoc (as PersistentVector and Subvec do, but not TransientVector). This means if we know we are using a numeric key we call -nth on read, but call -assoc on writes instead of -assoc-n. -assoc-n is what has the required "key is a number" semantic, not -assoc, and this leads to an unnecessary number? check being done in the implementation of Subvec -conj. (There are also other places, e.g. Subvec -assoc, where -assoc-n could be called instead of assoc or -assoc, but I don't address those.) In other words, it would be best if -assoc-n/-assoc-n! consistently meant "numeric key" 1) so that callers could avoid a number? check if they know their key is numeric, 2) so there is a sane analogy -lookup:-nth :: -assoc:-assoc-n :: -assoc!:-assoc-n!
Hide
David Nolen added a comment -

-assoc and -assoc! do not need fixing. Only -assoc-n and -assoc-n! do. The patch needs to be far more minimal if it hopes to get accepted.

Show
David Nolen added a comment - -assoc and -assoc! do not need fixing. Only -assoc-n and -assoc-n! do. The patch needs to be far more minimal if it hopes to get accepted.
Hide
Francis Avila added a comment -

-assoc and -assoc! need fixing too on various types because (assoc [:a] nil :b) => [:b].

I have attached a smaller patch which does not include the hunks related to redundant bounds checking and will move those to CLJS-757. This is the smallest patch which will pass all the tests related to using non-numeric indexes with indexed types.

Show
Francis Avila added a comment - -assoc and -assoc! need fixing too on various types because (assoc [:a] nil :b) => [:b]. I have attached a smaller patch which does not include the hunks related to redundant bounds checking and will move those to CLJS-757. This is the smallest patch which will pass all the tests related to using non-numeric indexes with indexed types.
Hide
David Nolen added a comment - - edited

The only code that needs to change it seems to me is nth and implementations of ILookup for IIndexed types.

Show
David Nolen added a comment - - edited The only code that needs to change it seems to me is nth and implementations of ILookup for IIndexed types.
Hide
Francis Avila added a comment -

Remember the problem isn't just (get [1] nil), it's also (get [1] :a) and also on Range, subvectors, and transientvector types (any non-numeric lookup on an indexed type). So a nil? check in nth is nowhere near enough.

For indexed types, get calls -lookup calls -nth (nth is not called), and you didn't want to do a nil? or number? check inside -nth (for good reason, as we can assume callers of -nth know they need to supply a number). This patch puts the number check on indexed types in -lookup (rationale given in my previous comment) and makes sure that other core code is calling -nth safely without checks.

The patch also removes some bounds and type checking that was redundant (the unchecked-array-for and -assoc-n stuff) which I noticed because I had to examine all those calling paths.

I suppose an alternative that would change less code is to have get use IIndexed if present and do the number check there, but then it's impossible for public code to use ILookup if there were a type where there was a difference. (e.g., Suppose an xml-element type wanted to provide access attributes with get and child elements with nth.)

Show
Francis Avila added a comment - Remember the problem isn't just (get [1] nil), it's also (get [1] :a) and also on Range, subvectors, and transientvector types (any non-numeric lookup on an indexed type). So a nil? check in nth is nowhere near enough. For indexed types, get calls -lookup calls -nth (nth is not called), and you didn't want to do a nil? or number? check inside -nth (for good reason, as we can assume callers of -nth know they need to supply a number). This patch puts the number check on indexed types in -lookup (rationale given in my previous comment) and makes sure that other core code is calling -nth safely without checks. The patch also removes some bounds and type checking that was redundant (the unchecked-array-for and -assoc-n stuff) which I noticed because I had to examine all those calling paths. I suppose an alternative that would change less code is to have get use IIndexed if present and do the number check there, but then it's impossible for public code to use ILookup if there were a type where there was a difference. (e.g., Suppose an xml-element type wanted to provide access attributes with get and child elements with nth.)
Hide
David Nolen added a comment -

This patch does too much. Why are we not just checking in nth?

Show
David Nolen added a comment - This patch does too much. Why are we not just checking in nth?
Hide
Francis Avila added a comment -


JIRA won't let me delete them. The latest patch is cljs-728-nth-number.patch

Show
Francis Avila added a comment - JIRA won't let me delete them. The latest patch is cljs-728-nth-number.patch
Hide
David Nolen added a comment -

Can we remove old patches, thanks.

Show
David Nolen added a comment - Can we remove old patches, thanks.
Hide
Francis Avila added a comment -

Updated patch.

  • I've removed all redundant bounds checking that I could easily eliminate using unchecked-array-for in place of array-for.
  • IVector and IIndexed protocol implementations don't have number? checks in them. nth does the check.
  • But the number? check is in the IAssociative and ILookup implementations for these types.
  • The logic here is that for protocols which semantically require a numeric index, we assume the user of the protocol (nth or ci-reduce for example) has already done the necessary type checking on the outside. But for protocols which don't have this semantic (e.g. ILookup), we need to do the check in the protocol implementation.
  • I ran script/benchmark and compared master and this patch on the v8 and spidermonkey engines. I didn't see any performance regressions.
Show
Francis Avila added a comment - Updated patch.
  • I've removed all redundant bounds checking that I could easily eliminate using unchecked-array-for in place of array-for.
  • IVector and IIndexed protocol implementations don't have number? checks in them. nth does the check.
  • But the number? check is in the IAssociative and ILookup implementations for these types.
  • The logic here is that for protocols which semantically require a numeric index, we assume the user of the protocol (nth or ci-reduce for example) has already done the necessary type checking on the outside. But for protocols which don't have this semantic (e.g. ILookup), we need to do the check in the protocol implementation.
  • I ran script/benchmark and compared master and this patch on the v8 and spidermonkey engines. I didn't see any performance regressions.
Hide
Francis Avila added a comment -

OK, then in that case a number? check somewhere is absolutely required, but I can move code around a bit to ensure that no type or bounds checking is done more than once for persistent and transient vectors (there is already a fair amount of redundant bounds checking). I'll work on this tonight.

(I sent a CA about 10 days ago but I see my name isn't on the CA page yet. It's probably just delayed by holiday stuff.)

Show
Francis Avila added a comment - OK, then in that case a number? check somewhere is absolutely required, but I can move code around a bit to ensure that no type or bounds checking is done more than once for persistent and transient vectors (there is already a fair amount of redundant bounds checking). I'll work on this tonight. (I sent a CA about 10 days ago but I see my name isn't on the CA page yet. It's probably just delayed by holiday stuff.)
Hide
David Nolen added a comment -

No coercions means no coercions. The only valid key to an IVector instance for lookup is a JavaScript number.

Show
David Nolen added a comment - No coercions means no coercions. The only valid key to an IVector instance for lookup is a JavaScript number.
Hide
Francis Avila added a comment -

What do you mean by "no coercions"? Do you mean edge cases like {{(get [1 2] "1") => 2}} are unacceptable even if it means we can avoid a typeof check?

Show
Francis Avila added a comment - What do you mean by "no coercions"? Do you mean edge cases like {{(get [1 2] "1") => 2}} are unacceptable even if it means we can avoid a typeof check?
Hide
David Nolen added a comment -

No coercions. Any check should be higher up the chain, like at nth and nowhere else.

Show
David Nolen added a comment - No coercions. Any check should be higher up the chain, like at nth and nowhere else.
Hide
Francis Avila added a comment -

Is number? really that slow? At least on v8 it seems to be twice as fast as nil?. Some repl checks in v8 don't seem to show a significant performance regression, but without a proper profiling harness and test cases I don't feel comfortable saying that for sure.

I think we can give up some type safety but still catch silent-coerce-to-zero cases by expressing the conditional as (and (< i (.-cnt pv)) (or (< 0 i) (zero? i))). I'll try that later today and try to compare the performance.

Your concern about checking bounds only once can be resolved by moving the guts of array-for into unchecked-array-for and editable-array-for into unchecked-editable-array-for. This is probably a good idea even if we do nothing about (get [1] nil) --at least -pop and -kv-reduce would benefit along with the not-found arities of -nth and -lookup. Perhaps a new ticket for this?

Show
Francis Avila added a comment - Is number? really that slow? At least on v8 it seems to be twice as fast as nil?. Some repl checks in v8 don't seem to show a significant performance regression, but without a proper profiling harness and test cases I don't feel comfortable saying that for sure. I think we can give up some type safety but still catch silent-coerce-to-zero cases by expressing the conditional as (and (< i (.-cnt pv)) (or (< 0 i) (zero? i))). I'll try that later today and try to compare the performance. Your concern about checking bounds only once can be resolved by moving the guts of array-for into unchecked-array-for and editable-array-for into unchecked-editable-array-for. This is probably a good idea even if we do nothing about (get [1] nil) --at least -pop and -kv-reduce would benefit along with the not-found arities of -nth and -lookup. Perhaps a new ticket for this?
Hide
David Nolen added a comment -

Putting number? checks on critical paths is unacceptable for performance reasons. Ideally we only add one number? check. And even then we'd want to know the performance implications.

Show
David Nolen added a comment - Putting number? checks on critical paths is unacceptable for performance reasons. Ideally we only add one number? check. And even then we'd want to know the performance implications.
Hide
Francis Avila added a comment -

Actually this fails for other values that may coerce to zero as well (e.g. false, numeric strings, etc). I solved this with a number? type check in appropriate places. Patch attached with comprehensive tests.

Show
Francis Avila added a comment - Actually this fails for other values that may coerce to zero as well (e.g. false, numeric strings, etc). I solved this with a number? type check in appropriate places. Patch attached with comprehensive tests.
Hide
Francis Avila added a comment - - edited

Currently failing test cases.

(assert (nil? (get [1 2] nil)))
(assert (= :fail (try (nth [1 2] nil) (catch js/Error e :fail))))
(assert (= 4 (get [1 2] nil 4)))
(assert (= 4 (nth [1 2] nil 4)))
(assert (= :fail (try (assoc [1 2] nil 4) (catch js/Error e :fail))))

(assert (nil? (get (transient [1 2]) nil)))
(assert (= :fail (try (nth (transient [1 2]) nil) (catch js/Error e :fail))))
(assert (= 4 (get (transient [1 2]) nil 4)))
(assert (= 4 (nth (transient [1 2]) nil 4)))
(assert (= :fail (try (assoc! (transient [1 2]) nil 4)
                   (catch js/Error e :fail))))

(assert (nil? (get (subvec [1 2] 1) nil)))
(assert (= :fail (try (nth (subvec [1 2] 1) nil) (catch js/Error e :fail))))
(assert (= 4 (get (subvec [1 2] 1) nil 4)))
(assert (= 4 (nth (subvec [1 2] 1) nil 4)))
(assert (= :fail (try (assoc (subvec [1 2] 1) nil 4)
                   (catch js/Error e :fail))))
Show
Francis Avila added a comment - - edited Currently failing test cases.
(assert (nil? (get [1 2] nil)))
(assert (= :fail (try (nth [1 2] nil) (catch js/Error e :fail))))
(assert (= 4 (get [1 2] nil 4)))
(assert (= 4 (nth [1 2] nil 4)))
(assert (= :fail (try (assoc [1 2] nil 4) (catch js/Error e :fail))))

(assert (nil? (get (transient [1 2]) nil)))
(assert (= :fail (try (nth (transient [1 2]) nil) (catch js/Error e :fail))))
(assert (= 4 (get (transient [1 2]) nil 4)))
(assert (= 4 (nth (transient [1 2]) nil 4)))
(assert (= :fail (try (assoc! (transient [1 2]) nil 4)
                   (catch js/Error e :fail))))

(assert (nil? (get (subvec [1 2] 1) nil)))
(assert (= :fail (try (nth (subvec [1 2] 1) nil) (catch js/Error e :fail))))
(assert (= 4 (get (subvec [1 2] 1) nil 4)))
(assert (= 4 (nth (subvec [1 2] 1) nil 4)))
(assert (= :fail (try (assoc (subvec [1 2] 1) nil 4)
                   (catch js/Error e :fail))))
Hide
Francis Avila added a comment - - edited

This is due to javascript coercing null to 0 in the various places where -assoc and -nth check (<= 0 n) or (bit-and n MAGIC-NUMBER). E.g. in PersistentVector:

IIndexed
  (-nth [coll n]
    (aget (array-for coll n) (bit-and n 0x01f)))
  (-nth [coll n not-found]
    (if (and (<= 0 n) (< n cnt))
      (-nth coll n)
      not-found))

I can track all these down and patch them by changing the guards to (and (not (nil? n)) (<= 0 n) (< n cnt)) (or maybe even (or (zero? n) (< 0 n cnt))) and adding (when-not (nil? n) ...) guards where appropriate.

However I wasn't sure if there's any intention to make the comparison operators <= < > >= null-safe. I.e. instead of adding these extra nil checks, should we instead make sure comparing against nil always returns false or throws an exception? (We would still need nil checks to make sure (get [1] nil) returns nil like Clojure.)

Show
Francis Avila added a comment - - edited This is due to javascript coercing null to 0 in the various places where -assoc and -nth check (<= 0 n) or (bit-and n MAGIC-NUMBER). E.g. in PersistentVector:
IIndexed
  (-nth [coll n]
    (aget (array-for coll n) (bit-and n 0x01f)))
  (-nth [coll n not-found]
    (if (and (<= 0 n) (< n cnt))
      (-nth coll n)
      not-found))
I can track all these down and patch them by changing the guards to (and (not (nil? n)) (<= 0 n) (< n cnt)) (or maybe even (or (zero? n) (< 0 n cnt))) and adding (when-not (nil? n) ...) guards where appropriate. However I wasn't sure if there's any intention to make the comparison operators <= < > >= null-safe. I.e. instead of adding these extra nil checks, should we instead make sure comparing against nil always returns false or throws an exception? (We would still need nil checks to make sure (get [1] nil) returns nil like Clojure.)

People

Vote (0)
Watch (2)

Dates

  • Created:
    Updated:
    Resolved: