<< Back to previous view

[CLJS-728] (get [1 2 3] nil) -> 1 Created: 16/Dec/13  Updated: 24/Feb/14  Resolved: 24/Feb/14

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: David Nolen Assignee: David Nolen
Resolution: Completed Votes: 0
Labels: None

Attachments: File cljs-728-minimal.diff     Text File cljs-728-more-minimal.patch     Text File cljs-728-nth-number.patch     Text File cljs-728-only-nth-lookup.patch     Text File cljs-728-only-nth-lookup.patch    

Comment by Francis Avila [ 26/Dec/13 4:49 PM ]

This is due to javascript coercing null to 0 in the various places where -assoc and -nth check (<= 0 n) or (bit-and n MAGIC-NUMBER). E.g. in PersistentVector:

  (-nth [coll n]
    (aget (array-for coll n) (bit-and n 0x01f)))
  (-nth [coll n not-found]
    (if (and (<= 0 n) (< n cnt))
      (-nth coll n)

I can track all these down and patch them by changing the guards to (and (not (nil? n)) (<= 0 n) (< n cnt)) (or maybe even (or (zero? n) (< 0 n cnt))) and adding (when-not (nil? n) ...) guards where appropriate.

However I wasn't sure if there's any intention to make the comparison operators <= < > >= null-safe. I.e. instead of adding these extra nil checks, should we instead make sure comparing against nil always returns false or throws an exception? (We would still need nil checks to make sure (get [1] nil) returns nil like Clojure.)

Comment by Francis Avila [ 26/Dec/13 11:26 PM ]

Currently failing test cases.

(assert (nil? (get [1 2] nil)))
(assert (= :fail (try (nth [1 2] nil) (catch js/Error e :fail))))
(assert (= 4 (get [1 2] nil 4)))
(assert (= 4 (nth [1 2] nil 4)))
(assert (= :fail (try (assoc [1 2] nil 4) (catch js/Error e :fail))))

(assert (nil? (get (transient [1 2]) nil)))
(assert (= :fail (try (nth (transient [1 2]) nil) (catch js/Error e :fail))))
(assert (= 4 (get (transient [1 2]) nil 4)))
(assert (= 4 (nth (transient [1 2]) nil 4)))
(assert (= :fail (try (assoc! (transient [1 2]) nil 4)
                   (catch js/Error e :fail))))

(assert (nil? (get (subvec [1 2] 1) nil)))
(assert (= :fail (try (nth (subvec [1 2] 1) nil) (catch js/Error e :fail))))
(assert (= 4 (get (subvec [1 2] 1) nil 4)))
(assert (= 4 (nth (subvec [1 2] 1) nil 4)))
(assert (= :fail (try (assoc (subvec [1 2] 1) nil 4)
                   (catch js/Error e :fail))))
Comment by Francis Avila [ 27/Dec/13 2:11 AM ]

Actually this fails for other values that may coerce to zero as well (e.g. false, numeric strings, etc). I solved this with a number? type check in appropriate places. Patch attached with comprehensive tests.

Comment by David Nolen [ 30/Dec/13 8:54 AM ]

Putting number? checks on critical paths is unacceptable for performance reasons. Ideally we only add one number? check. And even then we'd want to know the performance implications.

Comment by Francis Avila [ 30/Dec/13 1:04 PM ]

Is number? really that slow? At least on v8 it seems to be twice as fast as nil?. Some repl checks in v8 don't seem to show a significant performance regression, but without a proper profiling harness and test cases I don't feel comfortable saying that for sure.

I think we can give up some type safety but still catch silent-coerce-to-zero cases by expressing the conditional as (and (< i (.-cnt pv)) (or (< 0 i) (zero? i))). I'll try that later today and try to compare the performance.

Your concern about checking bounds only once can be resolved by moving the guts of array-for into unchecked-array-for and editable-array-for into unchecked-editable-array-for. This is probably a good idea even if we do nothing about (get [1] nil) --at least -pop and -kv-reduce would benefit along with the not-found arities of -nth and -lookup. Perhaps a new ticket for this?

Comment by David Nolen [ 30/Dec/13 1:09 PM ]

No coercions. Any check should be higher up the chain, like at nth and nowhere else.

Comment by Francis Avila [ 30/Dec/13 1:31 PM ]

What do you mean by "no coercions"? Do you mean edge cases like {{(get [1 2] "1") => 2}} are unacceptable even if it means we can avoid a typeof check?

Comment by David Nolen [ 30/Dec/13 1:40 PM ]

No coercions means no coercions. The only valid key to an IVector instance for lookup is a JavaScript number.

Comment by Francis Avila [ 30/Dec/13 1:53 PM ]

OK, then in that case a number? check somewhere is absolutely required, but I can move code around a bit to ensure that no type or bounds checking is done more than once for persistent and transient vectors (there is already a fair amount of redundant bounds checking). I'll work on this tonight.

(I sent a CA about 10 days ago but I see my name isn't on the CA page yet. It's probably just delayed by holiday stuff.)

Comment by Francis Avila [ 31/Dec/13 12:02 AM ]

Updated patch.

  • I've removed all redundant bounds checking that I could easily eliminate using unchecked-array-for in place of array-for.
  • IVector and IIndexed protocol implementations don't have number? checks in them. nth does the check.
  • But the number? check is in the IAssociative and ILookup implementations for these types.
  • The logic here is that for protocols which semantically require a numeric index, we assume the user of the protocol (nth or ci-reduce for example) has already done the necessary type checking on the outside. But for protocols which don't have this semantic (e.g. ILookup), we need to do the check in the protocol implementation.
  • I ran script/benchmark and compared master and this patch on the v8 and spidermonkey engines. I didn't see any performance regressions.
Comment by David Nolen [ 07/Jan/14 7:09 AM ]

Can we remove old patches, thanks.

Comment by Francis Avila [ 07/Jan/14 8:24 AM ]

JIRA won't let me delete them. The latest patch is cljs-728-nth-number.patch

Comment by David Nolen [ 16/Jan/14 5:31 PM ]

This patch does too much. Why are we not just checking in nth?

Comment by Francis Avila [ 16/Jan/14 7:04 PM ]

Remember the problem isn't just (get [1] nil), it's also (get [1] :a) and also on Range, subvectors, and transientvector types (any non-numeric lookup on an indexed type). So a nil? check in nth is nowhere near enough.

For indexed types, get calls -lookup calls -nth (nth is not called), and you didn't want to do a nil? or number? check inside -nth (for good reason, as we can assume callers of -nth know they need to supply a number). This patch puts the number check on indexed types in -lookup (rationale given in my previous comment) and makes sure that other core code is calling -nth safely without checks.

The patch also removes some bounds and type checking that was redundant (the unchecked-array-for and -assoc-n stuff) which I noticed because I had to examine all those calling paths.

I suppose an alternative that would change less code is to have get use IIndexed if present and do the number check there, but then it's impossible for public code to use ILookup if there were a type where there was a difference. (e.g., Suppose an xml-element type wanted to provide access attributes with get and child elements with nth.)

Comment by David Nolen [ 16/Jan/14 10:39 PM ]

The only code that needs to change it seems to me is nth and implementations of ILookup for IIndexed types.

Comment by Francis Avila [ 29/Jan/14 6:49 PM ]

-assoc and -assoc! need fixing too on various types because (assoc [:a] nil :b) => [:b].

I have attached a smaller patch which does not include the hunks related to redundant bounds checking and will move those to CLJS-757. This is the smallest patch which will pass all the tests related to using non-numeric indexes with indexed types.

Comment by David Nolen [ 29/Jan/14 8:26 PM ]

-assoc and -assoc! do not need fixing. Only -assoc-n and -assoc-n! do. The patch needs to be far more minimal if it hopes to get accepted.

Comment by Francis Avila [ 30/Jan/14 11:46 AM ]

It doesn't make sense for indexed types to have -lookup call -nth but -assoc-n call -assoc (as PersistentVector and Subvec do, but not TransientVector). This means if we know we are using a numeric key we call -nth on read, but call -assoc on writes instead of -assoc-n. -assoc-n is what has the required "key is a number" semantic, not -assoc, and this leads to an unnecessary number? check being done in the implementation of Subvec -conj. (There are also other places, e.g. Subvec -assoc, where -assoc-n could be called instead of assoc or -assoc, but I don't address those.)

In other words, it would be best if -assoc-n/-assoc-n! consistently meant "numeric key" 1) so that callers could avoid a number? check if they know their key is numeric, 2) so there is a sane analogy -lookup:-nth :: -assoc:-assoc-n :: -assoc!:-assoc-n!

Comment by Francis Avila [ 30/Jan/14 11:49 AM ]

New patch cljs-728-more-minimal.patch

Hunk-by-hunk justification of changes:

  1. core_test.cljs
    1. test (get x n) (get x n nf) (nth x n) (nth x n nf) and (assoc x n v) with non-numeric n against PersistentVector, Subvec, TransientVector, and Range types.
  2. core.cljs
    1. number? check in nth
    2. number? check in PersistentVector -lookup
    3. number? check in PersistentVector -assoc
    4. number? check in Subvec -lookup
    5. number? check in Subvec -assoc
    6. number? check in TransientVector -assoc!
    7. number? check in TransientVector -lookup

Although it does not swap -assoc-n and -assoc on Subvec and PersistentVector, I would hardly describe this patch as
"far more minimal" than the previous one--I hope that does not mean it is unacceptable. The only way I can see to
make this patch smaller is to file a different ticket for each hunk and corresponding test, which seems crazy.

If the patch is still not acceptable, could you please be much more specific about what exactly is wrong with it so we can close this bug.

Comment by David Nolen [ 30/Jan/14 12:03 PM ]

That the ClojureScript vector types implement assoc in terms of assoc-n and not the other way around should be considered a flaw. Please refer to Clojure's PersistentVector.java, RT.java for a guide about how the logic should be implemented. I'll take a patch that aligns ClojureScript's logic closer to Clojure. Thanks.

Comment by Francis Avila [ 17/Feb/14 11:54 AM ]

Patch that aligns ClojureScript vector type's -assoc-n/-assoc with Clojure's is in CLJS-767, and patch that fixes assoc! on TransientVectors is in CLJS-768.

I will cut conflicting hunks out of this CLJS-728 patch (hunks 2.3, 2.5 and 2.6 as outlined above) so all can apply cleanly with one another. (I'll have to do it this evening though.)

Full list of related tickets and patches (stuff that was factored out of the initial patches):

  1. CLJS-757: remove redundant bounds checking in indexed types (will need a rebase once the others are in).
  2. CLJS-767: Fix assoc on PersistentVector and Subvec
  3. CLJS-768: Fix assoc! on TransientVector
Comment by David Nolen [ 17/Feb/14 12:00 PM ]

Francis Avila, many thanks will look over these.

Comment by Francis Avila [ 17/Feb/14 10:49 PM ]

Patch rebased and trimmed: cljs-728-only-nth-lookup.patch.

Comment by Francis Avila [ 24/Feb/14 12:41 AM ]

Rebased cljs-728-only-nth-lookup.patch again. (core_test.cljs changes would have "conflicted".)

Comment by David Nolen [ 24/Feb/14 6:04 AM ]

fixed https://github.com/clojure/clojurescript/commit/71f781c75bf6e9d19be8d0e529ed0275fa523942

Generated at Wed Jul 23 06:07:06 CDT 2014 using JIRA 4.4#649-r158309.