<< Back to previous view

[CLJ-1386] Add transient? predicate Created: 17/Mar/14  Updated: 23/Mar/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Devin Walters Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: collections, transient
Environment:

N/A


Attachments: Text File 0001-Add-transient-predicate.patch     Text File 0002-Add-transient-predicate.patch    
Patch: Code and Test
Approval: Triaged

 Description   

I've encountered situations where I wanted to check whether something was transient in order to know whether I should call assoc! or assoc, conj! or conj, etc.

This patch adds `transient?` as a predicate fn.



 Comments   
Comment by Alex Miller [ 17/Mar/14 10:21 AM ]

Patch needs a docstring and a test.

Comment by Devin Walters [ 17/Mar/14 4:42 PM ]

Alex: I figured that would be the case! Sorry about that. I've updated the patch. It now includes a docstring and has tests of `transient?` for #{}, [], and {}.

Thanks!

Comment by Alex Miller [ 17/Mar/14 9:48 PM ]

Thanks - please don't use the labels "patch" or "test" - those are covered by the Patch field.

Comment by Devin Walters [ 18/Mar/14 9:17 AM ]

Ah, sorry for the mixup Alex. I assumed you removed "patch" as a label the first time around to flag this ticket as still needing a vetted patch. My mistake.

Comment by Andy Fingerhut [ 21/Mar/14 1:42 PM ]

Patch 0001-Add-transient-predicate.patch dated Mar 17, 2014 applies cleanly to latest Clojure master, but fails a test because the new function transient? has no :added metadata. See most other Clojure functions in clojure.core for examples.

It also generates a new warning while running tests:

WARNING: transient? already refers to: #'clojure.core/transient? in namespace: clojure.test-clojure.data-structures, being replaced by: #'clojure.test-clojure.data-structures/transient?

There is an older (but equivalent) definition of transient? in test file data_structures.clj that should be removed when adding it to clojure.core

Comment by Devin Walters [ 22/Mar/14 11:29 PM ]

@Andy, the reason I did not add :added metadata is because I do not know if/when this patch will be accepted, and as a result, I don't really know if it will sneak into 1.6.X or 1.7. For now, I've put it in as 1.7. If it's in the running to be added sooner than that, let me know and I'll adjust it.

RE: The warning. Good catch. I've submitted a new patch which removes the private version of transient? from data_structures.clj. All tests pass.

Edit to Add: The latest patch as of this comment is now 0002-Add-transient-predicate.patch.





[CLJ-1385] Docstrings for `conj!` and `assoc!` should suggest using the return value; effect not always in-place Created: 16/Mar/14  Updated: 06/Apr/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Pyry Jahkola Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: collections, docstring

Attachments: Text File CLJ-1385-reword-docstrings-on-transient-update-funct.patch    

 Description   

The docstrings of both `assoc!` and `conj!` say "Returns coll.", suggesting the transient edit happens always in-place, `coll` being the first argument.

However, the fact that the following example omits the key `8` in its result proves that in-place edits aren't always the case:

(let [a (transient {})]
      (dotimes [x 9]
        (assoc! a x :ok))
      (persistent! a))
    ;;=> {0 :ok, 1 :ok, 2 :ok, 3 :ok, 4 :ok, 5 :ok, 6 :ok, 7 :ok}

Instead, programmers should be guided towards using constructs like `reduce` with transients:

(persistent! (reduce #(assoc! %1 %2 :ok)
                 (transient {})
                 (range 9)))
    ;;=> {0 :ok, 1 :ok, 2 :ok, 3 :ok, 4 :ok, 5 :ok, 6 :ok, 7 :ok, 8 :ok}

The easiest way to achieve this is by changing the docstrings of (at least) `conj!` and `assoc!` to not read "Returns coll." but instead tell that the change is destructive.



 Comments   
Comment by Alex Miller [ 16/Mar/14 8:49 AM ]

When modifying transient collections, it is required to use the collection returned from functions like assoc!. The ! here indicates its destructive nature. The transients page (http://clojure.org/transients) describes the calling pattern pretty explicitly: "You must capture and use the return value in the next call."

I do not agree that we should be guiding programmers away from using functions like assoc! – transients are used as a performance optimization and using assoc! or conj! in a loop is often the fastest version of that. However I do think it would be helpful to make the docstring more explicit.

Comment by Gary Fredericks [ 05/Apr/14 10:23 AM ]

Alex I think you must have misread the ticket – the OP is suggesting guiding toward using the return value of assoc!, not avoiding assoc! altogether.

And the docstring is not simply inexplicit, it's actually incorrect specifically in the case that the OP pointed out. conj! and assoc do not return coll at the point where array-maps transition to hash-maps, and the fact that they do otherwise is supposed to be an implementation detail as far as I understand it.

Comment by Alex Miller [ 05/Apr/14 11:55 AM ]

@Gary - you're right, I did misread that.

assoc and conj both explicitly say "return a new collection" whereas assoc! and conj! say "Returns coll." I read that as "returns the modified collection" without regard to whether it's the identical instance, but I can read it your way too.

Would saying "Returns updated collection." transmit the right idea? Using "collection" instead of "coll" removes the concrete tie to the variable and "updated" hints more strongly that you should use the return value.

Comment by Pyry Jahkola [ 05/Apr/14 12:47 PM ]

@Alex, that update makes it sound right to me, FWIW.

Comment by Gary Fredericks [ 05/Apr/14 2:37 PM ]

Yeah, I think that's better. Thanks Alex. I'd be happy to submit a patch for that but I'm assuming patches are too heavy for this kind of change?

Comment by Andy Fingerhut [ 06/Apr/14 3:35 PM ]

Patches are exactly what has been done in the past for this kind of change, if it is in a doc string and not on the clojure.org web page.

Comment by Alex Miller [ 06/Apr/14 4:13 PM ]

Yup, patch desired.

Comment by Gary Fredericks [ 06/Apr/14 5:32 PM ]

Glad I asked.

Patch is attached that also updates the docstring for pop! which had the same issue, though arguably it's less important since afaik pop! does always return the identical collection (but I don't think this is part of the contract).





[CLJ-1375] Remove Util.pcequiv() and stop pretending Java colls are equiv to Clojure colls Created: 11/Mar/14  Updated: 11/Mar/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: collections

Attachments: Text File clj-1375-v1.patch    
Patch: Code

 Description   

The Util.pcequiv() method

static public boolean pcequiv(Object k1, Object k2){
	if(k1 instanceof IPersistentCollection)
		return ((IPersistentCollection)k1).equiv(k2);
	return ((IPersistentCollection)k2).equiv(k1);
}

tries to get equiv semantics (cross-class number equality) for cases of mixed Clojure/Java collection comparison. However, this is not a sustainable direction and we would like to stop doing this.

Attached patch removes this and changes calling code to only call equiv when both collections are IPersistentCollection.






[CLJ-1373] LazySeq should utilize cached hash from its underlying seq. Created: 09/Mar/14  Updated: 20/Mar/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Jozef Wagner Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: collections, performance
Environment:

1.6.0 master SNAPSHOT


Attachments: File clj-1373.diff    
Patch: Code
Approval: Triaged

 Description   

Even if underlying seq contains a cached hash, LazySeq computes it every time.

user=> (def a (range 100000))
#'user/a
user=> (time (hash a))
"Elapsed time: 46.904273 msecs"
375952610
user=> (time (hash a)) ;; RECOMPUTE
"Elapsed time: 10.879098 msecs"
375952610
user=> (def b (seq a))
#'user/b
user=> (time (hash b))
"Elapsed time: 10.572522 msecs"
375952610
user=> (time (hash b)) ;; CACHED HASH
"Elapsed time: 0.024927 msecs"
375952610
user=> (def c (lazy-seq b))
#'user/c
user=> (time (hash c))
"Elapsed time: 12.207651 msecs"
375952610
user=> (time (hash c))
"Elapsed time: 10.995798 msecs"
375952610


 Comments   
Comment by Jozef Wagner [ 09/Mar/14 9:20 AM ]

Added patch which checks if underlying seq implements IHashEq and if yes, uses that hash instead of recomputing.





[CLJ-1372] Inconsistent hash with java collections Created: 09/Mar/14  Updated: 20/Mar/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Jozef Wagner Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: collections
Environment:

1.6.0 master


Attachments: File clj-1372-2.diff     File clj-1372.diff    
Patch: Code and Test

 Description   

c.c/hash always use hashCode for java collections, which is incompatible when comparing with Clojure collections, which use Murmur3.

user=> (== (hash (java.util.ArrayList. [1 2 3])) (hash [1 2 3]))
false
user=> (= (java.util.ArrayList. [1 2 3]) [1 2 3])
true

One way to fix it is to add a special case in Util/hasheq for java.util.Collections, as it is now for Strings.



 Comments   
Comment by Jozef Wagner [ 09/Mar/14 8:02 AM ]

Same problem for maps, so hasheq should have a special case for java.util.Map too.

Comment by Jozef Wagner [ 09/Mar/14 9:21 AM ]

Added patch with fix for j.u. Map, Set and List.

Comment by Andy Fingerhut [ 09/Mar/14 6:02 PM ]

Add patch clj-1372-2.diff that is identical to Jozef Wagner's clj-1372.diff, except it also adds some new tests that fail without his changes, and pass with them.

Comment by Alex Miller [ 10/Mar/14 9:31 AM ]

I think the contract on equiv/hasheq is more narrowly scoped than this and only applies if both collections are IPersistentCollection. In other words, I don't think this is wanted or required.

Note that the Java .equals/.hashCode contract is maintained here - these collections will compare as .equals() and do have the same .hashCode().

Comment by Jozef Wagner [ 10/Mar/14 9:38 AM ]

Without the patch the following statement is not valid: "If two objects are equal with c.c/=, than their hash returned by c.c/hash is the same number". We can say that this is valid only iff both objects are 'clojure' objects, but this goes against clojures interop principles (interop is easy, fast, no surprises).

Comment by Jozef Wagner [ 10/Mar/14 9:54 AM ]

Manifestation of this bug

user=> (assoc (hash-map [1 2 3] :foo) (java.util.ArrayList. [1 2 3]) :bar)
{[1 2 3] :bar, [1 2 3] :foo}
user=> (get (hash-map [1 2 3] :foo) (java.util.ArrayList. [1 2 3]))
nil
Comment by Alex Miller [ 10/Mar/14 10:58 AM ]

I agree that would be a nice thing to say without qualification.

There is a real cost to adding more branches in hasheq - adding those collection checks affects every hasheq. Running a full Clojure build, I see the following set of classes with >100 occurences where this happens (note that exactly 0 of these are the Java collections - this case doesn't exist in the Clojure build itself):

clojure.lang.Var 107001502
java.lang.Class 2651389
java.lang.Character 2076322 
java.util.UUID 435235 
java.util.Date 430956
clojure.lang.Compiler$LocalBinding 116830
java.lang.Boolean 112361
java.util.regex.Pattern 325

We'd be adding 4 more instanceof checks in the path of every one of those hasheqs. This would also likely blow any JVM inlining.

Rich says "all bets should be off for hasheq/equiv of non-values" where Java collections obviously fall into the class of "non-values".

Comment by Andy Fingerhut [ 10/Mar/14 11:04 AM ]

Would a doc patch be considered? Say one that modified the doc of clojure.core/hash to include a phrase indicating that it is only promised to be consistent with clojure.core/= for immutable values? It could even perhaps mention that Floats are out, too: see CLJ-1036

Comment by Alex Miller [ 10/Mar/14 12:00 PM ]

I think it would be preferred to do any detailed docs about hash at http://clojure.org/data_structures rather than in the docstring. Although the docstring on hash probably could use an update and a pointer to the web site after the latest changes.

Comment by Jozef Wagner [ 10/Mar/14 12:14 PM ]

Neverthless it is a breaking change from 1.5, and it should be mentioned in changelog. What still bugs me is that c.c/= is supported in such cases but the c.c/hash is not. If supporting c.c/hash is expensive, isn't it better to drop support for c.c/= in such cases? It will eliminate surprises such as:

user=> (apply distinct? (hash-set [1 2 3] (java.util.Collections/unmodifiableList [1 2 3])))
false
Comment by Alex Miller [ 10/Mar/14 2:05 PM ]

I'm not sure it's a "breaking" change if something not considered to be guaranteed changes. But I take your point.

I don't think it's feasible to drop = support for Clojure and Java collections - that seems important and useful. And if it were free to do so, I would like to be able to say without qualification that if equiv=true, then hasheq is the same.

It's unclear to me that the examples listed on this ticket are actually real problems people are likely to encounter. The main users of hasheq are hash map and hash set. So to manifest, you would need to be putting a mixture of Clojure and Java collections into one of those, in particular a mixture of collections that compare as equal.

Still thinking about it.

Comment by Jozef Wagner [ 10/Mar/14 3:27 PM ]

Sorry for spamming but there may be another option, to not fallback into hashCode in hasheq, but to instead throw in cases where hasheq is requested for non-values. This will lead to a cleaner separation of hash types. Of course it will prevent putting non-values into hash-set.

Comment by Alex Miller [ 10/Mar/14 3:34 PM ]

There is no simple check for "valueness" though?

Comment by Andy Fingerhut [ 10/Mar/14 3:37 PM ]

An idea, for what it might be worth: Add one test for instance of java.util.Collection in Util.hasheq method instead of 3 separate tests for Set, List, and Map. It doesn't cover Map.Entry.

Comment by Alex Miller [ 10/Mar/14 3:38 PM ]

Map doesn't extend Collection either.

Comment by Stuart Halloway [ 11/Mar/14 10:44 AM ]

I think this needs more consideration and should not hold up 1.6.

Comment by Andy Fingerhut [ 20/Mar/14 2:01 PM ]

Both patches clj-1372.diff and clj-1372-2.diff fail to apply cleanly as of latest Clojure master on Mar 20 2014. They did apply cleanly before the Mar 19 2014 commit, I believe, and the only issue appears to be a changed line of diff context. Given the discussion about whether such a change is desired, it sounds like more thought is needed before deciding what change should be made, if any.





[CLJ-1365] New collection hash functions are too slow Created: 20/Feb/14  Updated: 11/Mar/14  Resolved: 11/Mar/14

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: Release 1.6

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: collections

Attachments: Text File clj-1365-v1.patch     Text File clj-1365-v2.patch     Zip Archive testclj1365.zip    
Patch: Code
Approval: Ok

 Description   

As reported ( https://groups.google.com/d/msg/clojure-dev/t6LAmVe-RLM/ekLTKxYfU5UJ ) by Mark Engelberg, the new collection hashing functions are slower than invoking the Murmur3 functions directly. See the attached zip for performance tests.

Approach: Made mix-collection-hash, hash-ordered-coll, and hash-unordered-coll use primitive type hints to avoid the bulk of the time.

Patch: clj-1365-v2.patch

Screened by:



 Comments   
Comment by Alex Miller [ 20/Feb/14 11:26 AM ]

Added to 1.6 release.

Comment by Alex Miller [ 20/Feb/14 12:40 PM ]

Made hash functions inline for performance.

Comment by Rich Hickey [ 20/Feb/14 7:55 PM ]

Reported where?

This looks like bad benchmarking.

(dotimes [_ 10] (let [x 1 y 1] (time (dotimes [n 1000000000] (clojure.lang.Murmur3/mixCollHash x y)))))

and

(dotimes [_ 10] (let [x 1 y 1] (time (dotimes [n 1000000000] #_(clojure.lang.Murmur3/mixCollHash x y)))))

take the same time on my machine.

I'd need to see tests where the return was definitely used, it seems this is just more easily ignored by hotspot when not used.

We probably only need to hint count and the return for decent results.

Comment by Alex Miller [ 20/Feb/14 8:55 PM ]

It was reported by Mark Engelberg in his Instaparse rework - he observed these calls taking noticeably longer and overall times 10-20% down. I will ask him to chime in here.

Comment by Rich Hickey [ 04/Mar/14 8:44 AM ]

Could someone please test hinting hint count and the return? I'd hate for the answer to anyone's perf issues be inlining.

Comment by Alex Miller [ 04/Mar/14 9:06 AM ]

I will provide some more data for consideration of the options.

Comment by Alex Miller [ 04/Mar/14 11:07 AM ]

Test project for different variants

Comment by Alex Miller [ 04/Mar/14 11:11 AM ]

Attached a test project with different variants for testing and better benchmarking. To run:

unzip testclj1365.zip
cd clj1365
lein uberjar
java -server -cp target/clj1365-0.1.0-SNAPSHOT-standalone.jar clj1365.core

Results:

mix-collection-hash original
"Elapsed time: 57.777 msecs"
"Elapsed time: 18.034 msecs"
"Elapsed time: 20.591 msecs"
"Elapsed time: 25.179 msecs"
"Elapsed time: 21.781 msecs"
mix-collection-hash hints
"Elapsed time: 14.983 msecs"
"Elapsed time: 8.871 msecs"
"Elapsed time: 8.793 msecs"
"Elapsed time: 8.92 msecs"
"Elapsed time: 8.873 msecs"
mix-collection-hash inline
"Elapsed time: 10.04 msecs"
"Elapsed time: 7.117 msecs"
"Elapsed time: 7.306 msecs"
"Elapsed time: 7.324 msecs"
"Elapsed time: 7.175 msecs"
Murmur3/mixCollHash
"Elapsed time: 9.522 msecs"
"Elapsed time: 7.288 msecs"
"Elapsed time: 7.397 msecs"
"Elapsed time: 7.364 msecs"
"Elapsed time: 7.345 msecs"

From these results, I infer that the unhinted version is slower (21 ms) than a static call (7 ms). Inlining gives you same perf as static. Hinting inputs and return gives almost the same perf (9 ms).





[CLJ-1364] Primitive VecSeq does not implement equals or hashing methods Created: 19/Feb/14  Updated: 19/Feb/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: collections


 Description   

VecSeq (as produced by (seq (vector-of :int 1 2 3))) does not implements equals, hashCode, or hasheq and does not play with any other Clojure collections or sequences appropriately in this regard.

user=> (def rs (range 3))
user=> (def vs (seq (vector-of :int 0 1 2)))
user=> rs
(0 1 2)
user=> vs
(0 1 2)
user=> (.equals rs vs)
true
user=> (.equals vs rs)    ;; expect: true
false
user=> (.equiv rs vs)
true
user=> (.equiv vs rs)
true
user=> (.hashCode rs)
29824
user=> (.hashCode vs)     ;; expect to match (.hashCode rs)
2081327893
user=> (System/identityHashCode vs)  ;; show that we're just getting Object hashCode
2081327893
user=> (.hasheq rs)
29824
user=> (.hasheq vs)       ;; expect same as (.hasheq rs) but not implemented at all
IllegalArgumentException No matching field found: hasheq for class clojure.core.VecSeq  clojure.lang.Reflector.getInstanceField (Reflector.java:271)





[CLJ-1362] Reduce broken on some primitive vectors Created: 18/Feb/14  Updated: 18/Apr/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.3, Release 1.4, Release 1.5, Release 1.6
Fix Version/s: Release 1.7

Type: Defect Priority: Major
Reporter: Nathan Davis Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: collections

Attachments: Text File clj-1362-v1.patch    
Patch: Code and Test
Approval: Vetted

 Description   

In some cases, reduce over a sequence from a primitive vector created with vector-of will return incorrect answers:

user=> (into [] (drop 32 (into [] (range 33))))
[32]
user=> (into [] (drop 32 (into (vector-of :int) (range 33))))
[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32]

Second call should return [32] just like the first one.

Cause: VecSeq (seq on primitive Vec obtained with vector-of) maintains two flags: i is the total number of elements prior to the current node in this seq. offset is the offset in the current anode. When using internal-reduce on a VecSeq, the starting index for the reduce was using offset and ignoring i.

Solution: Use (+ i offset) as the starting index.

Patch: clj-1362-v1.patch

Screened by:



 Comments   
Comment by Alex Miller [ 18/Feb/14 10:18 PM ]

We did some debugging on this at the St. Louis Clojure Meetup tonight and suspect the problem is happening when drop walks through the chunked seq over the vector. Specifically, in the VecSeq's implementation of IChunkedSeq.chunkedNext() at https://github.com/clojure/clojure/blob/master/src/clj/clojure/gvec.clj#L116 particularly the offset 0 at the end.

Comment by Alex Miller [ 19/Feb/14 2:41 PM ]

Upon further review, the VecSeq seems to be created properly during chunking. The real issue is in internal-reduce where the starting index is improperly computed.





[CLJ-1329] Unused local variable in PersistentVector.cons() Created: 22/Jan/14  Updated: 02/Mar/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5, Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Trivial
Reporter: Smit Shah Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: collections

Attachments: Text File fix.patch    
Patch: Code
Approval: Triaged

 Description   

in src/jvm/clojure/lang/PersistentVector.java:168, there is an integer i being defined which is not being used anywhere in the method.

https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentVector.java#L168



 Comments   
Comment by Stuart Halloway [ 31/Jan/14 6:14 PM ]

Smit, can you please submit a CA? http://clojure.org/contributing

Comment by Smit Shah [ 02/Feb/14 1:16 PM ]

Stuart, I will send the CA via post ASAP.
It might take a couple of days to reach Rich though.

Comment by Smit Shah [ 01/Mar/14 11:51 AM ]

Stuart, I have successfully submitted the CA (http://clojure.org/contributing).
I guess now merging this patch shouldn't be a problem

Comment by Alex Miller [ 02/Mar/14 11:37 AM ]

Thanks Smit!





[CLJ-1082] Subvecs of primitive vectors cannot be reduced Created: 05/Oct/12  Updated: 11/Jan/14  Resolved: 11/Jan/14

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.6

Type: Defect Priority: Major
Reporter: Ghadi Shayban Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: collections
Environment:

1.7.0-08, OS X 10.8


Attachments: Text File clj-1082.patch     File clj-1082-patch-v2.diff     Text File clj-1082-patch-v2.txt     File clj-1082-patch-v3.diff    
Patch: Code and Test
Approval: Ok

 Description   

Reduce doesn't work on subvecs of primitive vectors.

(let [prim-vec (into (vector-of :long) (range 10000))]
  (reduce + (subvec prim-vec 1 500)))

->> ClassCastException clojure.core.Vec cannot be cast to clojure.lang.PersistentVector  clojure.lang.APersistentVector$SubVector.iterator (APersistentVector.java:523)

If reduce on a subvec doesn't work then neither will nice ops like fold.

Cause: RT.subvec() creates an APersistentVector$SubVector. SubVector.iterator() assumes the source vector is a PersistentVector, however a primitive vector is a Vec (which is not a PersistentVector). This causes a class cast exception as observed on any attempt to iterate over the subvector.

Approach:
1. Provide a generic ranged iterator for APersistentVector, that can be used by SubVector
2. Make the iterator() method for APersistentVector$SubVector use this new iterator where possible (i.e. whenever the source vector is an APersistentVector). If not, the generic super.iterator() method is used (which is slower, but safe for any source vector that implements IPersistentVector)

Patch: clj-1082-patch-v3.diff

Screened by: Alex Miller



 Comments   
Comment by Timothy Baldridge [ 27/Nov/12 11:52 AM ]

Confirmed to be broken on master. Vetting. Not sure what it's going to take to fix this, however. If this is of intrest for you, you might want to help push it along by providing a patch.

Comment by Andy Fingerhut [ 27/Nov/12 12:09 PM ]

There is no code or ticket for this yet, but I wanted to mention that I've begun working on an implementation of RRB-Tree (Relaxed Radix Balanced Tree) vectors for Clojure (see discussion at https://groups.google.com/forum/?fromgroups=#!topic/clojure-dev/xnbtzTVEK9A). Assuming it is no big deal to get reducers to work on such vectors, subvec could be implemented in O(log n) time in such a way that the result was the same concrete type of vector as you started with, and thus reducers would work on them, too.

It would mean O(log n) time for subvec instead of today's O(1), but this would likely fix other limitations that exist today with subvec's, e.g. CLJ-761.

Comment by Mike Anderson [ 20/Jan/13 5:14 AM ]

I have a fix for this and a regression test in the tree below:

https://github.com/mikera/clojure/tree/winfix

Not sure how best to make this into a patch though - I also have the pprint fix in here (CLJ-1076)

Comment by Mike Anderson [ 20/Jan/13 6:41 PM ]

Attached a patch that I created with:

git format-patch winfix --stdout HEAD~3..HEAD > clj-1082.patch

Does this do the trick? I had to use the HEAD~3..HEAD to just get the most recent 3 commits and exclude the pprint changes that I needed in order to build on Windows.

Comment by Mike Anderson [ 20/Jan/13 6:42 PM ]

Patch for CLJ-1082, containing 3 commits

Comment by Andy Fingerhut [ 21/Jan/13 1:11 AM ]

Mike, your patch clj-1082.patch applies cleanly to latest master for me, so looks like you found one way to do it.

Another would be as follows, and closer to the directions on the JIRA workflow page: http://dev.clojure.org/display/design/JIRA+workflow (but not identical). Note that these commands would work on Mac OS X or Linux. I'm not sure what the correct corresponding command would be on Windows for the "git am" step below, unless that just happens to work because Windows and/or git implement the input redirection with "<" somehow.

  1. Check out a fresh repo
    $ git clone git://github.com/clojure/clojure.git
    $ cd clojure
  1. Apply the patch for CLJ-1076 to the master branch. This step isn't on the JIRA workflow page.
    $ git am --keep-cr -s < clj-1076-fix-tests-on-windows-patch-v1.txt
  1. Create a branch for yourself
    $ git checkout -b fix-clj-1082
  1. After editing to make your changes, commit them to the current fix-clj-1082 branch
    $ git commit -a -m "fixed annoying bug, refs #42"

From there on down it is the same as the instructions on the JIRA workflow page. The "git format-patch master --stdout > file.patch" will create a patch for the changes you have made in the current branch fix-clj-1082 starting from the master branch, which has the CLJ-1076 fix because of the 'git am' command above.

Comment by Alex Miller [ 22/Aug/13 10:36 PM ]

Moving back to Triaged as Rich has not vetted.

Comment by Andy Fingerhut [ 05/Sep/13 6:12 PM ]

clj-1082-patch-v2.txt is identical to Mike Anderson's clj-1082.patch, preserving his authorship, except it eliminates a carriage return added at the end of one line, which also causes git to issue a warning when applying the patch.

Comment by Rich Hickey [ 22/Nov/13 7:24 AM ]

This diff remains inscrutable. It seems to have two patches in it, one a first idea and another a modification to that? Patches should be direct enhancements to the trunk code. Also, what is endIndex for and why is it mutable? Why not just use end? And, the code doesn't agree with the plan, which says "Check the vector type and if it is an APersistentVector, use the existing logic. Otherwise, fallback to a new rangedIterator() implementation in APersistentVector that iterates using nth." while the code seems to do the opposite:

+ if (v instanceof APersistentVector) { + return ((APersistentVector)v).rangedIterator(start,end); + }
+ return super.iterator();

Comment by Mike Anderson [ 22/Nov/13 11:00 AM ]

Hi Rich,
1. As per comments, Andy made a small change to the original patch. v2 supersedes the original patch.
2. endIndex is part of the iterator implementation: I believe this must be mutable to provide the required Java Iterator behaviour
3. I think the approach is misworded (it was added long after the patch), I shall try to improve this.

Comment by Mike Anderson [ 22/Nov/13 12:32 PM ]

I don't seem to have the ability to edit the description. Here's what I think it should say:

Cause: RT.subvec() creates an APersistentVector$SubVector. SubVector.iterator() assumes the source vector is a PersistentVector, however a primitive vector is a Vec (which is not a PersistentVector). This causes a class cast exception as observed on any attempt to iterate over the subvector.

Approach:
1. Provide a generic ranged iterator for APersistentVector, that can be used by SubVector
2. Make the iterator() method for APersistentVector$SubVector use this new iterator where possible (i.e. whenever the source vector is an APersistentVector). If not, the generic super.iterator() method is used (which is slower, but safe for any source vector that implements IPersistentVector)

Comment by Alex Miller [ 22/Nov/13 12:58 PM ]

Updated description per Mike's comment.

Comment by Alex Miller [ 24/Nov/13 2:15 PM ]

Added new v3 patch that a) combines the previous commits into a single patch and b) removes endIndex and uses end instead. As far as I know this + the description change address all of Rich's questions. Marking re-screened.





[CLJ-995] sorted-set doesn't support IEditableCollection Created: 13/May/12  Updated: 05/Feb/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Moritz Ulrich Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: collections


 Description   

I think sorted-set (PersistentTreeSet) should implement the transient interface. It's a special-purpose set and should be usable just like every normal set.



 Comments   
Comment by Michel Alexandre Salim [ 04/Jun/12 2:32 AM ]

Note that this would require PersistentTreeMap to implement IEditableCollection as well.





Generated at Fri Apr 18 16:15:54 CDT 2014 using JIRA 4.4#649-r158309.