<< Back to previous view

[CLJ-1082] Subvecs of primitive vectors cannot be reduced Created: 05/Oct/12  Updated: 11/Jan/14  Resolved: 11/Jan/14

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.6

Type: Defect Priority: Major
Reporter: Ghadi Shayban Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: collections
Environment:

1.7.0-08, OS X 10.8


Attachments: Text File clj-1082.patch     File clj-1082-patch-v2.diff     Text File clj-1082-patch-v2.txt     File clj-1082-patch-v3.diff    
Patch: Code and Test
Approval: Ok

 Description   

Reduce doesn't work on subvecs of primitive vectors.

(let [prim-vec (into (vector-of :long) (range 10000))]
  (reduce + (subvec prim-vec 1 500)))

->> ClassCastException clojure.core.Vec cannot be cast to clojure.lang.PersistentVector  clojure.lang.APersistentVector$SubVector.iterator (APersistentVector.java:523)

If reduce on a subvec doesn't work then neither will nice ops like fold.

Cause: RT.subvec() creates an APersistentVector$SubVector. SubVector.iterator() assumes the source vector is a PersistentVector, however a primitive vector is a Vec (which is not a PersistentVector). This causes a class cast exception as observed on any attempt to iterate over the subvector.

Approach:
1. Provide a generic ranged iterator for APersistentVector, that can be used by SubVector
2. Make the iterator() method for APersistentVector$SubVector use this new iterator where possible (i.e. whenever the source vector is an APersistentVector). If not, the generic super.iterator() method is used (which is slower, but safe for any source vector that implements IPersistentVector)

Patch: clj-1082-patch-v3.diff

Screened by: Alex Miller



 Comments   
Comment by Timothy Baldridge [ 27/Nov/12 11:52 AM ]

Confirmed to be broken on master. Vetting. Not sure what it's going to take to fix this, however. If this is of intrest for you, you might want to help push it along by providing a patch.

Comment by Andy Fingerhut [ 27/Nov/12 12:09 PM ]

There is no code or ticket for this yet, but I wanted to mention that I've begun working on an implementation of RRB-Tree (Relaxed Radix Balanced Tree) vectors for Clojure (see discussion at https://groups.google.com/forum/?fromgroups=#!topic/clojure-dev/xnbtzTVEK9A). Assuming it is no big deal to get reducers to work on such vectors, subvec could be implemented in O(log n) time in such a way that the result was the same concrete type of vector as you started with, and thus reducers would work on them, too.

It would mean O(log n) time for subvec instead of today's O(1), but this would likely fix other limitations that exist today with subvec's, e.g. CLJ-761.

Comment by Mike Anderson [ 20/Jan/13 5:14 AM ]

I have a fix for this and a regression test in the tree below:

https://github.com/mikera/clojure/tree/winfix

Not sure how best to make this into a patch though - I also have the pprint fix in here (CLJ-1076)

Comment by Mike Anderson [ 20/Jan/13 6:41 PM ]

Attached a patch that I created with:

git format-patch winfix --stdout HEAD~3..HEAD > clj-1082.patch

Does this do the trick? I had to use the HEAD~3..HEAD to just get the most recent 3 commits and exclude the pprint changes that I needed in order to build on Windows.

Comment by Mike Anderson [ 20/Jan/13 6:42 PM ]

Patch for CLJ-1082, containing 3 commits

Comment by Andy Fingerhut [ 21/Jan/13 1:11 AM ]

Mike, your patch clj-1082.patch applies cleanly to latest master for me, so looks like you found one way to do it.

Another would be as follows, and closer to the directions on the JIRA workflow page: http://dev.clojure.org/display/design/JIRA+workflow (but not identical). Note that these commands would work on Mac OS X or Linux. I'm not sure what the correct corresponding command would be on Windows for the "git am" step below, unless that just happens to work because Windows and/or git implement the input redirection with "<" somehow.

  1. Check out a fresh repo
    $ git clone git://github.com/clojure/clojure.git
    $ cd clojure
  1. Apply the patch for CLJ-1076 to the master branch. This step isn't on the JIRA workflow page.
    $ git am --keep-cr -s < clj-1076-fix-tests-on-windows-patch-v1.txt
  1. Create a branch for yourself
    $ git checkout -b fix-clj-1082
  1. After editing to make your changes, commit them to the current fix-clj-1082 branch
    $ git commit -a -m "fixed annoying bug, refs #42"

From there on down it is the same as the instructions on the JIRA workflow page. The "git format-patch master --stdout > file.patch" will create a patch for the changes you have made in the current branch fix-clj-1082 starting from the master branch, which has the CLJ-1076 fix because of the 'git am' command above.

Comment by Alex Miller [ 22/Aug/13 10:36 PM ]

Moving back to Triaged as Rich has not vetted.

Comment by Andy Fingerhut [ 05/Sep/13 6:12 PM ]

clj-1082-patch-v2.txt is identical to Mike Anderson's clj-1082.patch, preserving his authorship, except it eliminates a carriage return added at the end of one line, which also causes git to issue a warning when applying the patch.

Comment by Rich Hickey [ 22/Nov/13 7:24 AM ]

This diff remains inscrutable. It seems to have two patches in it, one a first idea and another a modification to that? Patches should be direct enhancements to the trunk code. Also, what is endIndex for and why is it mutable? Why not just use end? And, the code doesn't agree with the plan, which says "Check the vector type and if it is an APersistentVector, use the existing logic. Otherwise, fallback to a new rangedIterator() implementation in APersistentVector that iterates using nth." while the code seems to do the opposite:

+ if (v instanceof APersistentVector) { + return ((APersistentVector)v).rangedIterator(start,end); + }
+ return super.iterator();

Comment by Mike Anderson [ 22/Nov/13 11:00 AM ]

Hi Rich,
1. As per comments, Andy made a small change to the original patch. v2 supersedes the original patch.
2. endIndex is part of the iterator implementation: I believe this must be mutable to provide the required Java Iterator behaviour
3. I think the approach is misworded (it was added long after the patch), I shall try to improve this.

Comment by Mike Anderson [ 22/Nov/13 12:32 PM ]

I don't seem to have the ability to edit the description. Here's what I think it should say:

Cause: RT.subvec() creates an APersistentVector$SubVector. SubVector.iterator() assumes the source vector is a PersistentVector, however a primitive vector is a Vec (which is not a PersistentVector). This causes a class cast exception as observed on any attempt to iterate over the subvector.

Approach:
1. Provide a generic ranged iterator for APersistentVector, that can be used by SubVector
2. Make the iterator() method for APersistentVector$SubVector use this new iterator where possible (i.e. whenever the source vector is an APersistentVector). If not, the generic super.iterator() method is used (which is slower, but safe for any source vector that implements IPersistentVector)

Comment by Alex Miller [ 22/Nov/13 12:58 PM ]

Updated description per Mike's comment.

Comment by Alex Miller [ 24/Nov/13 2:15 PM ]

Added new v3 patch that a) combines the previous commits into a single patch and b) removes endIndex and uses end instead. As far as I know this + the description change address all of Rich's questions. Marking re-screened.

Generated at Wed Jul 23 19:26:08 CDT 2014 using JIRA 4.4#649-r158309.