<< Back to previous view

[CLJ-1424] Reader conditionals Created: 15/May/14  Updated: 27/Feb/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.7

Type: Enhancement Priority: Critical
Reporter: Ghadi Shayban Assignee: Alex Miller
Resolution: Unresolved Votes: 1
Labels: reader

Attachments: File clj-1424-10.diff     File CLJ-1424-2.diff     File clj-1424-3.diff     File clj-1424-4.diff     File clj-1424-5.diff     File clj-1424-6.diff     File clj-1424-7.diff     File clj-1424-8.diff     File clj-1424-9.diff     File clojure-feature-expressions.diff    
Patch: Code and Test
Approval: Vetted

 Description   

See http://dev.clojure.org/display/design/Reader+Conditionals for design.

The implementation details contain several major parts:

1) LispReader changes to implement new conditional read syntax and features.

In RT:

  • suppress-read - new dynamic var
  • readString() - new arity that takes opts

In LispReader:

  • read(...) entry point - added new variants that take opts. The opts-only form will decode :eof into the correct values for eofIsError and eofValue. Internal forms take both opts and pendingForms (list).
  • All reader invoke dispatch methods now take both opts and pendingForms which get drug around the call stack.
  • Defined all option keys and well-known allowed values - OPT_EOF, OPT_FEATURES, OPT_READ_COND, EOFTHROW, PLATFORM_KEY, PLATFORM_FEATURES, COND_ALLOW, COND_PRESERVE.
  • read(...) entry point will ensure installation of {:features #{:clj}}, appropriately merging others or none. There is no path that will pass :features inside Clojure but users can call LispReader with it directly.
  • readDelimitedList() was refactored so it can share logic with readCondDelimited().
  • Luke can provide more detail on the guts of reader conditional reading if needed.

New ReaderConditional and TaggedLiteral types.

In clojure.core:

  • tagged-literal?, tagged-literal, reader-conditional?, reader-conditional - new functions to support the preserved data form of these
  • new print-method impls for TaggedLiteral and ReaderConditional
  • read - added new arity that takes opts and explain opts in docstring
  • read-string - added new arity that takes opts (placement matches edn/read-string and is designed for partial application)

2) cljc file changes.
In Compiler:

  • OPTS_COND_ALLOWED - keep around a map {:read-cond :allow} to invoke reader when reading .cljc files.
  • readerOpts() - new private function to determine reader options to use based on file name (.clj vs .cljc)
  • load() - use readerOpts() and invoke LispReader with them
  • compile() - use readerOpts() and invoke LispReader with them

In RT:

  • load(...) - check first for .clj file, then for .cljc file when looking by ns

In clojure.main - update code to allow for .clj and .cljc files
In clojure.repl - update code to allow for .clj and .cljc files

3) Tests + infrastructure updates.

  • Move test/clojure/test_clojure/reader.clj to test/clojure/test_clojure/reader.cljc so that we can test reader conditionals.
  • Bump version of test.generative to get newer version that depends on newer tools.namespace which is able to find .cljc files as source files. The test script at src/script/run_test.clj uses this to find Clojure test files and without the bump, it will not find the new reader.cljc test ns. Note: if you use the ant build, you need to re-run antsetup.sh for this to take effect.
  • src/script/run_test.clj - as part of the version bump, move from deprecated namespace to new namespace
  • Added lots of tests in reader.cljc

Patch: clj-1424-10.diff

Screened by:

Related: CLJS-27, TRDR-14



 Comments   
Comment by Jozef Wagner [ 16/May/14 2:19 AM ]

Has there been a decision that CL syntax is going to be used? Related discussion can be found at design page, google groups discussion and another discussion.

Comment by Alex Miller [ 16/May/14 8:34 AM ]

No, no decisions on anything yet.

Comment by Ghadi Shayban [ 19/May/14 7:25 PM ]

Just to echo a comment from TRDR-14:

This is WIP and just one approach for feature expressions. There seem to be at least two couple diverging approaches emerging from the various discussion (Brandon Bloom's idea of read-time splicing being the other.)

In any case having all Clojure platforms be ready for the change is probably essential. Also backwards compatibility of feature expr code to Clojure 1.6 and below is also not trivial.

Comment by Kevin Downey [ 04/Aug/14 1:39 PM ]

if you have ever tried to do tooling for a language where the "parser" tossed out information or did some partial evaluation, it is a pain. this is basically what the #+cljs style feature expressions and bbloom's read time splicing both do with clojure's reader. I think resolving this at read time instead of having the compiler do it before macro expansion is a huge mistake and makes the reader much less useful for reading code.

Comment by Ghadi Shayban [ 04/Aug/14 2:00 PM ]

Kevin, what kind of tooling use case are you alluding to?

Comment by Kevin Downey [ 04/Aug/14 3:24 PM ]

any use case that involves reading code and not immediately handing it off to the compiler. if I wanted to write a little snippet to read in a function, add an unused argument to every arity then pprint it back, reader resolved feature expressions would not round trip.

if I want to write snippet of code to generate all the methods for a deftype (not a macro, just at the repl write a `for` expression) I can generate a clojure data structure, call pprint on it, then paste it in as code, reader feature expressions don't have a representation as data so I cannot do that, I would have to generate strings directly.

Comment by Alex Miller [ 22/Aug/14 9:10 AM ]

Changing Patch setting so this is not in Screenable yet (as it's still a wip).

Comment by Alex Miller [ 07/Nov/14 4:39 PM ]

Latest patch brings up to par with related patches in CLJS-27 and TRDR-14 and importantly adds support for loading .cljc files as Clojure files.

Comment by Andy Fingerhut [ 07/Nov/14 5:55 PM ]

Maybe undesirable behavior demonstrated below with latest Clojure master plus patch clj-1424-3.diff, due to the #+cljs skipping the comment, but not the (dec a). I thought it could be fixed simply by moving RT.suppressRead() check after (ret == r) check in read(), but that isn't correct.

user=> (read-string "(defn foo [a] #+clj (inc a) #+cljs (dec a))")
(defn foo [a] (inc a))
user=> (read-string "(defn foo [a] #+clj (inc a) #+cljs ; foo\n (dec a))")
(defn foo [a] (inc a) (dec a))
Comment by Alex Miller [ 21/Jan/15 4:28 PM ]

Added new clj-1424-4.diff which makes a couple of modifications:

  • removed support for and/or/not (#+ and #- remain)
  • *features* has been removed
  • if you wish to have a custom feature set while reading, there is a new option map that can be passed to read (this all parallels similar changes previously made to the edn reader)

Example of adding a "custom" feature to the feature set (which will always contain "clj" feature):

(read 
  {:features #{:custom}} 
  (java.io.PushbackReader. (java.io.StringReader. "[#+custom :x]")))
Comment by Andy Fingerhut [ 21/Jan/15 5:01 PM ]

Latest patch clj-1424-4.diff also exhibits maybe-undesirable behavior in which #+cljs can suppress an immediately following comment, rather than the form following it. See 07/Nov/14 comment with example above.

Comment by Alex Miller [ 21/Jan/15 6:16 PM ]

Thanks Andy, I'm aware. Haven't looked at it yet.

Comment by Luke VanderHart [ 25/Jan/15 9:26 PM ]

Patch clj-1424-5.diff modifies the code to use "read-conditionals", as outlined by Rich at: https://groups.google.com/d/msg/clojure-dev/LW0ocQ1RcYI/IBPPyfCpM3kJ

Comment by Alex Miller [ 26/Jan/15 12:33 PM ]

Some feedback:

1) Because pendingForms is an internal thing, I would make the read() that takes it non-public.
2) In readDelimitedList, I don't see the point of constructing a new LinkedList then checking if it's empty there. Should just make the add conditional on whether it's null or not.
3) You could treat pendingForms as a Deque (which LinkedList implements) and then use pop() instead of remove(0). The addAll(0, ...) is more painful to replicate though if you're sticking to Deque. I think I'd be tempted to just commit explicitly to LinkedList for pendingForms since we fully control the construction and use of it within the reader.
4) Might be nice to update the commented-out readers to support pendingForms as I did with opts. Or remove the updates for opts. Should either do all the mods or none on the commented-out code.
5) s/read-cond-splicing/read-cond-splice/ ? Seems like where it's used it should be a verb.
6) Should just use :default and make :else and :none throw exceptions. I think Rich mentioned :except or :exception too? or maybe I misheard that.
7) Should have some more tests to tweak the error cases - bad feature, uneven forms, default out of allowed position, bad contents for splice, etc.

Comment by Alex Miller [ 26/Jan/15 2:01 PM ]

From Chouser on the mailing list: "is it intentional that reading (clojure.core/read-cond ...) does not behave the same as (#? ...)? That is, (#? ...) can be read as c.c/read-cond depending on read options, but having been read, if it is printed again it doesn't round-trip back to #?. This is different, for example, from how #(...) is read as (fn* [] (...)), which then retains its meaning."

In shouldReadConditionally(), it looks like the == check vs READ_COND will not work. Instead of:
return (first == READ_COND || first == READ_COND_SPLICING);
do
return (READ_COND.equals(first) || READ_COND_SPLICING.equals(first));

For example, this test doesn't seem to give the right answer:

user=> (read-str-opts {:preserve-read-cond false} "(clojure.core/read-cond :clj :x :default :y)")
(clojure.core/read-cond :clj :x :default :y)    ;; should be :x
Comment by Michael Blume [ 26/Jan/15 3:27 PM ]

With this patch applied to master, lein check fails on instaparse:

Compiling namespace instaparse.abnf
Exception in thread "main" clojure.lang.ArityException: Wrong number of args (2) passed to: StringReader, compiling:(abnf.clj:186:28)
	at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3605)
	at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3599)
	at clojure.lang.Compiler$DefExpr.eval(Compiler.java:436)
	at clojure.lang.Compiler.eval(Compiler.java:6772)
	at clojure.lang.Compiler.load(Compiler.java:7194)
	at clojure.lang.RT.loadResourceScript(RT.java:384)
	at clojure.lang.RT.loadResourceScript(RT.java:375)
	at clojure.lang.RT.load(RT.java:459)
	at clojure.lang.RT.load(RT.java:425)
	at clojure.core$load$fn__5424.invoke(core.clj:5850)
	at clojure.core$load.doInvoke(core.clj:5849)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at user$eval52$fn__63.invoke(form-init5310597017138984927.clj:1)
	at user$eval52.invoke(form-init5310597017138984927.clj:1)
	at clojure.lang.Compiler.eval(Compiler.java:6767)
	at clojure.lang.Compiler.eval(Compiler.java:6757)
	at clojure.lang.Compiler.load(Compiler.java:7194)
	at clojure.lang.Compiler.loadFile(Compiler.java:7150)
	at clojure.main$load_script.invoke(main.clj:275)
	at clojure.main$init_opt.invoke(main.clj:280)
	at clojure.main$initialize.invoke(main.clj:308)
	at clojure.main$null_opt.invoke(main.clj:343)
	at clojure.main$main.doInvoke(main.clj:421)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at clojure.lang.Var.invoke(Var.java:383)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.Var.applyTo(Var.java:700)
	at clojure.main.main(main.java:37)
Caused by: clojure.lang.ArityException: Wrong number of args (2) passed to: StringReader
	at clojure.lang.AFn.throwArity(AFn.java:429)
	at clojure.lang.AFn.invoke(AFn.java:36)
	at instaparse.cfg$eval800$safe_read_string__801.invoke(cfg.clj:163)
	at instaparse.cfg$process_string.invoke(cfg.clj:180)
	at instaparse.cfg$build_rule.invoke(cfg.clj:217)
	at clojure.core$map$fn__4523.invoke(core.clj:2612)
	at clojure.lang.LazySeq.sval(LazySeq.java:40)
	at clojure.lang.LazySeq.seq(LazySeq.java:49)
	at clojure.lang.RT.seq(RT.java:504)
	at clojure.core$seq__4103.invoke(core.clj:135)
	at clojure.core$apply.invoke(core.clj:626)
	at instaparse.cfg$build_rule.invoke(cfg.clj:215)
	at clojure.core$map$fn__4523.invoke(core.clj:2612)
	at clojure.lang.LazySeq.sval(LazySeq.java:40)
	at clojure.lang.LazySeq.seq(LazySeq.java:49)
	at clojure.lang.RT.seq(RT.java:504)
	at clojure.core$seq__4103.invoke(core.clj:135)
	at clojure.core$apply.invoke(core.clj:626)
	at instaparse.cfg$build_rule.invoke(cfg.clj:211)
	at instaparse.cfg$build_rule.invoke(cfg.clj:214)
	at clojure.core$map$fn__4523.invoke(core.clj:2612)
	at clojure.lang.LazySeq.sval(LazySeq.java:40)
	at clojure.lang.LazySeq.seq(LazySeq.java:49)
	at clojure.lang.RT.seq(RT.java:504)
	at clojure.core$seq__4103.invoke(core.clj:135)
	at clojure.core$apply.invoke(core.clj:626)
	at instaparse.cfg$build_rule.invoke(cfg.clj:215)
	at clojure.core$map$fn__4523.invoke(core.clj:2612)
	at clojure.lang.LazySeq.sval(LazySeq.java:40)
	at clojure.lang.LazySeq.seq(LazySeq.java:49)
	at clojure.lang.RT.seq(RT.java:504)
	at clojure.core$seq__4103.invoke(core.clj:135)
	at clojure.core$apply.invoke(core.clj:626)
	at instaparse.cfg$build_rule.invoke(cfg.clj:211)
	at instaparse.cfg$build_rule.invoke(cfg.clj:207)
	at clojure.core$map$fn__4523.invoke(core.clj:2612)
	at clojure.lang.LazySeq.sval(LazySeq.java:40)
	at clojure.lang.LazySeq.seq(LazySeq.java:49)
	at clojure.lang.RT.seq(RT.java:504)
	at clojure.core$seq__4103.invoke(core.clj:135)
	at clojure.core.protocols$seq_reduce.invoke(protocols.clj:30)
	at clojure.core.protocols$fn__6436.invoke(protocols.clj:59)
	at clojure.core.protocols$fn__6389$G__6384__6402.invoke(protocols.clj:13)
	at clojure.core$reduce.invoke(core.clj:6501)
	at clojure.core$into.invoke(core.clj:6582)
	at instaparse.cfg$ebnf.invoke(cfg.clj:277)
	at clojure.lang.AFn.applyToHelper(AFn.java:154)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3600)
	... 27 more
Failed.
Comment by Michael Blume [ 26/Jan/15 3:29 PM ]

Aha, of course, Instaparse is calling into the LispReader$StringReader directly.

Is it worth providing versions of these methods with the old arities? Or should instaparse just not be using Clojure internals this way?

Comment by Michael Blume [ 26/Jan/15 3:33 PM ]

https://github.com/Engelberg/instaparse/blob/v1.3.5/src/instaparse/cfg.clj#L159

Comment by Alex Miller [ 26/Jan/15 3:33 PM ]

Instaparse is reaching pretty deep inside implementation details here, so I'd say this should expect to break. We could back-fill the old arities here but I'd really prefer not to if possible.

Comment by Luke VanderHart [ 27/Jan/15 11:23 AM ]

clj-1424-6.diff addresses all the issues mentioned above. Per a comment from Rich, it also adds tests to ensure that nested splices work properly (they do).

There were two things from your list I didn't do, Alex:

3) I kept pendingForms as a List. Because we aren't confining ourselves to a Deque interface, I don't see the benefit of calling pop() over remove(0) (with identical semantics) as justification for over-specifying the concrete type.

5) I kept "read-cond-splicing" since it parallels the form of "unquote-splicing". Seems that those should be consistent.

Comment by Luke VanderHart [ 30/Jan/15 9:03 PM ]

clj-1424-7.diff contains Rich's "reader-conditionals" proposal.

Comment by Alex Miller [ 06/Feb/15 3:45 PM ]

ReaderConditional / TaggedLiteral
1) when patch applied I see some whitespace errors in here, also line endings seem different, might want to check it
2) a common pattern in other Java classes is private constructor and public static create() method
3) could use Util.hash() to clean up the "null->0" logic in hashCode()

LispReader
4) adds unused import: java.util.Iterator
5) it looks like returnOn flag could just be collapsed into checking if returnOnChar is non-null?
6) in readCondDelimited, EOF and FINISHED are never used and can be removed presumably

Comment by Luke VanderHart [ 07/Feb/15 1:49 PM ]

I have attached clj-1424-8.diff, which addresses your most recent comments, Alex. I formatted it using `git format patch` instead of `git diff` so it should have the email address added correctly.

Your comments are all addressed, with the exception of returnOn. I don't think that can be collapsed. You really need two values: one to say what character should cause a return, and one to say what value should be returned in that scenario. You could use a convention on the return value, I suppose, (e.g, null means a completed read) but there's already precedent for passing in the value to be returned (namely, eofValue).

Comment by Alex Miller [ 09/Feb/15 9:49 AM ]

Looks good. I think I actually mis-read what returnOn was doing, so np on that. I still see the whitespace issues and the CR/LF in those two files. Were you going to change those?

Comment by Alex Miller [ 09/Feb/15 4:24 PM ]

Added new -9 patch that squashes the last patch but is otherwise identical. The older patches in that diff were the source of still seeing whitespace errors on apply.

Comment by Rich Hickey [ 20/Feb/15 8:09 AM ]

I think in the first iteration we should allow reader conditionals only in .cljc files, and support only standard features :clj, :cljs and :clr.

Comment by Andy Fingerhut [ 27/Feb/15 10:35 AM ]

Comment on -10 patch:

The doc string additions for clojure.core/read are a bit cryptic for those not already steeped in the details. It would be good to at least mention 'reader conditionals' so people have something to Google for. Current additions:

+  Opts is a persistent map with valid keys:
+    :read-cond - :allow, or :preserve to keep all branches as data
+    :features - persistent set of feature keywords
+    :eof - on eof, return value unless :eofthrow, then throw

Alternate suggestion:

+  Opts is a persistent map with valid keys:
+    :read-cond - :allow to process reader conditionals, or :preserve to keep all branches as data.  Throw if reader conditional found and this key not present.
+    :features - persistent set of feature keywords for processing reader conditionals.
+    :eof - on eof, return value unless :eofthrow, then throw
Comment by Alex Miller [ 27/Feb/15 10:39 AM ]

Thanks Andy, will consider. Indeed, this will have lengthier docs on http://clojure.org/reader so I was trying to just hit the surface but that's a reasonable suggestion.





[CLJ-1638] Regression - PersistentVector.create(List) was removed in 1.7.0-alpha5 Created: 12/Jan/15  Updated: 27/Feb/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Defect Priority: Critical
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: collections, regression
Environment:

1.7.0-alpha5


Attachments: Text File clj-1638-2.patch     Text File clj-1638.patch    
Patch: Code
Approval: Screened

 Description   

For CLJ-1546, PersistentVector.create(List) was replaced with PersistentVector.create(ArrayList). At least one library (flatland) was calling this method directly and was broken by the change.

Approach: Change create(ArrayList) to more general prior method create(List).

Patch: clj-1638-2.patch

Screened by:



 Comments   
Comment by Rich Hickey [ 20/Feb/15 7:42 AM ]

Is there a good reason to have both PersistentVector.create(List)and PersistentVector.create(ArrayList)?

Comment by Fogus [ 27/Feb/15 9:08 AM ]

This couldn't possibly be more straight-forward.





[CLJ-1499] Replace seq-based iterators with direct iterators for all non-seq collections that use SeqIterator Created: 08/Aug/14  Updated: 27/Feb/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Enhancement Priority: Major
Reporter: Rich Hickey Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File clj-1499-all.diff     Text File clj-1499-v10.patch     Text File clj-1499-v11.patch     Text File clj-1499-v12.patch     File clj-1499-v2.diff     File clj-1499-v3.diff     File clj-1499-v6.diff     File clj-1499-v7.diff     File clj-1499-v8.diff     Text File clj-1499-v9.patch     Text File defrecord-iterator.patch     File defrecord-iterator-v2.diff    
Patch: Code and Test
Approval: Screened

 Description   

Motivation:

Currently, the process of reducing over a map or set involves converting them to seqs. This action thus causes more intermediate objects than we'd like. Therefore, we'd like for reduce to be more efficient about the way that it reduces over these types. One way is for reduction to use the types' core collections while another way is to instead use a smart iterator type that abstracts the direct operation.

Solution

Add support for direct iterators instead of seq-based iterators for non-seq collections that use SeqIterator.

Patch adds support for direct iterators on the following (removing use of SeqIterator):

  • PersistentHashMap - new internal iterator (~20% faster)
  • APersistentSet - use internal map impl iterator (~5% faster)
  • PersistentQueue
  • PersistentStructMap
  • records (in core_deftype.clj)

Patch does not change use of SeqIterator in:

  • LazyTransformer$MultiStepper (not sure if this could be changed)
  • ASeq
  • LazySeq

In preparation for use in CLJ-1602, there is also a new IMapIterable interface that allows a map to publish direct Iterators of keys and values (without creating and pulling apart MapEntry objs). PersistentHashMap and PersistentArrayMap implement it right now and PersistentHashSet will check for and use the direct key iterator if possible.

Performance:

I re-ran the set of tests from CLJ-1546 for vec improvements which use these new iterators for PersistentArrayMap, PersistentHashMap, and PersistentHashSet. These test the time to run vec on a PHS or PHM of the given size. All timings in nanoseconds.

coll size 1.7.0-alpha4 1.7.0-alpha5 alpha5 + patch alpha5 + patch / alpha5
PHS 4 236 406 204 0.86
PHS 16 857 896 382 0.45
PHS 64 3637 3832 1723 0.47
PHS 256 14235 14672 6525 0.46
PHS 1024 67743 68857 44654 0.66
PHM 4 112 232 232 2.07
PHM 16 550 832 442 0.80
PHM 64 3014 3155 2014 0.67
PHM 256 11661 12082 7522 0.65
PHM 1024 57388 59787 44240 0.77

I also ran a set of tests just on reduce with reduce-kv for comparison (reduce-kv uses a different unaffected reduction path so it's a good baseline for comparison).

expr a-set / a-map size 1.6.0 1.7.0-alpha5 (ns) 1.7.0-alpha5 + patch (ns) patch / alpha5 patch / 1.6
(reduce + 0 a-set) 4 171 241 114 0.47 0.67
(reduce + 0 a-set) 16 663 826 395 0.48 0.60
(reduce + 0 a-set) 64 3637 4456 2361 0.53 0.65
(reduce + 0 a-set) 256 14772 16158 8872 0.55 0.60
(reduce + 0 a-set) 1024 72548 79010 49626 0.63 0.68
(reduce + 0 a-set) 16384 1197962 1234286 729335 0.59 0.61
(reduce #(+ %1 (val %2)) 0 a-map) 4 69 103 104 1.01 1.51
(reduce #(+ %1 (val %2)) 0 a-map) 16 640 752 506 0.67 0.79
(reduce #(+ %1 (val %2)) 0 a-map) 64 2861 3238 2388 0.74 0.83
(reduce #(+ %1 (val %2)) 0 a-map) 256 11998 12848 9526 0.74 0.79
(reduce #(+ %1 (val %2)) 0 a-map) 1024 59650 61913 54717 0.88 0.92
(reduce #(+ %1 (val %2)) 0 a-map) 16384 982317 969168 837842 0.86 0.85
(reducekv #(%1 %3) 0 a-map) 4 58 62 60 0.97 1.03
(reducekv #(%1 %3) 0 a-map) 16 233 250 248 0.99 1.06
(reducekv #(%1 %3) 0 a-map) 64 1477 1525 1442 0.95 0.98
(reducekv #(%1 %3) 0 a-map) 256 5470 5258 5201 0.99 0.95
(reducekv #(%1 %3) 0 a-map) 1024 27344 27442 27410 1.00 1.00
(reducekv #(%1 %3) 0 a-map) 16384 399591 397090 396267 1.00 0.99

Patch: clj-1499-v12.patch

Screened by:



 Comments   
Comment by Alex Miller [ 13/Aug/14 1:57 PM ]

The list of non-seqs that uses SeqIterator are:

  • records (in core_deftype.clj)
  • APersistentSet - fallback, maybe is ok?
  • PersistentHashMap
  • PersistentQueue
  • PersistentStructMap

Seqs (that do not need to be changed) are:

  • ASeq
  • LazySeq.java

LazyTransformer$MultiStepper - not sure

Comment by Ghadi Shayban [ 27/Sep/14 2:16 PM ]

attached iterator impl for defrecords. ready to leverage iteration for extmap when PHM iteration lands.

Comment by Alex Miller [ 29/Sep/14 12:52 PM ]

PHM patch

Comment by Alex Miller [ 29/Sep/14 10:26 PM ]

New patch that fixes bugs with PHMs with null keys (and added tests to expose those issues), added support for PHS.

Comment by Ghadi Shayban [ 29/Sep/14 10:45 PM ]

Alex, the defrecord patch already uses the iterator for extmap. It's just made better by the PHM patch.

Comment by Alex Miller [ 29/Sep/14 10:47 PM ]

Comment by Ghadi Shayban [ 30/Sep/14 4:17 PM ]

Heh. Skate to where the puck is going to be – Gretzky

Re: defrecord iterator: As is, it propagates exceptions from reaching the end of the ExtMap's iterator. As noted in CLJ-1453, PersistentArrayMap's iterator improperly returns an ArrayIndexOutOfBoundsException, rather than NoSuchElementException.

Comment by Alex Miller [ 30/Sep/14 6:41 PM ]

Hey Ghadi, rather than rebuilding the case map to pass to the RecordIterator, why don't you just pass the fields in iteration order to it and leverage the case map via .valAt like everything else?

Comment by Ghadi Shayban [ 30/Sep/14 7:30 PM ]

defrecord-iterator-v2.diff reuses valAt and minimizes macrology.

Comment by Alex Miller [ 07/Oct/14 2:04 PM ]

Comments from Stu (found under the couch):

"1. some of the impls (e.g. queue manually concatenate two iters. Would implementing iter-cat and calling that be simpler and more robust?
2. I found this tweak to the generative testing more useful in reporting failure, non-dependent on clojure.test, and capable of expecting failures. Waddya think?

(defn seq-iter-match
  [seqable iterable]
  (let [i (.iterator iterable)]
    (loop [s (seq seqable)
           n 0]
      (if (seq s)
        (do
          (when-not (.hasNext i)
            (throw (ex-info "Iterator exhausted before seq"
                            {:pos n :seqable seqable :iterable iterable})))
          (when-not (= (.next i) (first s))
            (throw (ex-info "Iterator and seq did not match"
                            {:pos n :seqable seqable :iterable iterable})))
          (recur (rest s) (inc n)))
        (when (.hasNext i)
          (throw (ex-info "Seq exhausted before iterator"
                          {:pos n :seqable seqable :iterable iterable})))))))

		(defspec seq-and-iter-match-for-maps
  identity
  [^{:tag clojure.test-clojure.data-structures/gen-map} m]
  (seq-iter-match m m))

3. similar generative approach would be good for the other types (looks like we just do maps)"

Comment by Alex Miller [ 02/Dec/14 3:03 PM ]

Latest patch (clj-1499-v6.diff) makes Stu's suggested change #2 above and adds tests recommended in #3. I looked at #1 but decided in the end that it wasn't going to make anything easier.

Comment by Alex Miller [ 09/Jan/15 11:11 AM ]

v7 patch just updates to current master

Comment by Alex Miller [ 14/Jan/15 4:08 PM ]

found a bug in last impl in maps with null values...fixing

Comment by Stuart Halloway [ 16/Jan/15 1:40 PM ]

v11 updates v10 to apply cleanly on master

Comment by Fogus [ 30/Jan/15 1:36 PM ]

Added a motivation section to the ticket.

Comment by Rich Hickey [ 20/Feb/15 7:54 AM ]

Any reason why MAKE_ENTRY/KEY/VAL are not final?

Comment by Alex Miller [ 20/Feb/15 8:00 AM ]

Nope, just an oversight. Will fix.

Comment by Alex Miller [ 20/Feb/15 9:46 AM ]

New -v12 patch is identical except makes MAKE_ENTRY, MAKE_KEY, MAKE_VAL final.

Comment by Fogus [ 27/Feb/15 8:45 AM ]

Screened latest v12 (with added finals) and would be happy to work in this code if the need arose. Additionally, with the added "motivation" the ticket and its solution makes more sense.





[CLJ-1515] Reify the result of range and add IReduceInit Created: 29/Aug/14  Updated: 24/Feb/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Enhancement Priority: Major
Reporter: Timothy Baldridge Assignee: Alex Miller
Resolution: Unresolved Votes: 2
Labels: None

Attachments: Text File clj-1515-10.patch     Text File clj-1515-2.patch     Text File clj-1515-3.patch     Text File clj-1515-4.patch     Text File clj-1515-5.patch     Text File clj-1515-6.patch     Text File clj-1515-7.patch     Text File clj-1515-8.patch     Text File clj-1515-9.patch     Text File CLJ-1515-deftype2.patch     Text File CLJ-1515-deftype3.patch     Text File CLJ-1515-deftype-nostructural-dup.patch     Text File CLJ-1515-deftype.patch     Text File clj-1515.patch     File patch.diff     File range-patch3.diff     File reified-range4.diff    
Patch: Code and Test
Approval: Incomplete

 Description   

Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

Approach: this patch implements range as a deftype. The range implementation is specialized for primitive longs (LongRange, a common case), as well one with generic math (GenericRange).

The types implement IReduce,IReduceInit,Counted,Seqable,IHashEq. Sequential and Serializable are marker interfaces.

The existing range lazy-seq was renamed to range* (instead of range1 b/c range's arguments are numbers.) The deftypes defined later delegate their seq implementation to it. Still chunked. To achieve the most performance out of this delegation, it required a tiny tweak to doall/dorun, see the note below. range* handles only ascending or descending sequences.

The implementation was carefully tuned using criterium for benchmarking. Rather than directly using < and > as comparators, it performed far better to close over the range upper bound, e.g. #(< % end). < and > are inlining macros. The boxing/unboxing is minimized within the LongRange implementation.

The range constructor figures out type specialization, calls create-range, which filters out the wacky cases (empty seqs, the repeat case), and then creates the types if necessary. The deftypes handle only ascending or descending nonempty cases.

The special case of (range) is just handled with (iterate inc' 0) (which is further optimized for reduce in CLJ-1603). The original contract range being infinite was never handled correctly, as it used inc, not inc'. The GenericRange impl does not use autopromotion, and opts for preserving the implementation behavior. The only path for auto-promotion with this patch is through the 0-arity delegation to iterate.

Note The patch also includes a tweak to doall/dorun in order to handle types whose seq implementation is not inline, but in a separate object. As noted in the commit message, dorun currently calls (seq coll), then drops the result, then calls (next coll), resulting in a double allocation of the seq when the collection doesn't cache its seq.

Performance
timings done via criterium full benchmark on Oracle JDK

code 1.7.0-alpha5 Java impl clj-1515-10 CLJ-1515-deftype2
(count (range (* 1024 1024))) 61.30 ms 26.22 ns 12.67 ns
(reduce + (map inc (range (* 1024 1024)))) 56.21 ms 47.80 ms 46.87 ms
(reduce + (map inc (map inc (range (* 1024 1024))))) 77.98 ms 67.84 ms 66.91 ms
(count (keep odd? (range (* 1024 1024)))) 73.68 ms 57.33 ms 72.27 ms
(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) 50.56 ms 36.12 ms 28.46 ms
(reduce + 0 (range (* 2048 1024))) 81.86 ms 39.22 ms 26.44 ms
(doall (range 0 31)) 1.32 µs 2.89 µs 1.08 µs
(doall (range 0 32)) 1.34 µs 2.96 µs 1.12 µs
(doall (range 0 128)) 5.20 µs 11.92 µs 4.31 µs
(doall (range 0 512)) 21.24 µs 47.71 µs 17.91 µs
(doall (range 0 4096)) 167.86 µs 381.15 µs 141.48 µs
(into [] (map inc (range 31))) 2.00 µs 1.49 µs 1.87 µs
(into [] (map inc) (range 31)) 1.65 µs 813.78 ns 929.51 ns
(into [] (range 128)) 5.20 µs 2.27 µs 2.44 µs
(doall (range 1/2 1000 1/3)) 1.50 ms 1.69 ms 1.50 ms
(doall (range 0.5 1000 0.33)) 172.27 µs 364.35 µs 149.51 µs
(into [] (range 1/2 1000 1/3)) 1.52 ms 1.42 ms 1.43 ms
(into [] (range 0.5 1000 0.33)) 163.06 µs 99.44 µs 103.18 µs
(count (filter odd? (take (* 1024 1024) (range)))) 187.79 ms 194.07 ms 189.73 ms
(transduce (take (* 1024 1024)) + (range)) 53.06 ms 94.19 ms 89.34 ms

Performance notes:

doall cases see a substantial benefit.

The deftype has very competitive performance considering it is competing against the Java impl, which uses unchecked math incorrectly.

The final two cases of 0-arity (range) will get better pending the merge of the Iterate deftype (CLJ-1603) or pending validity of the inc' approach.

Patch: CLJ-1515-deftype3.patch
Screened by: Alex Miller

Questions
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.



 Comments   
Comment by Alex Miller [ 29/Aug/14 3:19 PM ]

1) Not sure about losing chunked seqs - that would make older usage slower, which seems undesirable.
2) RangeIterator.next() needs to throw NoSuchElementException when walking off the end
3) I think Range should implement IReduce instead of relying on support for CollReduce via Iterable.
4) Should let _hash and _hasheq auto-initialize to 0 not set to -1. As is, I think _hasheq always would be -1?
5) _hash and _hasheq should be transient.
6) count could be cached (like hash and hasheq). Not sure if it's worth doing that but seems like a win any time it's called more than once.
7) Why the change in test/clojure/test_clojure/serialization.clj ?
8) Can you squash into a single commit?

Comment by Timothy Baldridge [ 29/Aug/14 3:40 PM ]

1) I agree, adding chunked seqs to this will dramatically increase complexity, are we sure we want this?
2) exception added
3) I can add IReduce, but it'll pretty much just duplicate the code in protocols.clj. If we're sure we want that I'll add it too.
4) fixed hash init values, defaults to -1 like ASeq
5) hash fields are now transient
6) at the cost of about 4 bytes we can cache the cost of a multiplication and an addition, doesn't seem worth it?
7) the tests in serialization.clj assert that the type of the collection roundtrips. This is no longer the case for range which starts as Range and ends as a list. The change I made converts range into a list so that it properly roundtrips. My assumption is that we shouldn't rely on all implementations of ISeq to properly roundtrip through EDN.
8) squashed.

Comment by Alex Miller [ 29/Aug/14 3:49 PM ]

6) might be useful if you're walking through it with nth, which hits count everytime, but doubt that's common
7) yep, reasonable

Comment by Andy Fingerhut [ 18/Sep/14 6:52 AM ]

I have already pointed out to Edipo in personal email the guidelines on what labels to use for Clojure JIRA tickets here: http://dev.clojure.org/display/community/Creating+Tickets

Comment by Timothy Baldridge [ 19/Sep/14 10:02 AM ]

New patch with IReduce directly on Range instead of relying on iterators

Comment by Alex Miller [ 01/Oct/14 2:00 PM ]

The new patch looks good. Could you do a test to determine the perf difference from walking the old chunked seq vs the new version? If the perf diff is negligible, I think we can leave as is.

Another idea: would it make sense to have a specialized RangeLong for the (very common) case where start, end, and step could all be primitive longs? Seems like this could help noticeably.

Comment by Timothy Baldridge [ 03/Oct/14 10:00 AM ]

Looks like chunked seqs do make lazy seq code about 5x faster in these tests.

Comment by Ghadi Shayban [ 03/Oct/14 10:22 AM ]

I think penalizing existing code possibly 5x is a hard cost to stomach. Is there another approach where a protocolized range can live outside of core? CLJ-993 has a patch that makes it a reducible source in clojure.core.reducers, but it's coll-reduce not IReduce, and doesn't contain an Iterator. Otherwise we might have to take the chunked seq challenge.

Alex: Re long/float. Old reified Ranged.java in clojure.lang blindly assumes ints, it would be nice to have a long vs. float version, though I believe the contract of reduce boxes numbers. (Unboxed math can be implemented very nicely as in Prismatic's Hiphip array manipulation library, which takes the long vs float specialization to the extreme with different namespaces)

Comment by Timothy Baldridge [ 03/Oct/14 10:38 AM ]

I don't think anyone is suggesting we push unboxed math all the way down through transducers. Instead, this patch contains a lot of calls to Numbers.*, if we were to assume that the start end and step params of range are all Longs, then we could remove all of these calls and only box when returning an Object (in .first) or when calling IFn.invoke (inside .reduce)

Comment by Alex Miller [ 03/Oct/14 10:46 AM ]

I agree that 5x slowdown is too much - I don't think we can give up chunked seqs if that's the penalty.

On the long case, I was suggesting what Tim is talking about, in the case of all longs, create a Range that stores long prims and does prim math, but still return boxed objects as necessary. I think the only case worth optimizing is all longs - the permutation of other options gets out of hand quickly.

Comment by Ghadi Shayban [ 03/Oct/14 11:00 AM ]

Tim, I'm not suggesting unboxed math, but the singular fast-path of all-Longs that you and Alex describe. I mistakenly lower-cased Long/Float.

Comment by Timothy Baldridge [ 31/Oct/14 11:30 AM ]

Here's the latest work on this, a few tests fail. If someone wants to take a look at this patch feel free, otherwise I'll continue to work on it as I have time/energy.

Comment by Nicola Mometto [ 14/Nov/14 12:51 PM ]

As discussed with Tim in #clojure, the current patch should not change ArrayChunk's reduce impl, that's an error.

Comment by Alex Miller [ 09/Dec/14 2:40 AM ]

Still a work in progress...

Comment by Nicola Mometto [ 09/Dec/14 8:44 AM ]

Alex, while this is still a work in progress, I see that the change on ArrayChunk#reduce from previous WIP patches not only has not been reverted but has been extended. I don't think the current approach makes sense as ArrayChunk#reduce is not part of the IReduce/IReduceInit contract but of the IChunk contract and changing the behaviour to be IReduce-like in its handling of reduced introduces the burden of having to use preserve-reduced on the reducing function to no apparent benefit.

Given that the preserve-reduced is done on the clojure side, it seems to me like directly invoking .reduce rather than routing through internal-reduce should be broken but I haven't tested it.

Comment by Alex Miller [ 09/Dec/14 9:49 AM ]

That's the work in progress part - I haven't looked at yet. I have not extended or done any work re ArrayChunk, just carried through what was on the prior patch. I'll be working on it again tomorrow.

Comment by Ghadi Shayban [ 10/Dec/14 11:14 PM ]

I am impressed and have learned a ton through this exercise.

quick review of clj-1515-2
1) withMeta gives the newly formed object the wrong meta.
2) LongRange/create() is the new 0-arity constructor for range, which sets the 'end' to Double/POSITIVE_INFINITY cast as a long. Current core uses Double/POSITIVE_INFINITY directly. Not sure how many programs rely upon iterating that far, or how they would break.
3) Relatedly, depending on the previous point: Because only all-long arguments receive chunking, the very common case of (range) with no args would be unchunked. Doesn't seem like too much of a stretch to add chunking to the other impl.
4) Though the commented invariants say that Range is never empty, the implementation uses a magic value of _count == 0 to mean not cached, which is surprising to me. hashcodes have the magic value of -1
5) s/instanceof Reduced/RT.isReduced
6) is the overflow behavior of "int count()" correct?

Comment by Alex Miller [ 11/Dec/14 12:06 AM ]

1) agreed!
2) Good point. I am definitely changing behavior on this (max of 9223372036854775807). I will look at whether this can be handled without affecting perf. Really, handling an infinite end point is not compatible with several things in LongRange.
3) I actually did implement chunking for the general Range and found it was slower (the original Clojure chunking is faster). LongRange is making up for that difference with improved primitive numerics.
4) Since empty is invalid, 0 and -1 are equally invalid. But I agree -1 conveys the intent better.
5) agreed
6) probably not. ties into 2/3.

Thanks for this, will address.

Comment by Alex Miller [ 11/Dec/14 12:11 AM ]

Added -4 patch that addresses 1,4,5 but not the (range) stuff.

Comment by Alex Miller [ 11/Dec/14 12:51 PM ]

Latest -7 patch addresses all feedback and perf #s updated.

Comment by Ghadi Shayban [ 22/Dec/14 10:59 PM ]

See CLJ-1572, I believe CollReduce needs to be extended directly to clojure.lang.{LongRange,Range} inside protocols.clj

Comment by Ghadi Shayban [ 29/Dec/14 12:29 PM ]

Seems like a missing benchmark is (range 0 31) without the doall. Current patch (-9) allocates the chunk buffer at range's construction time. Maybe this can be delayed?

Comment by Alex Miller [ 02/Jan/15 11:24 AM ]

Ghadi - you are right. I reworked the patch (new -10 version) so that the chunk is only created on demand. Basically, the chunking is only used when traversing via chunked seqs and in normal seq iteration or reduce, no explicit chunks are created. That improved several timings. I also added the bench for (range 0 31) which was greatly improved by lazily creating the chunk, so good point on that.

Comment by Michael Blume [ 03/Jan/15 1:38 PM ]

I'm looking at the implementation of equiv() and wondering if it's worth checking whether the other object is also a reified range and comparing the private parameters rather than iterating through the sequences.

Comment by Ghadi Shayban [ 18/Jan/15 7:26 PM ]

Attaching a simplified implementation as a deftype. I benchmarked a billion variants and dumped bytecode. I'm attaching the best performing combination, and it beats out the Java one in most cases.

Approach:
Two deftypes, one specialized to primitive longs, the other generic math.
Implement: Seqable/Counted,IReduce,IReduceInit,Iterable
Also, marker interfaces Sequential and Serialized.

The implementations are very straightforward, but Seqable deserves mention.
'seq' delegates its implementation to the existing lazy-seq based range, which has been stripped of the strange corner cases and now only handles strictly ascending/descending ranges. It has been renamed range*, rather than range1 because all the arguments to range are already numbers. core references to range have been accordingly renamed.

The new range constructor is loaded after reduce & protocols load. It checks types, and delegates to either long-range or generic-range, which handle the wacky argument cases that are not strictly ascending or descending. I'm annoyed with the structural duplication of those conditionals, not sure how to solve it.

Inside the LongRange implementation of IReduce/IReduceInit, boxing is carefully controlled, and was verified through bytecode dumping.

I elaborated the benchmarks for comparison, and also included benchmarking without type specialization.

You discover strange things working on this stuff. Turns out having the comparator close over the 'end' field is beneficial: #(< % end).

Alex, I figured out why the Java versions had a nearly exactly 2x regression on (doall (range 0 31)), and I attached a change to 'dorun'. Instead of calling (seq coll) then (next coll), it effectively calls (next (seq coll)). I think the implementation assumes that calling 'seq' is evaluating the thunk inside LazySeq. This should also help other Seqable impls like Cycle/Iterate/Repeat CLJ-1603.

Results (criterium full runs):

code 1.7.0-alpha5 clj-1515-10.patch (Java) deftype specialized (attached) deftype unspecialized
(count (filter odd? (take (* 1024 1024) (range)))) 187.30 ms 194.92 ms 182.53 ms 184.13 ms
(transduce (take (* 1024 1024)) + (range)) 58.27 ms 89.37 ms 84.69 ms 84.72 ms
(count (range (* 1024 1024))) 62.34 ms 27.11 ns not run not run
(reduce + (map inc (range (* 1024 1024)))) 57.63 ms 46.14 ms 46.17 ms 52.94 ms
(reduce + (map inc (map inc (range (* 1024 1024))))) 82.83 ms 68.66 ms 64.36 ms 71.76 ms
(count (keep odd? (range (* 1024 1024)))) 76.09 ms 57.59 ms not run not run
(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) 52.17 ms 39.71 ms 28.83 ms 42.23 ms
(reduce + 0 (range (* 2048 1024))) 85.93 ms 38.03 ms 26.49 ms 42.43 ms
(doall (range 0 31)) 1.33 µs 2.89 µs 1.03 µs 1.08 µs
(doall (range 0 32)) 1.35 µs 2.97 µs 1.07 µs 1.10 µs
(doall (range 0 128)) 5.27 µs 11.93 µs 4.19 µs 4.29 µs
(doall (range 0 512)) 21.66 µs 47.33 µs 16.95 µs 17.66 µs
(doall (range 0 4096)) 171.30 µs 378.52 µs 135.45 µs 140.27 µs
(into [] (map inc (range 31))) 1.97 µs 1.57 µs 1.87 µs 1.95 µs
(into [] (map inc) (range 31)) 1.66 µs 824.67 ns 891.90 ns 1.11 µs
(into [] (range 128)) 5.11 µs 2.21 µs 2.43 µs 3.36 µs
(doall (range 1/2 1000 1/3)) 1.53 ms 1.67 ms 1.59 ms 1.52 ms
(doall (range 0.5 1000 0.33)) 164.83 µs 382.38 µs 149.38 µs 141.21 µs
(into [] (range 1/2 1000 1/3)) 1.53 ms 1.40 ms 1.43 ms 1.50 ms
(into [] (range 0.5 1000 0.33)) 157.44 µs 108.27 µs 104.18 µs 127.20 µs

Open Questions for screeners of the deftype patch:

1) What to do about the autopromotion at the Long/MAX_VALUE boundary? I've preserved the current behavior of Clojure 1.6.
2) Alex, I did not pull forward the filter chunk tweak you discovered
3) Is the structural duplication of the conditionals in generic-range long-range awful?
4) Are there any other missing interfaces? IMeta comes to mind. Not sure about IHashEq either.

Comment by Ghadi Shayban [ 18/Jan/15 7:30 PM ]

note with the deftype patch, the transduce over infinite (range) case is the only one where 'master' currently performs best. This is because the implementation was changed to (iterate inc' 0). When CLJ-1603 is applied that situation should improve better.

Comment by Ghadi Shayban [ 18/Jan/15 8:27 PM ]

fixup patch CLJ-1515-deftype

Comment by Ghadi Shayban [ 18/Jan/15 11:05 PM ]

new patch CLJ-1515-deftype-nostructural-dup.patch with the silly conditional duplication removed. Updated benchmarks including 'count' impls. These benchmarks run on a different machine with the same hardness. Blue ribbon.

code 1.7.0-alpha5 1.7.0-rangejava 1.7.0-rangespecial rangespecial / alpah5
(count (filter odd? (take (* 1024 1024) (range)))) 297.14 ms 333.93 ms 328.62 ms 1.10
(transduce (take (* 1024 1024)) + (range)) 105.55 ms 145.44 ms 164.70 ms 1.56
(count (range (* 1024 1024))) 108.92 ms 61.09 ns 26.61 ns 0
(reduce + (map inc (range (* 1024 1024)))) 97.67 ms 95.41 ms 84.62 ms 0.86
(reduce + (map inc (map inc (range (* 1024 1024))))) 140.21 ms 135.59 ms 116.38 ms 0.83
(count (keep odd? (range (* 1024 1024)))) 121.18 ms 104.63 ms 111.46 ms 0.92
(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) 100.40 ms 86.28 ms 67.17 ms 0.67
(reduce + 0 (range (* 2048 1024))) 131.77 ms 80.43 ms 63.24 ms 0.48
(doall (range 0 31)) 2.53 µs 4.36 µs 2.24 µs 0.89
(doall (range 0 32)) 2.37 µs 4.00 µs 1.99 µs 0.84
(doall (range 0 128)) 9.20 µs 14.98 µs 8.01 µs 0.87
(doall (range 0 512)) 37.28 µs 59.13 µs 35.16 µs 0.94
(doall (range 0 4096)) 331.28 µs 471.57 µs 291.76 µs 0.88
(into [] (map inc (range 31))) 2.83 µs 2.79 µs 2.67 µs 0.94
(into [] (map inc) (range 31)) 2.21 µs 1.39 µs 1.26 µs 0.57
(into [] (range 128)) 6.72 µs 3.25 µs 3.09 µs 0.46
(doall (range 1/2 1000 1/3)) 3.41 ms 4.04 ms 3.14 ms 0.92
(doall (range 0.5 1000 0.33)) 281.04 µs 530.92 µs 244.14 µs 0.87
(into [] (range 1/2 1000 1/3)) 3.32 ms 3.71 ms 2.99 ms 0.90
(into [] (range 0.5 1000 0.33)) 215.53 µs 165.93 µs 138.86 µs 0.64
Comment by Alex Miller [ 19/Jan/15 8:32 AM ]

This is looking very good and I think we should move forward with it as the preferred approach. Feel free to update the description appropriately. I'll file a separate ticket with the filter tweak. Some comments on the patch:

1) GenericRange/count - this math is broken if you start to mix infinity in there. I think just (count (seq this)) is safer.
2) GenericRange/iterable - I think if we are IReduceInit and Seqable, we can omit this. I had it in the Java one because I inherited Iterable via ASeq but that's not an issue in this impl.
3) LongRange/reduce - why Long/valueOf? Isn't start a long field?
4) LongRange/iterable - ditto #2
5) print-method - anytime I see print-method, there should probably be print-dup too. For example, this is broken: (binding [*print-dup* true] (println (range 0 10)))
6) Is serialization broken by this patch? Can you justify the test changes?

Comment by Ghadi Shayban [ 19/Jan/15 12:51 PM ]

regarding Serialization tests currently:

(defn roundtrip
  [v]
  (let [rt (-> v serialize deserialize)
        rt-seq (-> v seq serialize deserialize)]        ;; this
    (and (= v rt)            ;; this fails because the test ^ calls seq first
      (= (seq v) (seq rt))   ;; this passes
      (= (seq v) rt-seq))))  ;; this passes

These new types are merely seqable, so (not= (LongRange. 0 10 1) (seq (LongRange. 0 10 1)))

Not sure how to handle this 100%. Nothing precludes the LongRange itself from roundtripping, but just that calling seq on it returns a separate object.

Comment by Alex Miller [ 23/Jan/15 9:57 AM ]

More comments...

1) Instead of extending IReduce and IReduceInit, just extend IReduce and implement both arities (IReduce extends IReduceInit).
2) I'm slightly troubled by the .invokePrim now. Did you look at a macro that does prim type-hinting maybe?
3) On the serialization thing, is the problem really with serialization or with equality? The test that's failing is (= v rt). Because these are not IPersistentCollections, pcequiv won't be used and it's just calling LongRange.equals(), which is not implemented and falls back to identity, right? Probably need to implement the hashCode and equals stuff. Might need IHashEq too.

user=> (= (range 5) (range 5))
false
Comment by Ghadi Shayban [ 23/Jan/15 12:13 PM ]

Took care of 1) and 3). Punting on hiding the invokePrim behind a macro. It may be shameful but it works really fast.

Another case found:
(assoc {'(0 1 2 3 4) :foo} (range 5) :bar) needs to properly overwrite keys in the map.

Cause: Might be irrelevant in the face of CLJ-1375. Util/equiv for maps doesn't use IPC/equiv unless the collection is also java.util.Collection or Map. If it is then it checks for IPersistentCollection. I added a check for IPersistentCollection first.

https://github.com/ghadishayban/clojure/commits/for-screening

To get hasheq working, I added back the iterators for use by Murmur3/hashOrdered.

Comment by Ghadi Shayban [ 23/Jan/15 3:20 PM ]

Handle hash and equality not through IPersistentCollection. w/ test-cases too.

Comment by Alex Miller [ 20/Feb/15 11:39 AM ]

Ghadi, we need to have range implement IObj too so with-meta works.

Comment by Ghadi Shayban [ 20/Feb/15 12:09 PM ]

Will add pronto but maybe someone can clarify something: Adding a simple clojure.lang.IObj/withMeta to the deftype results in an AbstractMethodError when trying to print a range. This is because the impl of vary-meta used in the default printer assumes that if all IObjs are also IMetas. Seems like a problem with either vary-meta's impl or the interface separation.

I can fix by:
1) adding a _meta field to Ranges and handling IMeta as well as IObj.

;; vary-meta:
(with-meta obj (apply f (meta obj) args)))

;; default printer
(defmethod print-method :default [o, ^Writer w]
  (if (instance? clojure.lang.IObj o)
    (print-method (vary-meta o #(dissoc % :type)) w)
    (print-simple o w)))
Comment by Ghadi Shayban [ 20/Feb/15 12:25 PM ]

Ugh never mind that last bit I fell out of the hammock prematurely. IMeta << IObj.

Alex, CLJ-1603 needs IMeta/meta too.

Comment by Alex Miller [ 24/Feb/15 11:30 AM ]

current direction is pending results of where CLJ-1603 goes





[CLJ-1561] Incorrect line numbers are emitted Created: 10/Oct/14  Updated: 20/Feb/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: Release 1.7

Type: Defect Priority: Major
Reporter: Paul Stadig Assignee: Unassigned
Resolution: Unresolved Votes: 23
Labels: errormsgs

Attachments: Text File clj-1561.patch    
Patch: Code and Test
Approval: Ok

 Description   

The Clojure JVM compiler marks the line number for a form before emitting the children for that form. Marking the line number before emitting children leads to incorrect line numbers when a runtime error occurs. For example, when

 (foo bar
      baz)

is emitted the compiler will visit the line number for the expression, then emit the children expressions ('bar' and 'baz') which will mark their own line numbers, then come back and emit the invoke bytecode for 'foo', but since the last line number to be marked was that of 'baz', if 'foo' throws an exception the line number of 'baz' will be reported instead of the line number for the expression as a whole.

This same issue was being manifested with special forms and inlined functions, and was especially bad in the case of the threading macro '->', because it is usually spread across several lines, and the line number reported could end up being very different than the line actually causing an exception.

A demonstration of the incorrect line numbers (and how the fix affects line numbers) can be seen here https://github.com/pjstadig/clojure-line-numbers

Patch: clj-1561.patch

Screened by: Alex Miller



 Comments   
Comment by Jozef Wagner [ 10/Oct/14 1:57 PM ]

additions in your patch mixes tabs and spaces. Could you please update the patch so that your added lines indent only with tab characters? Not everyone has tab set at 4 spaces...

Comment by Paul Stadig [ 10/Oct/14 2:42 PM ]

There's already a mixture of just tabs, just spaces, and tabs & spaces in Compiler.java. I'm not sure what the "standard" is, but I've changed the patch to match the surrounding lines.

Comment by Paul Stadig [ 10/Oct/14 2:42 PM ]

Patch with whitespace changes.

Comment by Alex Miller [ 20/Oct/14 8:38 AM ]

These changes will affect the line number tables for a variety of Clojure constructs when compiled. It would be very helpful to me to have a set of examples that covered each case touched in the patch so that I could compile them and look at the bytecode vs the source. This would greatly accelerate the screening process.

Comment by Paul Stadig [ 20/Oct/14 2:29 PM ]

Alex,
I have created a repo on github that has a sample file demonstrating the line number changes.

https://github.com/pjstadig/clojure-line-numbers

Hope that helps!

BTW, I'd be glad to do a skype call or hangout, if you have questions.

Comment by Alex Miller [ 20/Oct/14 2:34 PM ]

This is very helpful, thanks!!

Comment by Alex Miller [ 22/Oct/14 11:35 AM ]

In the hunk at 3191 in KeywordInvokeExpr, a call to visitLineNumber was added, but the prior call 4 lines earlier was not removed. Should it be?

Comment by Paul Stadig [ 22/Oct/14 12:05 PM ]

I left that in thinking that if something goes wrong with the getstatic instruction (null pointer exception? class cast exception?) it should report the line number of the KeywordInvokeExpr. It may be that there isn't a realistic possibility that anything could actually happen with that getstatic instruction, but that was the thought process.

My general rule of thumb was if an emit method emits any instructions before it calls the emit method on another expr, then it should mark its line number before and after the recursive emit call (assuming that the recursive emit call would mark its own line number). In cases where an emit method immediately calls another emit method, then I don't bother to mark a line number until afterwards.

Comment by Bozhidar Batsov [ 10/Jan/15 3:09 AM ]

Seems we need to add tests before this can be merged - https://groups.google.com/forum/#!topic/clojure-dev/7pFhG8LMvGo

Comment by Daniel Solano Gómez [ 11/Jan/15 8:42 PM ]

Here's an updated version of Paul's patch that applies cleanly to master.

I based my test code from his GitHub repository, but I have made a few changes:

  1. I have added test cases for when emitUnboxed or the reflecting branch of emit is called.
  2. The original keyword invocation test doesn't actually go through KeywordInvokeExpr. It is resolved by the change to InvokeExpr.emitArgsAndCall, so I have removed it.

Additionally, I have been unable to create test cases that verify that the changes for intrinsic predicates, KeywordInvokExprs, and for the changes inside of InvokeExpr.emit. These could be that they are redundant, but I am not sure.

Comment by Paul Stadig [ 12/Jan/15 5:51 AM ]

Thanks Daniel!

I applied the patches for my commit (as updated by Daniel) and Daniel's commit. The namespace docstring had shifted all the line numbers down by 4, and the tests were failing. I amended Daniel's commit with updated line numbers and combined both commits into a single patch file (clj-1561.patch).

I also reverted my commit and re-ran Daniel's tests to verify that they failed as expected.

I believe the combined patch file should include everything we need other than the difficult test cases that Daniel mentions in his comment.

Cheers!

Comment by Alex Miller [ 12/Jan/15 8:07 AM ]

Whenever people feel this is ready for screening, just switch the Approval from Incomplete to Vetted.

Comment by Paul Stadig [ 12/Jan/15 2:20 PM ]

I have added a test for KeywordInvokeExpr, and a new test to test the logic in InvokeExpr.emit.

I removed the changes I made to StaticMethodExpr.emitIntrinsicPredicate, since I was unable to find a way to create a test case. The issue is that we mark the lines of child expressions, then come back to the parent expression and try to invoke without marking its line, but in the case of intrinsic predicates there is no invocation. When we come back to the parent we directly emit some bytecodes. I don think the change I made in that case actually changed anything, but I removed it since there is no need for it.

Comment by Alex Miller [ 14/Jan/15 10:07 PM ]

Could we squash the patch into a single commit?

Comment by Alex Miller [ 15/Jan/15 9:21 AM ]

Also, in addition to squashing the patch, can we rename examples_clj_1561.clj to line_number_examples.clj or something w/o the ticket in it? I'm ready to mark as screened when those are done. (And if that can get done today, it's likely to get pushed forward tomorrow.)

Comment by Daniel Solano Gómez [ 15/Jan/15 9:24 AM ]

In case Paul doesn't get to it sooner, I can take care of it this afternoon.

Comment by Paul Stadig [ 15/Jan/15 10:40 AM ]

Alex,
I'd like to maintain attribution. I could squash my test related commits into Daniel's commit, so we'd have two commits, my commit for the change and Daniel's for the tests. Would that suffice? If so, I can take care of it now.

Comment by Alex Miller [ 15/Jan/15 11:13 AM ]

Sure, that's fine.

Comment by Alex Miller [ 15/Jan/15 11:14 AM ]

note that one of the commits has a whitespace error (I think Daniel's) that got removed in a later commit. In the end patch, please make sure it doesn't warn about any whitespace errors when applied.

Comment by Alex Miller [ 15/Jan/15 12:46 PM ]

Paul, this patch still seems to have examples_clj_1561.clj in it - would prefer name that is meaningful vs jira issue.

Comment by Paul Stadig [ 15/Jan/15 1:44 PM ]

Yeah, sorry, I'm still working on it. Should be up shortly.

Comment by Paul Stadig [ 15/Jan/15 2:05 PM ]

I squashed the commits down to two: my commit for the change, and Daniel's for the tests. I renamed the "examples_clj_1561" file to "line_number_examples". I removed some trailing whitespace from Daniel's commit. There are still whitespace errors (on my machine) from my commit because tabs are used for indenting. I used tabs to indent because that is how the lines surrounding were formatted, and Jozef had already complained about the formatting. It also appears that these whitespace errors are dependent on your git config. I have:

[core]
whitespace = trailing-space,space-before-tab,tab-in-indent

But when I comment that out of my gitconfig I get no whitespace errors. I guess the bottom line is if we want to agree on a standard config for that I'd be glad to accomodate, but I don't think we want to reformat Compiler.java, so the best choice seems to be to just match the surrounding context, which is what I've done. If you want me to do anything further with that, then let me know.

Ball is back in your court, Alex.





[CLJ-1602] vals and keys return values should implement IReduceInit or Iterable Created: 25/Nov/14  Updated: 15/Jan/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.7

Type: Enhancement Priority: Major
Reporter: Stuart Halloway Assignee: Alex Miller
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File clj-1602-2.diff     File clj-1602-3.diff     File clj-1602-4.diff     File clj-1602.diff    
Patch: Code and Test
Approval: Vetted

 Description   
  • with generative tests
  • with perf demos

Background: clojure.core/keys calls RT.keys(Object) calls APersistentMap.KeySeq.create(ISeq). RT.keys() creates a sequence (of Map.Entry objects) and KeySeq just wraps it, calling .keyValue(). There is an equivalent vals -> RT.vals() -> ValSeq path. Both of these seq impls extend ASeq and provide Iterable implementations via SeqIterator (iterator wrapped over the seq).

Approach: The important thing here is to avoid creating the sequence and instead directly iterate/reduce over the map. Noting that CLJ-1499 provides support for making PHM directly Iterable and that KeySeq/ValSeq already implement Iterable, I chose to focus on making the instances returned from keys and vals support Iterable directly on the underlying map instead of via the seq.

RT.keys()/vals() created the seq and passed it to KeySeq/ValSeq which made it too late to directly cover the original map iterator. There are a few places that rely on passing a seq of Map.Entry to keys/vals (not just a map instance), so I check for IPersistentMap and in that case pass it directly to a new KeySeq factory method that remembers both the Iterable and the ISeq. There is also support in here for using a direct key/value iterator via IMapIterable as introduced in CLJ-1499.

Questions/notes:

  • Could potentially check for Map or Iterable instead of IPersistentMap in RT.keys()/vals(). Not sure how common it is to pass normal Java maps to keys/vals.
  • The direct Iterable support vanishes once you move off the head of the keys or vals seq. So (rest (keys map)) does not have Iterable support. This is not really possible unless you hold an Iterator and advance it along with the seq, but that seemed to introduce all sorts of possibilities for badness. Since maps are unordered, it seems weird to rely on any ordering or processing only parts of any map, so I suspect doing this would be quite rare.
  • This patch depends on CLJ-1499 for IMapIterable.

Performance: I tested perf using criterium to benchmark as follows:

(use 'criterium.core)
(def m (zipmap (range 1000) (range 1000)))
(bench (reduce + (keys m)))
(bench (reduce + (vals m)))
(bench expr) 1.6.0 1.7.0-alpha5 alpha5 + clj-1499 + patch
(reduce + (keys m)) 69 µs 73 µs 44 µs
(reduce + (vals m)) 75 µs 77 µs 50 µs

Tests:

  • I added some basic tests for subseq and rsubseq as those both rely on the somewhat special behavior of keys accepting a seqable of Map.Entry objects (not just a map itself). There were no other tests for subseq or rsubseq already present.

Patch: clj-1602-4.diff - requires CLJ-1499 patch first (for IMapIterable)

Screened by:

  • the changes in CollReduce are unlikely to stand but are currently essential


 Comments   
Comment by Alex Miller [ 25/Nov/14 11:53 AM ]

Could leverage CLJ-1499 for the bulk of this, may pull that back from 1.8 into 1.7. Waiting on further work till that's answered.

Comment by Alex Miller [ 03/Dec/14 11:24 PM ]

I also have a patch that extends the CLJ-1499 iterators to support providing both key and val iterators that do not require creating and unpacking a Map.Entry. Unfortunately I only saw times that were ~48 µs on the perf benchmark in the description, so it's not a huge benefit (short-lived object allocation is cheap).

Comment by Alex Miller [ 09/Jan/15 9:15 AM ]

waiting on 1499 mods

Comment by Alex Miller [ 14/Jan/15 5:42 PM ]

Updated patch based on latest CLJ-1499-v10 patch.

Comment by Michael Blume [ 15/Jan/15 1:48 AM ]

I find it concerning that the v9 patch passed generative tests, shouldn't we have seen that?

Comment by Alex Miller [ 15/Jan/15 7:21 AM ]

Actually I did see it in the generative tests for this ticket so I moved those tests into the CLJ-1499 patch.

Comment by Michael Blume [ 15/Jan/15 10:10 AM ]

Ah, cool =)





[CLJ-1603] cycle, iterate, repeat return vals should IReduceInit Created: 25/Nov/14  Updated: 02/Mar/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.7

Type: Enhancement Priority: Major
Reporter: Stuart Halloway Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File clj-1603-10.patch     Text File clj-1603-11.patch     Text File clj-1603-12.patch     Text File clj-1603-13.patch     Text File clj-1603-14.patch     Text File clj-1603-2.patch     Text File clj-1603-3.patch     Text File clj-1603-4.patch     Text File clj-1603-5.patch     Text File clj-1603-6.patch     Text File clj-1603-7.patch     Text File clj-1603-8.patch     Text File clj-1603-9.patch     Text File clj-1603.patch    
Patch: Code and Test
Approval: Vetted

 Description   

Alternatives:

There were a number of possible approaches for these enhancements:
1) Straight Java impl. The benefit of this approach is improving the performance of both the seq and reduce paths at the expense of writing a bunch of Java. See clj-1603-3.patch for the best of these impls.

2) Clojure deftype. This required moving cycle and iterate and providing a repeat1 implementation until deftype is defined (similar to the approach with reduce1). The deftype version returns a Seqable, IReduce object and has effectively the same former implementation for seq with a new fast implementation for IReduce. This makes reduce paths fast, but leaves seq paths about the same, with the benefit of no new Java code. The downside was that the new seqs did not implement the full range of interfaces from prior impls, which could potentially break code. See clj-1603-12.patch for a patch that covers this.

3) Add Iterable or IReduceInit directly to LazySeq. Conceptually, this does not make sense for general lazy seqs. Seqs materialize and cache each value once and doing this along with the ability to iterate/reduce introduces issues with caching (might as well use seqs for that) and synchronization. However, doing this for just these specific lazy seqs is possible - see latest patch.

Approach: Latest patch makes LazySeq implement IReduce and internal macro reducible-lazy-seq lets callers supply a function to implement the reduce path.

A few things to note:

  • Added some example-based tests for iterate, cycle, and repeat where I thought they were needed. Did not add generative tests - not clear to me what these would be that would actually be valuable. All of these functions are pretty simple and the examples cover the special cases.

Performance:

Some example timing, all in µs:

Expression 1.6.0 1.7.0-alpha5 alpha5 + clj-1603-3 (Java) alpha5 + clj-1603-12 (deftype) alpha5 + clj-1603-14 (split impl)
(doall (repeat 1000 1)) 87 94 8 92 91
(into [] (repeat 1000 1)) 99 110 12 12 14
(reduce + 0 (repeat 1000 1)) 99 126 17 19 22
(into [] (take 1000) (repeat 1)) n/a 67 35 29 27
(doall (take 1000 (cycle [1 2 3]))) 101 106 78 106 111
(into [] (take 1000) (cycle [1 2 3])) n/a 73 39 45 43
(doall (take 1000 (iterate inc 0))) 93 98 71 102 112
(into [] (take 1000) (iterate inc 0)) n/a 85 39 39 38
  • clj-1603-3 is a Java class implementation - generally it's faster for both seqs and reduce (at the cost of more Java)
  • clj-1603-12 is a deftype implementation - generally it's about the same on seqs but faster on reduce
  • clj-1603-14 is the LazySeq extension to IReduceInit - generally it's good but the iterate seq path introduces an extra lazy seq to force so it's slightly slower on that path.

Patch: clj-1603-14.patch

Screened by:



 Comments   
Comment by Ghadi Shayban [ 25/Nov/14 11:01 AM ]

Stu, do you intend these to be in Java or Clojure? It could be trickier to implement in Clojure directly, as loading would have to be deferred until core_deftype loads. It's certainly tractable without breaking any backwards compatibility, and I've explored this while experimenting with Range as a deftype https://github.com/ghadishayban/clojure/commit/906cd6ed4d7c624dc4e553373dabfd57550eeff2

A macro to help with Seq&List participation could be certainly useful, as efficiently being both a Seq/List and IReduceInit isn't a party.

May be useful to list requirements for protocol/iface participation.

It seems like 'repeatedly' is another missing link in the IReduceInit story.

Rich mentioned the future integration of reduce-kv at the conj, it would also be useful to know how that could fit in.

---- Other concerns and ops that may belong better on the mailing list ----

In experimenting with more reducible sources, I put out a tiny repo (github.com/ghadishayban/reducers) a couple weeks ago that includes some sources and operations. The sources were CollReduce and not ISeq.

Relatedly, caching the hashcode as a Java `transient` field is not supported when implementing a collection using deftype (patch w/ test in CLJ-1573).

Sources:
Iterate was one of them https://github.com/ghadishayban/reducers/blob/master/src/ghadi/reducers.clj#L43-L51
Repeatedly https://github.com/ghadishayban/reducers/blob/master/src/ghadi/reducers.clj#L43-L51

Reduce/transduce-based Operations that accept transducers:
some, any, yield-first https://github.com/ghadishayban/reducers/blob/master/src/ghadi/reducers.clj#L52-L80
(any could use a better name, equiv to (first (filter...)))
some and any have a symmetry like filter/remove.

Novelty maybe for 1.8:
A transducible context for Iterables similar to LazyTransformer:
https://github.com/ghadishayban/reducers/blob/master/src/ghadi/reducers.clj#L157-L161

The unless-reduced macro was very useful in implementing the collections:
https://github.com/ghadishayban/reducers/blob/master/src/ghadi/reducers.clj#L7-L15
It is different than the ensure-reduced and unreduced functions in core.

Comment by Alex Miller [ 25/Nov/14 12:01 PM ]

When we discussed this in the past, it was in the vein of reusing some of the range work (in Java) to implement cycle and iterate (per CLJ-1515).

Comment by Ghadi Shayban [ 25/Nov/14 9:20 PM ]

Never mind about 'repeatedly'. Being both ISeq and IReduceInit for repeatedly doesn't make sense for something that relies on side-effects. Current users of repeatedly can reduce over it many times and only realize the elements once.

Comment by Alex Miller [ 05/Dec/14 11:17 PM ]

attached wip Java impl and posted some example timings

Comment by Ghadi Shayban [ 11/Dec/14 4:35 PM ]

NB iterate in this patch does not cache the realized ISeq, but recalcs it at every call to realize the tail. This is not a change in the promised behavior (docstring says "f must be side-effect free") but an implementation change, as worth noting in the changelog.

Comment by Stuart Halloway [ 02/Jan/15 1:32 PM ]

It looks like all the reduce with no inital value paths are still seq-y, and slower, as shown by e.g.

(dotimes [_ 10]
  (time
   (reduce
    +
    (repeat 10000000 1))))

(dotimes [_ 20]
  (time
   (reduce
    +
    0
    (repeat 10000000 1))))
Comment by Alex Miller [ 02/Jan/15 2:01 PM ]

On that example in master before / after patch I see:

before:

  • no init = 844 ms
  • with init = 920 ms

after:

  • no init = 124 ms
  • with init = 90 ms

Is that similar to what you see or not?

Comment by Alex Miller [ 02/Jan/15 4:21 PM ]

The clj-1603-3.patch has been updated to use effectively the same algorithm for both versions of reduce. With the -3 patch, I got ~96 ms on both examples in the prior comment. I re-ran the tests in the description and updated those as well (about the same as expected).

Comment by Stuart Halloway [ 16/Jan/15 1:18 PM ]

The tests do not seem to hit the unseeded reduce branches – do we even want these branches? The original ticket was for IReduceInit.

Comment by Michael Blume [ 18/Jan/15 1:48 PM ]

Probably worth noting – Git will happily apply the latest patch for CLJ-1603 on top of the latest patch for CLJ-1515, but the result does not compile because 1515 uses iterate and 1603 moves the definition of iterate lower in clojure.core. Not sure if this is worth fixing now or just noting for when they're actually applied.

Comment by Michael Blume [ 18/Jan/15 1:52 PM ]

Actually, here, this just moves the declare statement further up the file.

Comment by Michael Blume [ 18/Jan/15 2:19 PM ]

OK, no, the two patches are still incompatible even with the declaration order fixed:

[java] ERROR in (test-range) (LongRange.java:95)
     [java] expected: (= (take 3 (range 3 9 0)) (quote (3 3 3)))
     [java]   actual: java.lang.ClassCastException: clojure.core.InfiniteRepeat cannot be cast to clojure.lang.ISeq
     [java]  at clojure.lang.LongRange.create (LongRange.java:95)
Comment by Alex Miller [ 18/Jan/15 2:31 PM ]

The 1515 patch is actually being reworked right now - we will patch things up at application time if needed.

Comment by Alex Miller [ 19/Jan/15 10:12 AM ]

Removing screened marking so can be re-screened. Added new -7 patch that handles print-method, print-dup, and unmapping the deftype constructors so they're not visible. Thanks to Ghadi in CLJ-1515 for the idea on those.

Comment by Ghadi Shayban [ 20/Jan/15 8:08 AM ]

Review of -7 patch:

Seqable/seq implementations that return a separate ISeq like these do should forward a call to seq on the result, like eduction does. [1] (This is not necessary in these particular impls, as the LazySeqs returned are themselves ISeqs. Also because Cycle's deftype is only constructed for non-empty cycles, the fact that there is a guaranteed seq is implicit. Probably a best practice to add an innocuous seq call if users look to these impls as a recipe.)

The performance regression in (doall (repeat 1000 1)) should go away completely with the dorun tweak in CLJ-1515. This is because dorun is effectively calling seq twice (it calls seq, throws away the result, then calls next.)

minor nits
1) repeat1 seems to be identical to repeat-seq and has both arities necessary for the deftypes
2) inside FiniteRepeat s/(> i 0)/(pos? i) also inside the repeat constructor
3) some things are defn- , some are ^:private
4) Cycle/reduce the recur binding can be (recur rr (or (next s) coll)) rather than nil? check

[1] https://github.com/clojure/clojure/blob/master/src/clj/clojure/core.clj#L7324

Comment by Alex Miller [ 22/Jan/15 10:34 PM ]

Ghadi - good comments! Fixed 1,2,4. #3 - ^:private is because defn- is not yet defined. New -8 patch.

Comment by Alex Miller [ 23/Jan/15 10:03 AM ]

Bah.

user=> (= (repeat 5 :a) (repeat 5 :a))
false
Comment by Alex Miller [ 23/Jan/15 3:04 PM ]

Updated to -9 patch that handles hash and equality for finite repeat case.

Comment by Ghadi Shayban [ 26/Jan/15 2:24 PM ]

metadata in the wrong place on #'repeat1

Comment by Alex Miller [ 26/Jan/15 3:27 PM ]

Thanks, fixed in -10.

Comment by Stuart Halloway [ 20/Feb/15 10:19 AM ]

The collections returned by these APIs promise several things the new deftypes do not deliver. For example, 1.6's repeat currently has the following ancestors that are lost in the propopsed deftype:

  • clojure.lang.ISeq
  • java.io.Serializable
  • clojure.lang.IPending
  • java.util.Collection
  • java.util.List
  • java.lang.Iterable
  • clojure.lang.Obj
  • clojure.lang.IPersistentCollection
  • clojure.lang.IMeta
  • clojure.lang.IObj

Losing metadata and serializability are certainly regressions, other stuff maybe as well. I suspect similar problems in all the other API returns.

We could improve testing by taking advantage of the property-checking fns already in test-clojure/data-structures, e.g. is-same-collection

Comment by Alex Miller [ 20/Feb/15 10:42 AM ]

I think we should consider carefully what is contractually required by the return of repeat. My opinion is that repeat must return something seqable, not literally a seq (or lazy seq). With that perspective, ISeq, IPending (there re lazy seq laziness), Collection, List, Iterable, and IPersistentCollection are non-essential. Obj is a concrete class and we shouldn't commit to that.

  • IObj is a gap, this should work but doesn't: (with-meta (repeat 1 :a) {:a 1})
  • IMeta doesn't need to be added, will never have meta right now and meta handles this correctly
  • Serializable - doesn't make sense for the infinite seqs but should probably fix for finite range
Comment by Alex Miller [ 20/Feb/15 11:03 AM ]

Added -11 patch that adds IObj for all the new deftypes and Serializable for FiniteRepeat. Still need to add more tests.

Comment by Stuart Halloway [ 20/Feb/15 12:27 PM ]

I think all the java. interfaces are mandatory for interop. Leaving out any one of the clojure. interfaces creates observable change in behavior composing with other core fns (admittedly the IPending case would be bizarre, who uses that?), so those seem mandatory too.

Agreed Obj should be irrelevant, and if it isn't the bug is elsewhere.

Comment by Alex Miller [ 27/Feb/15 9:41 AM ]

pending further discussion w/ stu





Generated at Mon Mar 02 22:42:22 CST 2015 using JIRA 4.4#649-r158309.