<< Back to previous view

[CLJ-1517] unrolled small collections Created: 01/Sep/14  Updated: 02/Sep/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Zach Tellman Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: collections, performance

Attachments: File unrolled-collections.diff    
Patch: Code
Approval: Triaged

 Description   

As discussed on the mailing list [1], this patch has two unrolled variants of vectors and maps, with special inner classes for each cardinality. Currently both grow to six elements before spilling over into the general versions of the data structures, which is based on rough testing but can be easily changed. At Rich's request, I haven't included any integration into the rest of the code, and there are top-level static create() methods for each.

The sole reason for this patch is performance, both in terms of creating data structures and performing operations on them. This can be seen as a more verbose version of the trick currently played with PersistentArrayMap spilling over into PersistentHashMap. Based on the benchmarks, which can be run by cloning cambrian-collections [2] and running 'lein test :benchmark', this should supplant PersistentArrayMap. Performance is at least on par with PAM, and often much faster. Especially noteworthy is the creation time, which is 5x faster for maps of all sizes (lein test :only cambrian-collections.map-test/benchmark-construction), and on par for 3-vectors, but 20x faster for 5-vectors. There are similar benefits for hash and equality calculations, as well as calls to reduce().

This is a big patch (over 5k lines), and will be kind of a pain to review. My assumption of correctness is based on the use of collection-check, and the fact that the underlying approach is very simple. I'm happy to provide a high-level description of the approach taken, though, if that will help the review process.

I'm hoping to get this into 1.7, so please let me know if there's anything I can do to help accomplish that.

[1] https://groups.google.com/forum/#!topic/clojure-dev/pDhYoELjrcs
[2] https://github.com/ztellman/cambrian-collections



 Comments   
Comment by Zach Tellman [ 01/Sep/14 10:13 PM ]

Oh, I forgot to mention that I didn't make a PersistentUnrolledSet, since the existing wrappers can use the unrolled map implementation. However, it would be moderately faster and more memory efficient to have one, so let me know if it seems worthwhile.

Comment by Nicola Mometto [ 02/Sep/14 5:23 AM ]

Zach, the patch you added isn't in the correct format, they need to be created using `git format-patch`

Comment by Nicola Mometto [ 02/Sep/14 5:31 AM ]

Also, I'm not sure if this is on-scope with the ticket but those patches break with *print-dup*, as it expects a static create(x) method for each inner class.

I'd suggest adding a create(Map x) static method for the inner PersistentUnrolledMap classes and a create(ISeq x) one for the inner PersistentUnrolledVector classes

Comment by Alex Miller [ 02/Sep/14 8:14 AM ]

Re making patches, see: http://dev.clojure.org/display/community/Developing+Patches

Comment by Jozef Wagner [ 02/Sep/14 9:16 AM ]

I wonder what is the overhead of having meta and 2 hash fields in the class. Have you considered a version where the hash is computed on the fly and where you have two sets of collections, one with meta field and one without, using former when the actual metadata is attached to the collection?

Comment by Zach Tellman [ 02/Sep/14 12:13 PM ]

I've attached a patch using the proper method. Somehow I missed the detailed explanation for how to do this, sorry. I know the guidelines say not to delete previous patches, but since the first one isn't useful I've deleted it to minimize confusion.

I did the print-dup friendly create methods, and then realized that once these are properly integrated, 'pr' will just emit these as vectors. I'm fairly sure the create methods aren't necessary, so I've commented them out, but I'm happy to add them back in if they're useful for some reason I can't see.

I haven't given a lot of thought to memory efficiency, but I think caching the hashes are worthwhile. I can see an argument for creating a "with-meta" version of each collection, but since that would double the size of an already enormous patch, I think that should probably wait.





[CLJ-1516] Throw an exception if def name contains a dot Created: 29/Aug/14  Updated: 29/Aug/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: compiler

Attachments: Text File 0001-throw-an-exception-on-def-names-containing-dots.patch    
Patch: Code
Approval: Triaged

 Description   

In this comment: http://dev.clojure.org/jira/browse/CLJ-1100?focusedCommentId=35510&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-35510 Rich said that Vars whose name contains a dot are not supported, but the current implementation allows their definition.
This patch makes `(def foo.bar)` throw a compile-time exception



 Comments   
Comment by Alex Miller [ 29/Aug/14 10:41 AM ]

I'm curious whether this breaks existing code in the wild.

Comment by Nicola Mometto [ 29/Aug/14 10:45 AM ]

I find this hard to believe given the current behaviour:

user=> (def a.b 1)
#'user/a.b
user=> a.b
CompilerException java.lang.ClassNotFoundException: a.b, compiling:(NO_SOURCE_PATH:0:0)

one would need to go out of his way and refer to the var namespace qualified everywhere to make it work

Comment by Nicola Mometto [ 29/Aug/14 11:03 AM ]

After a brief conversation on #clojure, I updated the patch to only throw on non-macro defs so that macros like clojure.core/.. and clojure.core.incubator/.?. will work fine





[CLJ-1515] Reify the result of range Created: 29/Aug/14  Updated: 29/Aug/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Enhancement Priority: Minor
Reporter: Timothy Baldridge Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File patch.diff    
Patch: Code and Test
Approval: Incomplete

 Description   

Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

Approach: this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range.

Note: this code keeps backwards compatibility with the existing range code. This means some parts of the class (mostly relating to a step size of 0) are a bit more complex than desired, but these bits were needed to get all the tests to pass.

Note: this code does not preserve the chunked-seq nature of the original range. The fact that range used to return chunked seqs was not published in the doc strings and so it was removed to allow for simpler code in Range.java.

Patch: patch.diff



 Comments   
Comment by Alex Miller [ 29/Aug/14 3:19 PM ]

1) Not sure about losing chunked seqs - that would make older usage slower, which seems undesirable.
2) RangeIterator.next() needs to throw NoSuchElementException when walking off the end
3) I think Range should implement IReduce instead of relying on support for CollReduce via Iterable.
4) Should let _hash and _hasheq auto-initialize to 0 not set to -1. As is, I think _hasheq always would be -1?
5) _hash and _hasheq should be transient.
6) count could be cached (like hash and hasheq). Not sure if it's worth doing that but seems like a win any time it's called more than once.
7) Why the change in test/clojure/test_clojure/serialization.clj ?
8) Can you squash into a single commit?

Comment by Timothy Baldridge [ 29/Aug/14 3:40 PM ]

1) I agree, adding chunked seqs to this will dramatically increase complexity, are we sure we want this?
2) exception added
3) I can add IReduce, but it'll pretty much just duplicate the code in protocols.clj. If we're sure we want that I'll add it too.
4) fixed hash init values, defaults to -1 like ASeq
5) hash fields are now transient
6) at the cost of about 4 bytes we can cache the cost of a multiplication and an addition, doesn't seem worth it?
7) the tests in serialization.clj assert that the type of the collection roundtrips. This is no longer the case for range which starts as Range and ends as a list. The change I made converts range into a list so that it properly roundtrips. My assumption is that we shouldn't rely on all implementations of ISeq to properly roundtrip through EDN.
8) squashed.

Comment by Alex Miller [ 29/Aug/14 3:49 PM ]

6) might be useful if you're walking through it with nth, which hits count everytime, but doubt that's common
7) yep, reasonable





[CLJ-1514] Use qualified class names for return type hints of standard Clojure functions Created: 28/Aug/14  Updated: 28/Aug/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Tassilo Horn Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: enhancement, interop, patch, typehints

Attachments: Text File 0001-Use-fully-qualified-class-names-for-return-type-hint.patch    
Patch: Code

 Description   

The attached patch converts all function return type hints to spell out the class name fully qualified. There are two reasons for doing this:

1. Simple names in return type hints cause the issue described in http://dev.clojure.org/jira/browse/CLJ-1232. That's usually not a problem with return type hints referring to java.lang-classes because those are always imported. However, using `ns-unmap` you can remove them. For example, after `(ns-unmap ns 'String)` in my namespace, `(.length (format "foo = %s") 1)` throws an IllegalArgumentException: Unable to resolve classname: String. By using fully-qualified class names, that problem goes away.

2. tools.analyzer (used by the Clojure lint tool Eastwood) crashes when encountering such a simple-named return type hint. So currently, I cannot lint parts of my project because there's code that calls `clojure.core/format`.



 Comments   
Comment by Alex Miller [ 28/Aug/14 9:34 AM ]

1. that seems like a pretty weird thing to do
2. sounds like an issue with tools.analyzer, not with Clojure?

Comment by Nicola Mometto [ 28/Aug/14 10:46 AM ]

Just to clarify, tools.analyzer(.jvm) can analyze just fine forms in the form (defn x ^Class []) as long as Class is resolvable, whereas it will throw an exception if that function is then used in a namespace where that class is no longer resolvable, which is similar to what Clojure already does, except tools.analyzer.jvm will throw an exception even if the type hint is not used.

Since version 0.5.1 there's an handler that can be provided to change that behaviour, see https://github.com/clojure/tools.analyzer.jvm/blob/master/src/main/clojure/clojure/tools/analyzer/passes/jvm/validate.clj#L232

Comment by Nicola Mometto [ 28/Aug/14 11:02 AM ]

Now a comment regarding this ticket: the patch in this ticket is just a work-around for the issue exposed in http://dev.clojure.org/jira/browse/CLJ-1232, IMHO the correct move would be to actually recognize that issue as a bug rather than as an accepted "limitation" as Rich's comment seems to suggest so that a fix might be commited.

Comment by Tassilo Horn [ 28/Aug/14 1:29 PM ]

@Alex: 1. is not as weird as it sounds at first. For example, consider you have macros that generate complete APIs for something into some new namespace. Then it can make sense to use a real vanilla namespace, i.e., without referring clojure.core and importing java.lang. With 2. I side with Nicola and consider CLJ-1232 a bug.

@Nicola: Today I've used Eastwood (0.1.4) to lint my project. It crashed when it encountered this definition:

(defmacro error
  "Throws an exception with the given message and cause."
  ([msg]
     `(error ~msg nil))
  ([msg cause]
     `(throw (java.lang.Exception. ~msg ~cause))))

(defmacro errorf
  "Throws an exception with the given `msg` and `objs` passed to `format`.
  `msg` is a format string."
  [msg & objs]
  `(error (format ~msg ~@objs)))  ;; This is line 112 where the crash occurs

The message was:

Exception thrown during phase :analyze+eval of linting namespace funnyqt.tg-test
A function, macro, protocol method, var, etc. named clojure.core/format has been used here:
{:file "funnyqt/utils.clj",
 :end-column 19,
 :column 12,
 :line 112,
 :end-line 112}
Wherever it is defined, or where it is called, it has a type of String
This appears to be a Java class name with no package path.
Library tools.analyzer, on which Eastwood relies, cannot analyze such files.
If this definition is easy for you to change, we recommend you prepend it with
a full package path name, e.g. java.net.URI
Otherwise import the class by adding a line like this to your ns statement:
    (:import (java.net URI))

An exception was thrown while analyzing namespace funnyqt.tg-test 
Lint results may be incomplete.  If there are compilation errors in
your code, try fixing those.  If not, check above for info on the
exception.

So it seems it crashes because `format` has a `^String` return type hint. The namespace containing the `errorf` macro above has no modified ns-imports, i.e., all java.lang classes are imported there, too.

Comment by Nicola Mometto [ 28/Aug/14 1:46 PM ]

Tassilo, since `errorf` is a macro, that error is probably caused at the expansion point of that macro in a namespace that unmaps 'String.
If that's not the case, please open a ticket in the eastwood repo

Comment by Tassilo Horn [ 28/Aug/14 2:16 PM ]

Nicola, you are correct. As I've explained above to Alex, I generate APIs in fresh namespaces that don't refer clojure.core and also ns-unmap all java.lang classes, and the generated code also contains `errorf`-forms.

Well, since `ns-unmap` is there, I think it's legit to use it. So that makes CLJ-1232 even more important. But until that gets fixed which requires a common agreement that it is indeed a bug, I'd be very happy if this patch could be accepted. I mean, when it cannot do any harm and doesn't obscure anything but helps at least one person, then why not do it?





Generated at Tue Sep 02 18:54:59 CDT 2014 using JIRA 4.4#649-r158309.