ClojureScript

Eliminate :meta, :vector, :set, and :map ops

Details

  • Type: Enhancement
  • Status: Closed
  • Priority: Trivial
  • Resolution: Completed
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
  • Patch:
    Code

Description

The attached patch eliminates the :meta, :set, :vector, and :map ops.

These four operations can be defined more simply in terms of
calls to with-meta, set, vector, and hash-map respectively.

The compiler was optimizing construction of vectors and maps. Now,
those optimizations are implemented as macros. Additionally, sets
are optimized in much the same way.

3 files changed, 52 insertions, 99 deletions

Also worth mentioning: as macros instead of ops & emit methods, these optimizations can apply to any backend. The macros create ClojureScript forms, rather than manually generating JavaScript.

Activity

Brandon Bloom made changes - Description: appended the paragraph noting that, as macros instead of ops & emit methods, these optimizations can apply to any backend.

Brandon Bloom added a comment -

I'd also like to extend this for Symbols and Keywords.

I've been experimenting with fleshed-out Symbol and Keyword objects with interning. I've found that I need emitters, macros, and functions. With the approach here, I could eliminate the emitters and instead have the analyzer produce invocation forms.

Raphaël AMIARD added a comment -

I think this is an interesting patch. It would be worth adapting it to the decoupled emitters. It raises the question of how it would be possible to share part of the emitters between backends.

David Nolen added a comment -

I don't see many benefits falling out of this patch. How to best emit language primitives like literals and constants may vary from host to host - perhaps emitting bytecode directly will work best for some implementations.

Brandon Bloom added a comment -

Couldn't macros emit bytecode via a mechanism similar to the js* form?

My goals with this are:

1) Move some optimizations out of the emit phase and into an earlier stage of the pipeline. For example, consider choosing the best associative data structure to create. Why should {:foo "bar"} be optimized to an ObjMap or ArrayMap but (hash-map :foo "bar") not be? Why should that optimization be implemented in such a way that it cannot be reused by alternative backends?

2) Operate at a higher level. Prefer working with Clojure forms over target-language code fragments (either strings or bytecode). This is where the savings in code length come from.

If we continue with this approach, I see 4 or 5 more places where analyzer & emitter code can be replaced with shorter, simpler macros, which are more readily reused by alternate backends.

The one implication (downside?) this approach has for consumers of the analyzer or API is that they may need to do a little extra work when considering :invoke operations for static analysis and the like. However, that seems likely for most analyzers anyway, so this would be a matter of (defmethod handle-special-form :map) vs (defmethod handle-invoke :hash-map).
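A minimal sketch of that trade-off, assuming a simplified AST rather than the real cljs.analyzer output: the multimethod names handle-node and handle-invoke and the node keys (:op, :f, :name, :args) are illustrative assumptions. Once map literals reach a consumer as :invoke nodes, the consumer adds one extra dispatch level keyed on the invoked var instead of a dedicated :map case. This sketch dispatches on the var symbol cljs.core/hash-map rather than the :hash-map keyword used in the comment.

```clojure
;; Hypothetical consumer of analyzer output. The node shapes used here
;; ({:op :invoke, :f {:op :var, :name ...}, :args [...]}) are simplified
;; assumptions, not the exact structure produced by cljs.analyzer.

(defmulti handle-invoke
  "Second-level dispatch: invocation nodes, keyed by the invoked var's name."
  (fn [node] (get-in node [:f :name])))

(defmulti handle-node
  "First-level dispatch on the node's :op."
  :op)

;; Every invocation is routed to the second dispatch level.
(defmethod handle-node :invoke [node]
  (handle-invoke node))

(defmethod handle-node :default [node]
  {:kind :other, :op (:op node)})

;; Takes the place of what would otherwise be a (defmethod handle-node :map ...) case.
(defmethod handle-invoke 'cljs.core/hash-map [node]
  {:kind :map-construction, :entries (quot (count (:args node)) 2)})

(defmethod handle-invoke :default [node]
  {:kind :call, :fn (get-in node [:f :name])})
```

For example, (handle-node {:op :invoke, :f {:op :var, :name 'cljs.core/hash-map}, :args [{:op :constant, :form :foo} {:op :constant, :form "bar"}]}) returns {:kind :map-construction, :entries 1}.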

Michał Marczyk added a comment - - edited

Re: 1, I don't think we should be "optimizing" hash-map or array-map (or similar) calls. These functions are a documented way of requesting a map of a particular type (see the docstrings) which I think should not be removed. If anything, we might want to introduce an obj-map function to create arbitrarily large ObjMaps on request (in fact I'll look into that, but that is a separate discussion).

Additionally, the fact that {} is optimized to an ObjMap in CLJS shows that any map-emitting macro will need to be rewritten for each target platform (ObjMap only makes sense when targeting JS, so this optimization simply won't be applicable to other backends). If so, and assuming hash-map & Co. retain the behaviour advertised in their docstrings, there's not much to be gained by implementing this in a macro rather than just writing a bunch of emitters.

As for decoupling emitters – I think it's perfectly fine for them not to be decoupled, they are the layer closest to the platform after all. Certainly if there's some code which turns out to look the same across multiple platforms it might be worth it to move it upwards in the stack (not necessarily, though – moving it sideways, to a utility namespace / library, might turn out to be more appropriate), but I have a feeling this is an issue best decided once there actually are multiple backends in place and the various costs and benefits can be judged properly.

Now, the story might well be different if we were to introduce some generic factory functions – "create a map of some type", "create a set of some type" etc. – if (and only if!) they would be meant for public consumption. Then implementing a bunch of compiler macros around those new factories and letting them handle data structure literals would save some duplicate work. I don't want to pronounce an opinion on the usefulness of such generic factory functions at this time – just pointing out the possibility.

Brandon Bloom added a comment -

> These functions are a documented way of requesting a map of a particular type

D'oh! You're right.

> we might want to introduce an obj-map

I see you did just that with CLJS-322 – nice.

> the story might well be different if we were to introduce some generic factory functions

There are already some generic factory functions. 'set, for example, is documented as "Returns a set of the distinct elements of coll." despite always returning a PersistentHashSet. Similar for vector and some others. It seems like map is the only core data structure that realistically has several reasonable choices for a default representation.

> implementing a bunch of compiler macros around those new factories and letting them handle data structure literals would save some duplicate work

So all this was somewhat inspired by tagged_literals.clj – you'll see that those functions are effectively macros which take a form and, generally, return an invocation form.

In my mind, I see Clojure's sugar syntax as a strict expansion transformation.

For example, ^:m {:x [@y 'z/w true]} is simply a shortcut for:

(with-meta (make-map (keyword "x") (vector (deref y) (symbol "z" "w") Boolean/TRUE)) (make-map (keyword "m") Boolean/TRUE))

This sort of thing already happens for @ derefs, # lambdas, etc.

In theory, this could be implemented at a level lower than the compiler. You could, for instance, define a reader "desugar" mode which only returns lists and primitives instead of vectors, maps, etc. This would greatly reduce the number of special forms in the compiler, since all of these boil down to invocations with macros.
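The expansion described above can be sketched as a pure function over reader output. This is a hypothetical illustration, not code from the patch: make-map is the placeholder constructor from the example, hash-set stands in for set because set takes a single collection argument, and nil, regexes, quoted collections, and other forms are left as they are.

```clojure
;; Hypothetical sketch of a reader "desugar" pass. `make-map` is the
;; placeholder name from the comment above, not a real core function.

(def ^:private location-keys
  ;; Position info some readers attach; not user-visible metadata.
  [:line :column :file :end-line :end-column])

(declare desugar)

(defn- user-meta
  "Returns a collection literal's metadata minus location keys, or nil."
  [form]
  (when (or (map? form) (vector? form) (set? form))
    (not-empty (apply dissoc (meta form) location-keys))))

(defn- named-call
  "Rewrites a keyword or symbol into an explicit constructor call."
  [ctor x]
  (if-let [ns-part (namespace x)]
    (list ctor ns-part (name x))
    (list ctor (name x))))

(defn- desugar-body [form]
  (cond
    (keyword? form) (named-call 'keyword form)
    (true? form)    'Boolean/TRUE
    (false? form)   'Boolean/FALSE
    ;; 'z/w reads as (quote z/w)
    (and (seq? form) (= 'quote (first form)) (symbol? (second form)))
    (named-call 'symbol (second form))
    ;; Other quoted forms are constants; leave them untouched.
    (and (seq? form) (= 'quote (first form))) form
    (map? form)     (cons 'make-map (mapcat (fn [[k v]] [(desugar k) (desugar v)]) form))
    (vector? form)  (cons 'vector (map desugar form))
    (set? form)     (cons 'hash-set (map desugar form))
    ;; Other lists (e.g. @y, read as (clojure.core/deref y)) are already invocations.
    (seq? form)     (apply list (map desugar form))
    :else           form))

(defn desugar
  "Recursively rewrites literals and reader sugar into invocation forms,
  wrapping collection literals that carry metadata in with-meta."
  [form]
  (let [body (desugar-body form)]
    (if-let [m (user-meta form)]
      (list 'with-meta body (desugar m))
      body)))
```

With this definition, (desugar (read-string "^:m {:x [@y 'z/w true]}")) returns (with-meta (make-map (keyword "x") (vector (clojure.core/deref y) (symbol "z" "w") Boolean/TRUE)) (make-map (keyword "m") Boolean/TRUE)). The only difference from the form above is that @y appears as (clojure.core/deref y), which is what the reader already produces.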

Emit methods could be replaced with macros for at least these things: vars, maps, vectors, sets, nil, bools, regexes, keywords, symbols, metadata, and empty lists.

The result would be a significant reduction in the amount of code in the compiler for a proportionally smaller increase in the amount of code in per-language macros and maybe the reader.

> I have a feeling this is an issue best decided once there actually are multiple backends in place

I'll grant you that.

I've said my piece on the topic and don't feel very strongly about this particular patch. I just wanted to spark the discussion about reusing more bits of the compiler between backends. In my mind, it's almost always preferable to transform lists rather than emit strings. I tried that, and the result was a reduction in responsibilities for the analyzer, plus macros that were easier to work with than emit methods.

Michał Marczyk added a comment -

Some further discussion here: http://clojure-log.n01se.net/date/2012-06-24.html#20:30a

Brandon Bloom added a comment -

One other advantage of function application over special-casing maps/sets/etc. is that argument evaluation order is well defined for function application (left-to-right). The Clojure reader returns unordered maps & sets, so without changing the reader, we have no way to know what order map key-value pairs or set elements were originally in. I filed a bug on that. I think we need to make the reader extensible, so that a caller can specify how to create the return values from their child expressions. In the case of the ClojureScript compiler, we do care about order, so we'd want to return either a (make-map ...) form directly, or a sorted-map by read order. Same goes for sets.

David Nolen added a comment -

There's not enough rationale for this one.

David Nolen made changes -
Priority Minor [ 4 ] Trivial [ 5 ]
Brandon Bloom added a comment -

David and I resolved this in a different (better) way: https://github.com/clojure/clojurescript/commit/f80956d90f455810be140cfec1632f55254385a5
Brandon Bloom made changes -
Resolution Completed [ 1 ]
Status Open [ 1 ] Resolved [ 5 ]
David Nolen made changes -
Status Resolved [ 5 ] Closed [ 6 ]
