ClojureScript

improve pr-str performance

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Completed
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None

Description

Currently pr-str performance is terrible. We should implement printing in the same style as core_print.clj Clojure JVM.

Activity

Hide
Evan Mezeske added a comment -

I've rewritten the pr-str stuff, passing a StringBuffer down and appending to it instead of doing a massive amount of concats with temporary lists (some of which were lazy). The results are very promising. On V8, serialization of big data structures is 2-3 times faster, and in the IE8 JS engine, it's 10 times faster.

I still need to clean it up and get the tests working, but I should be able to submit a patch pretty soon.

Show
Evan Mezeske added a comment - I've rewritten the pr-str stuff, passing a StringBuffer down and appending to it instead of doing a massive amount of concats with temporary lists (some of which were lazy). The results are very promising. On V8, serialization of big data structures is 2-3 times faster, and in the IE8 JS engine, it's 10 times faster. I still need to clean it up and get the tests working, but I should be able to submit a patch pretty soon.
Hide
Evan Mezeske added a comment -

I've attached my patch to optimize pr-str. I used the StringBuffer approach, with which I see very substantial performance improvements of 2-3X in V8 and 10-100X in IE8.

I also added a suite of tests for pr-str, based loosely on the reader tests, to make sure that I didn't break anything. All tests pass.

Show
Evan Mezeske added a comment - I've attached my patch to optimize pr-str. I used the StringBuffer approach, with which I see very substantial performance improvements of 2-3X in V8 and 10-100X in IE8. I also added a suite of tests for pr-str, based loosely on the reader tests, to make sure that I didn't break anything. All tests pass.
Hide
David Nolen added a comment -

The approach looks good but this is a pretty big breaking change. After a year of ClojureScript I think there are quite a few code bases that rely on the contract of -pr-seq. I would prefer a patch that created a new protocol, IPrintMethod (or some better name) with support for this new interface for all current IPrintable types. Yes this means a lot of code duplication for now but this allows us to provide a fast path, it allows people to adopt it, and it gives us time to deprecate IPrintable.

Show
David Nolen added a comment - The approach looks good but this is a pretty big breaking change. After a year of ClojureScript I think there are quite a few code bases that rely on the contract of -pr-seq. I would prefer a patch that created a new protocol, IPrintMethod (or some better name) with support for this new interface for all current IPrintable types. Yes this means a lot of code duplication for now but this allows us to provide a fast path, it allows people to adopt it, and it gives us time to deprecate IPrintable.
Hide
Evan Mezeske added a comment -

I was afraid that you might say that! I was hoping that the leading dash on -pr-seq meant that nobody used it, but c'est la vie.

It shouldn't be hard to make things cohabitate in the way you described. Since I already renamed the -pr-seq function, I can just add the old stuff back in, and as long as the protocol names differ it should be easy.

I'll see if I can get to that this evening, although it might have to wait for the weekend. Either way, I'll get to it very soon.

Show
Evan Mezeske added a comment - I was afraid that you might say that! I was hoping that the leading dash on -pr-seq meant that nobody used it, but c'est la vie. It shouldn't be hard to make things cohabitate in the way you described. Since I already renamed the -pr-seq function, I can just add the old stuff back in, and as long as the protocol names differ it should be easy. I'll see if I can get to that this evening, although it might have to wait for the weekend. Either way, I'll get to it very soon.
Hide
David Nolen added a comment -

Yeah anybody working with deftype or defrecord will probably have provided their own IPrintable implementation. You should be able to dispatch to the fast path with a satisfies? test.

Thanks!

Show
David Nolen added a comment - Yeah anybody working with deftype or defrecord will probably have provided their own IPrintable implementation. You should be able to dispatch to the fast path with a satisfies? test. Thanks!
Hide
Evan Mezeske added a comment -

I updated my patch with support for backwards compatibility with old types that implement -pr-seq. If an object does not satisfy the new protocol (IPrintWith), pr-str will fall back on IPrintable recursively. I marked various IPrintable-related features as ^:deprecated, and included comments that explain why.

I also added some new tests, for backwards compatibility and printing records.

Show
Evan Mezeske added a comment - I updated my patch with support for backwards compatibility with old types that implement -pr-seq. If an object does not satisfy the new protocol (IPrintWith), pr-str will fall back on IPrintable recursively. I marked various IPrintable-related features as ^:deprecated, and included comments that explain why. I also added some new tests, for backwards compatibility and printing records.
Hide
Evan Mezeske added a comment -

For reference, I added a simple pr-str benchmark to cljs-bench. The test data structure that is printed is not very large, but I'm thinking that it's nested deeply enough to show the performance gains of my patch once it's applied. If not, I can update the benchmark to dynamically build up a larger data structure to really tax pr-str.

http://www.50ply.com/cljs-bench/#plot49

Show
Evan Mezeske added a comment - For reference, I added a simple pr-str benchmark to cljs-bench. The test data structure that is printed is not very large, but I'm thinking that it's nested deeply enough to show the performance gains of my patch once it's applied. If not, I can update the benchmark to dynamically build up a larger data structure to really tax pr-str. http://www.50ply.com/cljs-bench/#plot49
Hide
David Nolen added a comment -

Could we get a compiler warning for IPrintable usage? We can then remove it from the next release.

Show
David Nolen added a comment - Could we get a compiler warning for IPrintable usage? We can then remove it from the next release.
Hide
Evan Mezeske added a comment -

I will add the warning, hopefully this weekend sometime. I need to think about how to do it, though. The easy way would be to just generate a warning whenever pr-seq is called. However, for a deeply nested structure, that would produce a whole lot of warnings for just one use of pr-seq. I'll try to come up with a solution that will only generate one warning per invocation of pr(-str).

Show
Evan Mezeske added a comment - I will add the warning, hopefully this weekend sometime. I need to think about how to do it, though. The easy way would be to just generate a warning whenever pr-seq is called. However, for a deeply nested structure, that would produce a whole lot of warnings for just one use of pr-seq. I'll try to come up with a solution that will only generate one warning per invocation of pr(-str).
Hide
David Nolen added a comment -

Would be better to warn only implementation rather than usage. Ideally the patch just warns on files that are not core.cljs. That's still bound to create some noise internal to CLJS, but I think that'll encourage folks to fix their own code.

Show
David Nolen added a comment - Would be better to warn only implementation rather than usage. Ideally the patch just warns on files that are not core.cljs. That's still bound to create some noise internal to CLJS, but I think that'll encourage folks to fix their own code.
Hide
Evan Mezeske added a comment -

Ah, yeah, that would be much better. I forgot we're working on the compiler here, that sort of thing should be easy!

Show
Evan Mezeske added a comment - Ah, yeah, that would be much better. I forgot we're working on the compiler here, that sort of thing should be easy!
Hide
David Nolen added a comment - - edited

I wonder if an IWriter protocol isn't warranted here instead of making printer be a function. While it seems like overkill for clientside JS perhaps for serverside scripts it would be useful. Granted those users could easily construct a function but there something protocol like here that might benefit from being made explicit.

Perhaps worth pondering http://docs.oracle.com/javase/6/docs/api/java/io/Writer.html and http://nodejs.org/api/fs.html just a little bit.

Show
David Nolen added a comment - - edited I wonder if an IWriter protocol isn't warranted here instead of making printer be a function. While it seems like overkill for clientside JS perhaps for serverside scripts it would be useful. Granted those users could easily construct a function but there something protocol like here that might benefit from being made explicit. Perhaps worth pondering http://docs.oracle.com/javase/6/docs/api/java/io/Writer.html and http://nodejs.org/api/fs.html just a little bit.
Hide
Evan Mezeske added a comment -

I added an IWriter protocol, changed IPrintWith's name to IPrintWriter, and changed its implementation to use an IWriter. I added two types that implement IWriter: StringBufferWriter and StandardOutputWriter, and then implemented the pr-* functions in terms of those.

Show
Evan Mezeske added a comment - I added an IWriter protocol, changed IPrintWith's name to IPrintWriter, and changed its implementation to use an IWriter. I added two types that implement IWriter: StringBufferWriter and StandardOutputWriter, and then implemented the pr-* functions in terms of those.
Hide
Evan Mezeske added a comment -

I changed IPrintWriter to IPrintWithWriter, as per a conversation I had with David Nolen on IRC.

Show
Evan Mezeske added a comment - I changed IPrintWriter to IPrintWithWriter, as per a conversation I had with David Nolen on IRC.
Hide
David Nolen added a comment -

This patch no longer applies to master. Can we get a new patch? Thanks!

Show
David Nolen added a comment - This patch no longer applies to master. Can we get a new patch? Thanks!
Hide
Evan Mezeske added a comment -

Rebased the patch on top of master just now.

Show
Evan Mezeske added a comment - Rebased the patch on top of master just now.
Hide
David Nolen added a comment -

Great it applies! However tests now fail for SpiderMonkey and JavaScriptCore -

out/core-advanced-test.js:371: ReferenceError: write is not defined"

and

Exception: ReferenceError: Can't find variable: write
global code@out/core-advanced-test.js:371

Also can we please remove the old patches to prevent future confusion?

Thanks!

Show
David Nolen added a comment - Great it applies! However tests now fail for SpiderMonkey and JavaScriptCore -
out/core-advanced-test.js:371: ReferenceError: write is not defined"
and
Exception: ReferenceError: Can't find variable: write
global code@out/core-advanced-test.js:371
Also can we please remove the old patches to prevent future confusion? Thanks!
Hide
Evan Mezeske added a comment -

Sadly, JavaScriptCore does not have any way to print to standard output without a newline, so I had to remove the more efficient StandardOutputWriter and output strings by building them up fully in memory and then dumping them in one big print.

Eventually we could add the option to supply a write-fn, which is just like print-fn except guaranteed not to output a trailing newline. Then we could check whether a write-fn was set and if so avoid the intermediate in-memory copy of the string.

The tests are now passing under all three implementations.

Show
Evan Mezeske added a comment - Sadly, JavaScriptCore does not have any way to print to standard output without a newline, so I had to remove the more efficient StandardOutputWriter and output strings by building them up fully in memory and then dumping them in one big print. Eventually we could add the option to supply a write-fn, which is just like print-fn except guaranteed not to output a trailing newline. Then we could check whether a write-fn was set and if so avoid the intermediate in-memory copy of the string. The tests are now passing under all three implementations.
Hide
David Nolen added a comment -

I'm confused as to why you needed to remove StandardOutputWriter. Is there any reason to not make the implementation adaptable to the available capabilities? I think that most people that use ClojureScript serverside will be using V8 or Rhino anyhow.

Show
David Nolen added a comment - I'm confused as to why you needed to remove StandardOutputWriter. Is there any reason to not make the implementation adaptable to the available capabilities? I think that most people that use ClojureScript serverside will be using V8 or Rhino anyhow.
Hide
David Nolen added a comment -

I'm confused as to why you needed to remove StandardOutputWriter. Is there any reason to not make the implementation adaptable to the available capabilities? I think that most people that use ClojureScript serverside will be using V8 or Rhino anyhow.

Show
David Nolen added a comment - I'm confused as to why you needed to remove StandardOutputWriter. Is there any reason to not make the implementation adaptable to the available capabilities? I think that most people that use ClojureScript serverside will be using V8 or Rhino anyhow.
Hide
Evan Mezeske added a comment -

I removed StandardOutputWriter because it doesn't work with *print-fn*, which is assumed to append a newline to the text it outputs.

I agree that the implementation should be adaptable to the available capabilities, but I was thinking that such behavior could be added in a future patch.

The approach I had in mind was to add a settable *write-fn*, and have the implementation use a buffering approach if it *write-fn* is nil, otherwise use the more efficient StandardOutputWriter. If you agree with this approach, I'd be happy to add it to this patch.

Show
Evan Mezeske added a comment - I removed StandardOutputWriter because it doesn't work with *print-fn*, which is assumed to append a newline to the text it outputs. I agree that the implementation should be adaptable to the available capabilities, but I was thinking that such behavior could be added in a future patch. The approach I had in mind was to add a settable *write-fn*, and have the implementation use a buffering approach if it *write-fn* is nil, otherwise use the more efficient StandardOutputWriter. If you agree with this approach, I'd be happy to add it to this patch.
Hide
David Nolen added a comment - - edited

But why do people need to deal with *print-fn* or *write-fn* at all? Can't this logic be hidden inside StandardOutputWriter?

(set! *out* (new StandardOutputWriter.))

Granted code exists that depends on *print-fn* so I'm not talking about removing them. But being able to set! *out* also opens the door for rebinding *out* just like we do in Clojure.

Show
David Nolen added a comment - - edited But why do people need to deal with *print-fn* or *write-fn* at all? Can't this logic be hidden inside StandardOutputWriter?
(set! *out* (new StandardOutputWriter.))
Granted code exists that depends on *print-fn* so I'm not talking about removing them. But being able to set! *out* also opens the door for rebinding *out* just like we do in Clojure.
Hide
Evan Mezeske added a comment -

Is there an official list of what JS platforms are supported by ClojureScript? It appears that every platform has a unique API for printing to standard output, so StandardOutputWriter would have to inspect the environment to find write() (V8), putstr() (SpiderMonkey), no support (JavaScriptCore, Rhino), process.stdout.write() (node.js), or god knows what other functions in the more obscure JS engines.

I assume that *print-fn* exists to allow for compatibility with any JS platform (even those not explicitly tested against). If StandardOutputWriter is going to have platform-specific knowledge baked into it (e.g. a list of functions to try), I think it should also have a back door to allow people to override things for other JS platforms. I guess *out* could be used for that purpose, though...

Show
Evan Mezeske added a comment - Is there an official list of what JS platforms are supported by ClojureScript? It appears that every platform has a unique API for printing to standard output, so StandardOutputWriter would have to inspect the environment to find write() (V8), putstr() (SpiderMonkey), no support (JavaScriptCore, Rhino), process.stdout.write() (node.js), or god knows what other functions in the more obscure JS engines. I assume that *print-fn* exists to allow for compatibility with any JS platform (even those not explicitly tested against). If StandardOutputWriter is going to have platform-specific knowledge baked into it (e.g. a list of functions to try), I think it should also have a back door to allow people to override things for other JS platforms. I guess *out* could be used for that purpose, though...
Hide
David Nolen added a comment - - edited

I agree that this is a bit of scope creep for this ticket, we should hash it out a bit on design page first. Do we need *write-fn* for this patch to work? It would be preferable to do w/o it so we can design something proper w/o yet something else people will inadvertently depend on.

Show
David Nolen added a comment - - edited I agree that this is a bit of scope creep for this ticket, we should hash it out a bit on design page first. Do we need *write-fn* for this patch to work? It would be preferable to do w/o it so we can design something proper w/o yet something else people will inadvertently depend on.
Hide
Evan Mezeske added a comment -

The patch, in its current form (e.g. without a StandardOutputWriter), assumes that only *print-fn* is available, and doesn't require a *write-fn*.

Even without the StandardOutputWriter, it's generally much faster (and certainly no worse) than the existing implementation.

Show
Evan Mezeske added a comment - The patch, in its current form (e.g. without a StandardOutputWriter), assumes that only *print-fn* is available, and doesn't require a *write-fn*. Even without the StandardOutputWriter, it's generally much faster (and certainly no worse) than the existing implementation.
Hide
David Nolen added a comment -

Great! Will go with this then.

Show
David Nolen added a comment - Great! Will go with this then.
Hide
David Nolen added a comment -

Can we rebase the patch on master again, this won't apply cleanly because of another commit thank snuck in involving pr-str. Thanks!

Show
David Nolen added a comment - Can we rebase the patch on master again, this won't apply cleanly because of another commit thank snuck in involving pr-str. Thanks!
Hide
Evan Mezeske added a comment -

Huh, that patch was based on the latest master commit – I'm very confused about why it won't apply cleanly. I must have done something weird when I created it. I tested the application of the new patch I'm attaching, so it should be good to go.

Show
Evan Mezeske added a comment - Huh, that patch was based on the latest master commit – I'm very confused about why it won't apply cleanly. I must have done something weird when I created it. I tested the application of the new patch I'm attaching, so it should be good to go.
Hide
Brandon Bloom added a comment -

Sweet! Good stuff, Evan, thanks.

You wouldn't be interested in writing a new patch for CLJS-321 would you?

Show
Brandon Bloom added a comment - Sweet! Good stuff, Evan, thanks. You wouldn't be interested in writing a new patch for CLJS-321 would you?
Hide
David Nolen added a comment -

This ticket is great but it did affect printing behavior - it would be nice to resolve http://dev.clojure.org/jira/browse/CLJS-372 before cutting a new release.

Show
David Nolen added a comment - This ticket is great but it did affect printing behavior - it would be nice to resolve http://dev.clojure.org/jira/browse/CLJS-372 before cutting a new release.
Hide
Evan Mezeske added a comment -

Could you add some more detail to CLJS-372? I'll be happy to fix any regressions caused by this patch. In my testing, I didn't notice anything broken, though, so I'll need a bit more detail so that I know where to start looking.

Show
Evan Mezeske added a comment - Could you add some more detail to CLJS-372? I'll be happy to fix any regressions caused by this patch. In my testing, I didn't notice anything broken, though, so I'll need a bit more detail so that I know where to start looking.
Hide
David Nolen added a comment -

I added some more detail to CLJS-372.

Show
David Nolen added a comment - I added some more detail to CLJS-372.

People

Vote (2)
Watch (5)

Dates

  • Created:
    Updated:
    Resolved: