Clojure

Improve clojure.core/distinct perf by using transient set

Details

  • Type: Enhancement Enhancement
  • Status: Open Open
  • Priority: Major Major
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Patch:
    Code
  • Approval:
    Triaged

Description

Current implementation of clojure.core/distinct uses persistent set. This patch improves performance of lazy arity by ~25%-30% and transducer by ~40%-50% by using transient set instead.

10 elements
(doall (distinct coll)) 	 5.773439 µs => 4.179092 µs (-27%)
(into [] (distinct) coll) 	 3.238236 µs => 1.943254 µs (-39%)

100 elements
(doall (distinct coll)) 	 67.725764 µs => 42.129993 µs (-37%)
(into [] (distinct) coll) 	 35.702741 µs => 16.495947 µs (-53%)

1000 elements
(doall (distinct coll)) 	 540.652739 µs => 399.053873 µs (-26%)
(into [] (distinct) coll) 	 301.423077 µs => 164.025500 µs (-45%)

10000 elements
(doall (distinct coll)) 	 3.439137 ms => 3.058872 ms (-11%)
(into [] (distinct) coll) 	 1.437390 ms => 848.277178 µs (-40%)

Benchmarking code: https://gist.github.com/tonsky/97dfe1f9c48eccafc983a49c7042fb21

Activity

Hide
Alex Miller added a comment -

You can't remove the volatile - you still need that for safe publication in multi threaded transducing contexts.

Show
Alex Miller added a comment - You can't remove the volatile - you still need that for safe publication in multi threaded transducing contexts.
Hide
Nikita Prokopov added a comment -

Alex Miller How do you mean?

  • I don’t update seen link because transient set can be mutated in-place
  • Are transducers meant to be used from multiple threads? Because even existing implementation clearly has race condition. I imagine fixing that would be costly (we’ll need a synchronized section), so maybe it should be a specialized transducer that you use only when needed?
Show
Nikita Prokopov added a comment - Alex Miller How do you mean?
  • I don’t update seen link because transient set can be mutated in-place
  • Are transducers meant to be used from multiple threads? Because even existing implementation clearly has race condition. I imagine fixing that would be costly (we’ll need a synchronized section), so maybe it should be a specialized transducer that you use only when needed?
Hide
Alex Miller added a comment - - edited

Transient sets can NOT be mutated in place - you must use the return value.

Yes, transducers are used from multiple threads in (for example) transducer chans in core.async go blocks.

Show
Alex Miller added a comment - - edited Transient sets can NOT be mutated in place - you must use the return value. Yes, transducers are used from multiple threads in (for example) transducer chans in core.async go blocks.
Hide
Alex Miller added a comment -

I should also say transducers are not expected to be used from more than one thread at a time, so there are no race problems. But being used from multiple threads over time requires proper safe publication.

Show
Alex Miller added a comment - I should also say transducers are not expected to be used from more than one thread at a time, so there are no race problems. But being used from multiple threads over time requires proper safe publication.
Hide
Nikita Prokopov added a comment -

But being used from multiple threads over time requires proper safe publication.

Does that imply that no transients could be used in transducers (because underlying arrays on which transient impl is based are mutated in place, so different threads could potentially see different states of transient object)?

Does that also mean that partition-by and partition-all should be fixed (they use java.util.ArrayList which, being array of references, has no safe publication semantics)?

Transient sets can NOT be mutated in place - you must use the return value.

I was thinking that clojure/core.clj and clojure.lang.ATransientSet.java are both part of Clojure internals, colocated, so can share a little bit of internal knowledge about each other. It seems safe to do that, because that knowledge does not leak outside, and, if at any point impl of ATransientSet would change, core.clj could be updated accordingly in the same release. I wouldn’t do that in any third-party library, of course.

Show
Nikita Prokopov added a comment -
But being used from multiple threads over time requires proper safe publication.
Does that imply that no transients could be used in transducers (because underlying arrays on which transient impl is based are mutated in place, so different threads could potentially see different states of transient object)? Does that also mean that partition-by and partition-all should be fixed (they use java.util.ArrayList which, being array of references, has no safe publication semantics)?
Transient sets can NOT be mutated in place - you must use the return value.
I was thinking that clojure/core.clj and clojure.lang.ATransientSet.java are both part of Clojure internals, colocated, so can share a little bit of internal knowledge about each other. It seems safe to do that, because that knowledge does not leak outside, and, if at any point impl of ATransientSet would change, core.clj could be updated accordingly in the same release. I wouldn’t do that in any third-party library, of course.
Hide
Alex Miller added a comment -

Does that imply that no transients could be used in transducers (because underlying arrays on which transient impl is based are mutated in place, so different threads could potentially see different states of transient object)?

Transients require only that they are asked by no more than a single thread at a time and so are safe to use in a transducer. However, they should guarantee safe publication. core.async channels already do this as an artifact of their implementation, but other transducing contexts may not.

Transients should NEVER be used as "mutate in place", regardless of concurrency. While they will appear to "work" in some circumstances, this is never correct (eventually an update operation will return a new instance and if you are mutating in place, your data will then be missing). This is discussed and correct examples are shown at http://clojure.org/reference/transients.

Does that also mean that partition-by and partition-all should be fixed (they use java.util.ArrayList which, being array of references, has no safe publication semantics)?

That's something Rich and I are discussing but, probably.

Show
Alex Miller added a comment -
Does that imply that no transients could be used in transducers (because underlying arrays on which transient impl is based are mutated in place, so different threads could potentially see different states of transient object)?
Transients require only that they are asked by no more than a single thread at a time and so are safe to use in a transducer. However, they should guarantee safe publication. core.async channels already do this as an artifact of their implementation, but other transducing contexts may not. Transients should NEVER be used as "mutate in place", regardless of concurrency. While they will appear to "work" in some circumstances, this is never correct (eventually an update operation will return a new instance and if you are mutating in place, your data will then be missing). This is discussed and correct examples are shown at http://clojure.org/reference/transients.
Does that also mean that partition-by and partition-all should be fixed (they use java.util.ArrayList which, being array of references, has no safe publication semantics)?
That's something Rich and I are discussing but, probably.
Hide
Nikita Prokopov added a comment -

Alex Miller Here’s quick test that shows that changes to transient set (which is nothing more but a wrapper around transient map) made in one thread are not always visible from another thread.

https://gist.github.com/tonsky/62a7ec6d539fc013186bee2df0812cf6

That means that if we try to use transients for e.g. distinct it will miss duplicate items

Show
Nikita Prokopov added a comment - Alex Miller Here’s quick test that shows that changes to transient set (which is nothing more but a wrapper around transient map) made in one thread are not always visible from another thread. https://gist.github.com/tonsky/62a7ec6d539fc013186bee2df0812cf6 That means that if we try to use transients for e.g. distinct it will miss duplicate items
Hide
Nikita Prokopov added a comment -

Removed transients from transducer arity of distincts because transducers might be accessed from multiple threads

Show
Nikita Prokopov added a comment - Removed transients from transducer arity of distincts because transducers might be accessed from multiple threads
Hide
Nikita Prokopov added a comment -

Maybe that doc http://clojure.org/reference/transients should be updated re: transients are not safe to use from multiple threads because changes made by one thread are not necessarily visible to another. Even if they don’t compete

Show
Nikita Prokopov added a comment - Maybe that doc http://clojure.org/reference/transients should be updated re: transients are not safe to use from multiple threads because changes made by one thread are not necessarily visible to another. Even if they don’t compete
Hide
Alex Miller added a comment -

I would say that test is demonstrating a bug in transient sets/maps and you should file a ticket for that as it's a lot more important than this enhancement.

distinct should be able to use transients in both the transducer and lazy seq impls. The issue with contains? not working on transients is actually a separate ticket - http://dev.clojure.org/jira/browse/CLJ-700 that will likely require some class hierarchy rearrangement. I don't think we would take this change until that is fixed (so that you can avoid relying on the class and Java method variants).

Show
Alex Miller added a comment - I would say that test is demonstrating a bug in transient sets/maps and you should file a ticket for that as it's a lot more important than this enhancement. distinct should be able to use transients in both the transducer and lazy seq impls. The issue with contains? not working on transients is actually a separate ticket - http://dev.clojure.org/jira/browse/CLJ-700 that will likely require some class hierarchy rearrangement. I don't think we would take this change until that is fixed (so that you can avoid relying on the class and Java method variants).
Hide
Nikita Prokopov added a comment -

I have to admit my test was demonstrating something else: there were no proper thread isolation. So it was a concurrency issue, not “safe publication” issue. My current understanding is this:

Transients require thread isolation. Use of a particular transient instance should be controlled either by using it in an single-threaded scope, or in a framework that enforces this.

That guarantee implicitly presumes that there’s happens-before relation between transient usage from multiple threads. There’s no other way to define “only one thread is in this section at a time”.

That, in turn, means that all writes that happened in thread 1 are visible in thread 2, regardless to volatility of the variables involved. In fact, we can remove all volatiles from transients implementation and probably make them faster, because, by asking “no more than one thread at a time” we enforce users to establish happens-before between sections, and that would give us all the safe publication guarantees we need.

Is my understanding correct? Am I missing something?

Show
Nikita Prokopov added a comment - I have to admit my test was demonstrating something else: there were no proper thread isolation. So it was a concurrency issue, not “safe publication” issue. My current understanding is this:
Transients require thread isolation. Use of a particular transient instance should be controlled either by using it in an single-threaded scope, or in a framework that enforces this.
That guarantee implicitly presumes that there’s happens-before relation between transient usage from multiple threads. There’s no other way to define “only one thread is in this section at a time”. That, in turn, means that all writes that happened in thread 1 are visible in thread 2, regardless to volatility of the variables involved. In fact, we can remove all volatiles from transients implementation and probably make them faster, because, by asking “no more than one thread at a time” we enforce users to establish happens-before between sections, and that would give us all the safe publication guarantees we need. Is my understanding correct? Am I missing something?
Hide
Nikita Prokopov added a comment -

Also, long-living transients (e.g. in a transducers associated with a queue, for example) will hold a reference to a thread that created them. Is that a bad thing? Should we switch to boolean flag instead?

Show
Nikita Prokopov added a comment - Also, long-living transients (e.g. in a transducers associated with a queue, for example) will hold a reference to a thread that created them. Is that a bad thing? Should we switch to boolean flag instead?

People

Vote (0)
Watch (3)

Dates

  • Created:
    Updated: