core.memoize

Thread-synchronized and unsynchronized spoofing

Details

  • Type: Enhancement
  • Status: Open
  • Priority: Major
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels: None

Description

Problem: Function A is memoized. Function B, among other things, computes the result of a valid call to A without ever invoking A. We want to exploit this in B to further cut down on calculations of A. In cases where we expect B to perform faster than A, we also want callers of A to block while a corresponding B is running.

A desired operation for doing this in B would be
(spoof A args calc)

where calc is a fn of zero args that produces a value equal to (apply A args). This lets the caller put exactly as much work into calc as they expect to run faster than A.

spoof returns what calc returns or, in case of a cache hit, what is cached for args. I.e., if an invocation of (A args) happened before the call to (spoof A args calc), calc won't be invoked.
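
For illustration, a hypothetical usage sketch (spoof is the proposed operation, not an existing API; A and B here are made up):

    (require '[clojure.core.memoize :as memo])

    (def A (memo/memo (fn [x] (Thread/sleep 5000) (* x x)))) ; expensive

    (defn B [x]
      ;; B can derive A's result for x cheaply; spoof would cache it
      ;; under [x], blocking concurrent (A x) callers while the thunk runs.
      (spoof A [x] (fn [] (* x x))))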

In cases where we already have the result of (A args) and don't want to be blocked by a concurrent invocation of (A args), a desired operation would be

(spoof-unsynchronized A args val)

It returns val immediately and updates A's cache soon after. It doesn't have to block invocations of (A args), since the unsynchronized spoof is a very fast operation that is subject to a race condition with invocations of (A args) anyway. (So a swap! on the internal cache state should suffice.)
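
A minimal sketch of what the unsynchronized variant could look like, assuming the cache atom is reachable from the memoized fn's metadata under :clojure.core.memoize/cache (see the discussion below), and assuming cached entries must be derefable, so a realized delay is stored:

    (require '[clojure.core.cache :as cache])

    (defn spoof-unsynchronized [memoized-f args val]
      (let [cache-atom (:clojure.core.memoize/cache (meta memoized-f))]
        ;; fire-and-forget cache update; the race with concurrent
        ;; invocations of (A args) is accepted by design
        (swap! cache-atom cache/miss args (doto (delay val) deref))
        val))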

Activity

Sean Corfield added a comment -

This doesn't feel like something that should be part of the core.memoize library but it should be possible to build something like this on top of it (and it isn't right now, correct?).

Would the minimum change necessary to allow this be to expose the underlying cache (so you could invoke deref/swap! against the cache directly)?

Leon Grapenthin added a comment - edited

For the unsynchronized case, yes. For the synchronized (interesting) case, maybe with nasty hacking; I'd have to look at the code.

I disagree that this functionality should not be built in. It's quite general and allows very flexible, user-defined memoization implementations.

The design recommended above is stolen from Google Guava's cache.

https://google.github.io/guava/releases/23.0/api/docs/com/google/common/cache/Cache.html#get-K-java.util.concurrent.Callable-

It allows memoization like this. You just call
(.get c [arg1] (fn [] "calc for [arg1]")) ;; notice how easy it is to implement a custom memoizer with just that facility!
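
For reference, a minimal custom memoizer on top of that Guava facility (a sketch; guava-memoize is an illustrative name, and Clojure fns already implement java.util.concurrent.Callable, so they can serve as the thunk directly):

    (import '(com.google.common.cache Cache CacheBuilder))

    (defn guava-memoize [f]
      (let [^Cache c (.build (CacheBuilder/newBuilder))]
        (fn [& args]
          ;; Cache.get runs the thunk at most once per key, synchronized
          (.get c (vec args) (fn [] (apply f args))))))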

The necessary logic that runs the fn synchronized (once) is, in our case, implemented in core.memoize. So if core.memoize exposed a similar facility, I believe that would be a very generic API that would allow much more to be built on top (custom memoizer impls).

Guava's Cache also has .put, which handles the unsynchronized case.

Sean Corfield added a comment -

Interesting. My next question was going to be "where did you get the idea for this?".

The underlying machinery in core.memoize works in a similar way, with a through* function that handles the thunk etc., but it's the build-memoizer function that encapsulates all of that, along with the PluggableMemoization type.

I'll have to give this some thought. The cache itself is attached to the metadata for the function object returned from build-memoizer so, technically, it's all already exposed if you wanted to program this yourself.
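
For example (a sketch; some-slow-fn is illustrative, and the metadata key is the namespaced keyword used in the snippet further below):

    (def f* (clojure.core.memoize/memo some-slow-fn))
    (:clojure.core.memoize/cache (meta f*)) ;=> the underlying cache atom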

Leon Grapenthin added a comment -

My understanding is that I would need to invoke the private through* with a patched (args-ignoring) fn.

Can't really call that exposed.

Leon Grapenthin added a comment -

And I'd also need to copy/paste the code following line 209 that handles a rare timing issue.

Leon Grapenthin added a comment -

Or did I misunderstand this, and you were suggesting what to do if I wanted to program a patch?

Sean Corfield added a comment -

I meant: the raw cache is right there in the metadata if you wanted to check whether an argument-list key exists and/or update the cache with a new value via the thunk you talk about above. You can invoke core.cache/through (or, more likely, core.cache/through-cache) directly on that cache, with a thunk (albeit of one argument, which you could just ignore):

(swap! (:clojure.core.memoize/cache (meta the-fn-var)) clojure.core.cache/through-cache [arg1] (fn [_] "calc for [arg1]"))

Any user of a TTL cache potentially has to deal with the edge case of eviction during access (unless they don't care about getting nil back), so, yes, a spoof function that does the above swap! would potentially have to retry the thunk invocation; whether you also updated the cache at that point would be up to you (i.e., what behavior do you want for subsequent calls if a cache eviction occurred during your spoof call?).

The point is that spoof requires the actual cache to do its job. The memoized function only has access to that cache by virtue of being a closure, so spoof has to operate on the Var holding the memoized function.

Does that clarify what I'm suggesting?
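
A sketch along those lines (spoof here is user code, not part of the library; it assumes cached entries must be derefable, so a realized delay is stored, and it retries on the TTL-eviction edge case described above):

    (require '[clojure.core.cache :as cache])

    (defn spoof [memoized-f args calc]
      (let [cache-atom (:clojure.core.memoize/cache (meta memoized-f))]
        (loop []
          (let [c (swap! cache-atom cache/through-cache args
                         (fn [_] (doto (delay (calc)) deref)))]
            ;; a hit may be another caller's pending entry; deref waits for it
            (if-let [entry (cache/lookup c args)]
              @entry
              (recur)))))) ; evicted between swap! and lookup; retry

Note that swap! may retry under contention, so calc can run more than once in this sketch; avoiding that is precisely what the locking inside core.memoize's internal through* provides.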

Leon Grapenthin added a comment -

spoof would also use the internal through* for its locking facilities in d-lay. It might require minor adjustments to support a thunk.

The design recommended above intentionally takes a memoized function so that it can extract ::cache.

Regarding the TTL fix, I have to look at it again. Please let me know if you are interested in a patch with spoof/spoof-async...

Sean Corfield added a comment -

I hate the names, but I'd consider a patch that exposed enough of the machinery to allow users to build their own versions outside the library more easily.

Leon Grapenthin added a comment -

I chose the names to fit the current design of the lib. Where Guava gives you powerful primitives and claims generic low-level names, this lib is written in a style that makes strong assumptions about what the user wants to do with it: it gives you a memoized function with a cache attached right away.

This is why I have chosen names that only describe why you would use them /additionally/. There might be a better word than "spoof" - I'm not a native speaker.

core.cache only exposes the data structures of caches; it doesn't expose a proper reference type that caches calculations (instead of values), like Guava does. I suspect the core.cache authors omitted this because they thought Clojure's reference types would suffice. core.memoize implements such a reference type for itself as an internal. I'm unsure whether this lib would be the right place to expose this functionality, which has a good amount of use cases outside of memoization. Hence the naming and the wrapping around memoized functions, to clearly suggest that the functionality is meant to stay in the scope of memoization.

